
Reasons for not using saved objects for storing kibana data #80912

Open
kobelb opened this issue Oct 16, 2020 · 13 comments
Labels
discuss Feature:Saved Objects Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@kobelb
Contributor

kobelb commented Oct 16, 2020

🚩 Note: I intend to use the main issue description to reflect our growing understanding of the situation, so I will periodically update it as we discuss. I'll add a comment whenever this happens, so the description isn't silently changing.

A majority of Kibana's entities are persisted as saved-objects. However, a growing number of non-saved-object Elasticsearch indices are being used to store Kibana-specific entities. The following are the ones I'm currently aware of:

  1. Alerting's event log - .kibana-event-log-*
  2. APM agent configuration - .apm-agent-configuration
  3. APM custom link - .apm-custom-link
  4. Detection engine signals - .siem-signals-*
  5. Security solution lists - .lists and .values
  6. Reporting - .reporting-*

I've started this discuss issue to determine which other Elasticsearch indices are being used to store Kibana-specific entities, and to enumerate the reasons why they aren't being stored as saved-objects. Saved-objects provide a number of features, including migrations, authorization, audit logging, export/import, space awareness, and encrypted attributes, that developers forgo when using non-saved-object ES indices.

I'd like to perform this exercise to identify limitations of saved-objects that should be addressed to make them applicable to other use-cases, or to figure out which saved-object-specific features should be made available when using non-saved-object ES indices.

Reasons we haven't used saved-objects

End-users should be able to query the indices directly

Saved-objects are stored in a "system index", so starting in 8.0, end-users will not be able to query these indices directly. Even if end-users could theoretically query system indices, we treat the ES document format as an implementation detail of saved-objects: it's prone to change in non-backward-compatible ways during minor versions, so end-users shouldn't query it directly.

Applies to: Alerting's event log, Detection engine signals
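To make the coupling concrete, here is a minimal TypeScript sketch of what a direct end-user query against saved-object documents has to know. It assumes the 7.x raw layout, in which each document carries a `type` field and nests its attributes under a key named after that type (for example `dashboard.title`); `buildRawAttributeQuery` and `getAttributes` are hypothetical helpers, not Kibana APIs.

```typescript
// ASSUMPTION: 7.x raw saved-object layout. This is an implementation detail
// of saved-objects and may change between minor versions.
interface RawSavedObjectSource {
  type: string;
  namespace?: string;
  references: Array<{ name: string; type: string; id: string }>;
  // Attributes live under a key equal to `type`,
  // e.g. { type: 'dashboard', dashboard: { title: '...' } }.
  [typeKey: string]: unknown;
}

// Hypothetical helper: the query an end-user would have to hand-write to find
// saved objects of `type` whose `attribute` equals `value`.
function buildRawAttributeQuery(type: string, attribute: string, value: string) {
  return {
    bool: {
      filter: [{ term: { type } }, { term: { [`${type}.${attribute}`]: value } }],
    },
  };
}

// Hypothetical helper: read the attributes out of a raw document.
function getAttributes(source: RawSavedObjectSource): Record<string, unknown> {
  return (source[source.type] ?? {}) as Record<string, unknown>;
}
```

If the layout changes in a minor release, every query written this way silently stops matching, which is why end-users shouldn't depend on it.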

There are too many saved-objects

The SIEM team has outlined a few of the issues that they experienced when trying to model their lists using saved-objects in #64715. Notably, SavedObjectsClient#find's paging implementation doesn't function properly when there are more than 10k results, which is being tracked by #77961.

Applies to: Security solution lists
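The 10k limit is consistent with from/size paging, which Elasticsearch rejects once `from + size` exceeds the `index.max_result_window` setting (10,000 by default); a `search_after` cursor over a deterministic sort avoids that limit. The sketch below simulates both ideas in memory. `Doc`, `searchAfterPage` and `fetchAll` are hypothetical stand-ins for real search requests, not the saved objects client's implementation.

```typescript
// Elasticsearch's default for the index.max_result_window setting.
const MAX_RESULT_WINDOW = 10000;

// from/size paging is rejected once from + size exceeds max_result_window.
function isFromSizePageAllowed(from: number, size: number, maxResultWindow = MAX_RESULT_WINDOW): boolean {
  return from + size <= maxResultWindow;
}

// Hypothetical document; sort key is (updatedAt, id), with id as a unique tiebreaker.
interface Doc {
  id: string;
  updatedAt: number;
}
type SortValues = [number, string];

function compareSort(a: SortValues, b: SortValues): number {
  if (a[0] !== b[0]) return a[0] - b[0];
  return a[1] < b[1] ? -1 : a[1] > b[1] ? 1 : 0;
}

// In-memory stand-in for one sorted search request using search_after.
function searchAfterPage(docs: Doc[], size: number, after?: SortValues): Array<{ doc: Doc; sort: SortValues }> {
  return docs
    .map((doc) => ({ doc, sort: [doc.updatedAt, doc.id] as SortValues }))
    .sort((x, y) => compareSort(x.sort, y.sort))
    .filter((hit) => after === undefined || compareSort(hit.sort, after) > 0)
    .slice(0, size);
}

// Page through every document by feeding the last hit's sort values back in.
function fetchAll(docs: Doc[], size: number): Doc[] {
  const all: Doc[] = [];
  let after: SortValues | undefined;
  for (;;) {
    const page = searchAfterPage(docs, size, after);
    if (page.length === 0) return all;
    all.push(...page.map((hit) => hit.doc));
    after = page[page.length - 1].sort;
  }
}
```

The sort needs a unique tiebreaker (here `id`) so that no document is skipped or returned twice when many documents share the same `updatedAt`.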

Documents are too large

Reporting uses its own dedicated .reporting-* indices because the documents include base64-encoded data for the generated CSVs, PDFs and PNGs. Because these documents are generally very large, they can't be migrated using saved-object migrations. In addition, a new .reporting-* index is created on a weekly basis.

Applies to: Reporting
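A quick back-of-the-envelope calculation shows why base64 output makes these documents heavy: padded base64 turns every 3 bytes into 4 characters, so stored output is roughly a third larger than the raw file. The sketch assumes a 10 MiB report and Elasticsearch's default `http.max_content_length` of 100mb, and ignores JSON and other field overhead; `maxDocsPerRequest` is a hypothetical helper.

```typescript
// Length of the padded base64 encoding of `byteLength` bytes:
// each 3-byte group becomes 4 characters, and a partial final group is padded.
function base64EncodedLength(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// ASSUMPTION: Elasticsearch's default http.max_content_length (100mb).
const DEFAULT_MAX_CONTENT_LENGTH = 100 * 1024 * 1024;

// Hypothetical helper: how many reports of `rawBytes` each fit into a single
// request body, ignoring JSON and metadata overhead.
function maxDocsPerRequest(rawBytes: number, limit = DEFAULT_MAX_CONTENT_LENGTH): number {
  return Math.floor(limit / base64EncodedLength(rawBytes));
}

const tenMiB = 10 * 1024 * 1024;
console.log(base64EncodedLength(tenMiB)); // 13981016
console.log(maxDocsPerRequest(tenMiB)); // 7
```

Under these assumptions only seven such reports fit in one request body, which illustrates why bulk-rewriting them during a migration is impractical.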

Aggregations

Plugins wanting to run aggregations cannot use the saved objects client (we have made good progress in #64002 but it might take some time for plugins to adopt it).

In addition, it will not be possible to use a query to limit the documents to aggregate over. One workaround is to use a KQL filter, but this impacts performance and is discouraged by the ES team (#69172).

Applies to: APM Agent Configuration
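For comparison, this is the kind of request a plugin can send when it owns the index: a `query` narrows the documents first, and `size: 0` returns only the aggregation. The sketch is hypothetical; it assumes the `.apm-agent-configuration` index listed above and `service.name` / `service.environment` field names, and `buildEnvironmentsAggregation` is not an APM function.

```typescript
interface SearchRequest {
  index: string;
  body: Record<string, unknown>;
}

// Hypothetical helper: count agent configurations per environment for one service.
// The query limits which documents are aggregated over.
function buildEnvironmentsAggregation(index: string, serviceName: string): SearchRequest {
  return {
    index,
    body: {
      size: 0, // only aggregation results are needed, not hits
      query: { bool: { filter: [{ term: { 'service.name': serviceName } }] } },
      aggs: {
        environments: {
          // ASSUMPTION: field name and bucket size are illustrative.
          terms: { field: 'service.environment', size: 50 },
        },
      },
    },
  };
}
```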

Filtering on update / delete queries

It's not possible to efficiently delete or update many documents that match a filter without performing these operations over all documents of a given saved-object type.
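With direct index access, a bulk change is a single `_update_by_query` (or `_delete_by_query`) request whose query combines the saved-object `type` with an arbitrary filter, so Elasticsearch only touches the matching documents. A minimal sketch of such a request body, assuming the 7.x raw layout where attributes are nested under the type name; `buildTypedUpdateByQuery` is a hypothetical helper.

```typescript
// Hypothetical helper: body for POST <saved-objects-index>/_update_by_query that
// sets one attribute on every document of `type` matching `filter`.
function buildTypedUpdateByQuery(
  type: string,
  filter: Record<string, unknown>,
  attribute: string,
  value: unknown
): Record<string, unknown> {
  return {
    query: {
      bool: {
        // Both clauses must match: the saved-object type and the caller's filter.
        filter: [{ term: { type } }, filter],
      },
    },
    script: {
      lang: 'painless',
      // ASSUMPTION: attributes are stored under a key named after the type.
      source: 'ctx._source[params.type][params.attribute] = params.value',
      params: { type, attribute, value },
    },
  };
}
```

The same `query` object could be sent to `_delete_by_query` to remove the matching documents instead.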

Filtering on nested fields

Filter validation fails when writing a KQL query against nested field types (#81009).
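KQL has a dedicated syntax for nested fields, `path:{ field: value }`, which corresponds to an Elasticsearch `nested` query; presumably this is the kind of expression #81009 is about. The sketch builds both forms for a hypothetical nested `actions` field on an `alert` type. The field paths and both helper names are illustrative assumptions.

```typescript
// Hypothetical helper: Elasticsearch `nested` query matching documents where at
// least one object in the nested field `path` has `field` equal to `value`.
function buildNestedTermQuery(path: string, field: string, value: string): Record<string, unknown> {
  return {
    nested: {
      path,
      query: {
        bool: {
          filter: [{ term: { [`${path}.${field}`]: value } }],
        },
      },
    },
  };
}

// Hypothetical helper: the equivalent KQL nested-field expression, with the
// value quoted and any backslashes or double quotes escaped.
function buildNestedKql(path: string, field: string, value: string): string {
  const quoted = `"${value.replace(/["\\]/g, '\\$&')}"`;
  return `${path}:{ ${field}: ${quoted} }`;
}
```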

@kobelb changed the title from "[DRAFT] Kibana's non-saved-object ES indices" to "Kibana's non-saved-object ES indices" on Oct 16, 2020
@kobelb added the discuss label on Oct 16, 2020
@kobelb
Contributor Author

kobelb commented Oct 16, 2020

@sqren, do any of the existing reasons for not using saved-objects apply to the APM-specific entities, or would we model these using saved-objects given hindsight and an appropriate method of transitioning to saved-objects?

/cc @elastic/kibana-platform @spong @XavierM @mikecote

@sorenlouv
Member

sorenlouv commented Oct 16, 2020

@kobelb Thanks for starting this discussion. It's been a while since we decided to go with a dedicated system index over saved objects, so I might be forgetting some details, and SO might have changed. Overall I think it boiled down to limitations in querying abilities. For agent configuration we need to filter documents using boolean logic and operators like constant_score and boost. At the time, custom queries for retrieving saved objects were not supported (or perhaps were recommended against?).

This is an example of the query we make to retrieve an agent configuration:

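```typescript
// SERVICE_NAME, SERVICE_ENVIRONMENT, service and indices come from the surrounding APM code.
// Configs matching the requested service name get boost 2, matching environment gets boost 1;
// a clause is omitted when the request doesn't provide that value.
const serviceNameFilter = service.name
  ? [
      {
        constant_score: {
          filter: { term: { [SERVICE_NAME]: service.name } },
          boost: 2,
        },
      },
    ]
  : [];
const environmentFilter = service.environment
  ? [
      {
        constant_score: {
          filter: { term: { [SERVICE_ENVIRONMENT]: service.environment } },
          boost: 1,
        },
      },
    ]
  : [];
const params = {
  index: indices.apmAgentConfigurationIndex,
  body: {
    query: {
      bool: {
        // Each field must either match the requested value or be absent
        // from the configuration (i.e. the config applies to all values).
        minimum_should_match: 2,
        should: [
          ...serviceNameFilter,
          ...environmentFilter,
          { bool: { must_not: [{ exists: { field: SERVICE_NAME } }] } },
          {
            bool: { must_not: [{ exists: { field: SERVICE_ENVIRONMENT } }] },
          },
        ],
      },
    },
  },
};
```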

Is this something that's possible today?
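The matching rules encoded by the query above can be modelled in plain TypeScript, which may help when weighing alternatives: with `minimum_should_match: 2`, a configuration applies only if each of its fields either equals the requested value or is absent, and the boosts rank a service-name match above an environment match. This is a hypothetical sketch of the intent, not of Elasticsearch's actual relevance scoring; `AgentConfig` and `findAgentConfig` are illustrative names.

```typescript
interface AgentConfig {
  id: string;
  serviceName?: string;
  serviceEnvironment?: string;
}

interface ServiceRequest {
  name?: string;
  environment?: string;
}

// A config applies when each of its fields is either unset (a wildcard)
// or equal to the requested value.
function applies(config: AgentConfig, service: ServiceRequest): boolean {
  const nameOk = config.serviceName === undefined || config.serviceName === service.name;
  const envOk = config.serviceEnvironment === undefined || config.serviceEnvironment === service.environment;
  return nameOk && envOk;
}

// Mirrors the boosts: a service-name match (2) outranks an environment match (1).
function specificity(config: AgentConfig): number {
  return (config.serviceName !== undefined ? 2 : 0) + (config.serviceEnvironment !== undefined ? 1 : 0);
}

// Pick the most specific applicable configuration, if any.
function findAgentConfig(configs: AgentConfig[], service: ServiceRequest): AgentConfig | undefined {
  return configs
    .filter((c) => applies(c, service))
    .sort((a, b) => specificity(b) - specificity(a))[0];
}
```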

@kobelb
Contributor Author

kobelb commented Oct 19, 2020

> Is this something that's possible today?

SavedObjectsClient#find supports KQL expressions now; however, as far as I'm aware, KQL does not support constant_score queries. @lukasolson, can you confirm this?
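Since KQL has no constant_score or boost, the closest SavedObjectsClient#find equivalent would be a KQL filter that expresses only the matching rules, with ranking by specificity done in application code after fetching. A minimal sketch, assuming a hypothetical `apm-agent-configuration` saved-object type, the `type.attributes.field` form for filter paths, and KQL's `field:*` existence syntax; the helper names are illustrative.

```typescript
// ASSUMPTION: hypothetical saved-object type name.
const AGENT_CONFIG_TYPE = 'apm-agent-configuration';

// Quote a value for a KQL expression, escaping backslashes and double quotes.
function kqlQuote(value: string): string {
  return `"${value.replace(/["\\]/g, '\\$&')}"`;
}

// `field` must equal `value` or be missing; when no value is requested,
// only configurations without the field qualify.
function matchOrMissing(field: string, value?: string): string {
  const path = `${AGENT_CONFIG_TYPE}.attributes.${field}`;
  return value === undefined ? `not ${path}:*` : `(${path}:${kqlQuote(value)} or not ${path}:*)`;
}

// Hypothetical helper: KQL filter string for SavedObjectsClient#find.
function buildAgentConfigKql(name?: string, environment?: string): string {
  return `${matchOrMissing('service.name', name)} and ${matchOrMissing('service.environment', environment)}`;
}
```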

@kobelb
Contributor Author

kobelb commented Oct 20, 2020

For those following along, I recently added Reporting to the above description. They're using their own system-indices because the report output is stored in base64 encoded fields, which creates large documents.

@kobelb
Contributor Author

kobelb commented Oct 20, 2020

@sqren, I heard through the grape-vine that APM recently implemented annotations. Based on the docs, these are stored in the observability-annotations index. Were there specific requirements that led to us not modeling these as saved-objects?

@legrego
Member

legrego commented Oct 21, 2020

As of 7.10, Kibana stores session information in the ${kibana.index}_security_session* set of indices (docs).

The data is meant to be ephemeral, as Kibana will periodically clean up sessions that are no longer valid.

These indices are not meant to be consumed by end-users directly, and the more interesting contents are encrypted anyway.
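Because sessions are ephemeral, cleanup can be expressed as one `_delete_by_query` request against the session index. A minimal sketch, assuming hypothetical `lifespanExpiration` and `idleTimeoutExpiration` fields holding epoch milliseconds (null meaning no expiration); these names may not match Kibana's actual session mapping.

```typescript
// ASSUMPTION: both expirations are epoch milliseconds; null means "never expires".
interface SessionDoc {
  lifespanExpiration: number | null;
  idleTimeoutExpiration: number | null;
}

// Hypothetical helper: body for POST <session-index>/_delete_by_query that
// removes every session whose lifespan or idle timeout has passed.
function buildExpiredSessionsQuery(now: number): Record<string, unknown> {
  return {
    query: {
      bool: {
        should: [
          { range: { lifespanExpiration: { lte: now } } },
          { range: { idleTimeoutExpiration: { lte: now } } },
        ],
        minimum_should_match: 1,
      },
    },
  };
}

// In-memory equivalent of the query; a range query doesn't match documents
// whose field is null (absent a null_value mapping).
function isExpired(session: SessionDoc, now: number): boolean {
  return (
    (session.lifespanExpiration !== null && session.lifespanExpiration <= now) ||
    (session.idleTimeoutExpiration !== null && session.idleTimeoutExpiration <= now)
  );
}
```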

@FrankHassanabad
Contributor

A key reason we opted to use a data index (.siem-signals-*) instead of saved objects for signals/alerting was that users want to be able to create dashboards and use Discover to query their alerting data, which isn't possible with saved objects at this time. It would be nice to have dashboard and first-class query support for saved objects, like we have for data indices.

@sorenlouv
Member

> @sqren, I heard through the grape-vine that APM recently implemented annotations. Based on the docs, these are stored in the observability-annotations index. Were there specific requirements that led to us not modeling these as saved-objects?

We wanted to treat annotations like any other index that users can query in Discover, visualize, etc. We also wanted to stay ECS compatible (again, to make querying easier).
With that in mind, would SO still have been the recommended approach?

@kobelb
Contributor Author

kobelb commented Oct 22, 2020

@FrankHassanabad and @sqren, if we want end-users to be able to query these indices directly, I wouldn't recommend storing them as saved-objects at this time. However, as I've mentioned elsewhere, I'd recommend storing them in dot-prefixed (.) hidden indices, which is what we're doing with .siem-signals-*.
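A minimal sketch of creating such an index, assuming the `index.hidden` index setting (Elasticsearch 7.7+) and a request shaped like the 7.x JavaScript client's `indices.create({ index, body })`; `buildHiddenIndexRequest` is a hypothetical helper that also enforces the dot-prefix convention.

```typescript
// Shape of a create-index request with the hidden setting enabled.
interface CreateHiddenIndexRequest {
  index: string;
  body: { settings: { 'index.hidden': boolean } };
}

// Hypothetical helper: reject names without a leading dot and mark the index hidden.
function buildHiddenIndexRequest(name: string): CreateHiddenIndexRequest {
  if (!name.startsWith('.')) {
    throw new Error(`Expected a dot-prefixed index name, got "${name}"`);
  }
  return { index: name, body: { settings: { 'index.hidden': true } } };
}
```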

@lukasolson
Member

> as far as I'm aware, KQL does not support constant_score queries. @lukasolson, can you confirm this?

That's correct.

@rudolf
Contributor

rudolf commented Nov 17, 2020

I've added another section: "Reasons to not use the saved objects client".

Here are some references to existing code that works around the limitations I've mentioned; I left them out of the issue description because they would bloat it too much:

#82716

```typescript
// Restrict an aggregation over the shared index to documents of type "task",
// combining the caller's query (if any) with the type filter.
function ensureAggregationOnlyReturnsTaskObjects(opts: AggregationOpts): AggregationOpts {
  const originalQuery = opts.query;
  const filterToOnlyTasks = {
    bool: {
      filter: [{ term: { type: 'task' } }],
    },
  };
  const query = originalQuery
    ? { bool: { must: [filterToOnlyTasks, originalQuery] } }
    : filterToOnlyTasks;
  return {
    ...opts,
    query,
  };
}
```

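```typescript
// Raw search with a `types` aggregation instead of going through the saved objects client;
// each bucket holds a saved-object type (key) and its document count.
const resp = await callCluster('search', savedObjectCountSearchParams);
const buckets: Array<{ key: string; doc_count: number }> =
  resp.aggregations?.types?.buckets || [];
```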

https://github.com/elastic/kibana/blob/master/x-pack/plugins/security/server/session_management/session_index.test.ts#L487-L496

```typescript
// Claim tasks with update-by-query, filtering out inactive tasks (excerpt, truncated).
const { updated } = await this.updateByQuery(
  asUpdateByQuery({
    query: matchesClauses(
      mustBeAllOf(
        claimTasksById && claimTasksById.length
          ? asPinnedQuery(claimTasksById, queryForScheduledTasks)
          : queryForScheduledTasks
      ),
      filterDownBy(InactiveTasks)
    ),
```

@rudolf
Contributor

rudolf commented Mar 25, 2021

Updated the issue now that saved objects supports paging through more than 10k saved objects. I kept the "There are too many saved-objects" section, but changed it to be about the scalability of migrations and export.

@botelastic (bot) added the needs-team label (Issues missing a team label) on Apr 6, 2021
@pgayvallet added the Team:Core label (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc) and removed the needs-team label on Apr 6, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)


8 participants