Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SavedObjects] Advanced find API #84729

Open
pgayvallet opened this issue Dec 2, 2020 · 7 comments
Open

[SavedObjects] Advanced find API #84729

pgayvallet opened this issue Dec 2, 2020 · 7 comments
Labels
discuss Feature:Saved Objects Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@pgayvallet
Copy link
Contributor

pgayvallet commented Dec 2, 2020

This issue was created following #64002 and the resulting sync discussion that occurred afterwise, and its goal is to discuss the 'advanced' usages of the find API (such as aggregation support) and how we should provide them.

For some context, see #64002, which introduces the bases of aggregation support for SO.

The root of the problem is that Saved Objects was designed to implement an abstraction layer over how saved objects are persisted in Elasticsearch. This means the same piece of data has a different shape / schema depending on where you look at it:

  • Saved Object Schema, the schema of a saved object returned by the saved object API e.g.
    {
       // Custom properties for the visualization saved object type
       "attributes" : {
         "title" : "A Visualization",
         "visState" : "{...}",
         "uiStateJSON" : "{...}",
         "description" : "",
         "version": 1
       },
       // Root properties present on all saved objects
       "type" : "visualization",
       "references" : [
         {
           "name" : "kibanaSavedObjectMeta.searchSourceJSON.index",
           "type" : "index-pattern",
           "id" : "800135f0-8429-11ea-98cc-13021164c023"
         }
       ],
       "namespace" : "marketing",
       "migrationVersion" : {
         "visualization" : "7.8.0"
       },
       "updated_at" : "2020-04-21T23:44:13.660Z"
    }
  • Persistence schema
    {
       // Custom properties for the visualization saved object type
       "visualization" : {
         "title" : "A Visualization",
         "visState" : "{...}",
         "uiStateJSON" : "{...}",
         "description" : "",
         "version": 1
       },
       // Root properties present on all saved objects
       "type" : "visualization",
       "references" : [
         {
           "name" : "kibanaSavedObjectMeta.searchSourceJSON.index",
           "type" : "index-pattern",
           "id" : "800135f0-8429-11ea-98cc-13021164c023"
         }
       ],
       "namespace" : "marketing",
       "migrationVersion" : {
         "visualization" : "7.8.0"
       },
       "updated_at" : "2020-04-21T23:44:13.660Z"
    }

The PR's current implementation is performing the same SO fields rewrite and validation for aggregations that we are currently doing for KQL query that translates between the SO schema field names and the persistence schema field names. For example, to aggregate on the version attribute of the visualization type, we would be using something like field: 'visualization.attributes.version' in the aggregation. Also, when trying to aggregate against an invalid field (a field not present on a given object), the agg validation will fail.

If at first glance, that can sound like the correct approach (and may be), the underlying implementation of this rewriting/validation is incredibly complex, as we need to know all the possible syntax of all the aggregations we need to support. This code is also very hard to read / maintain (that is, by nature, as pseudo-AST traversing will never be developer friendly)

With is why, after some (long) discussions with @rudolf and @XavierM, we started to wonder if such rewriting was really the best solution, and if just exposing the 'raw' aggs API (with eventually just basic validation such as script removal), wouldn't be a pragmatic option with only few downsides.

The biggest pro being, as already mentioned, the significant simplification of the code and the time we'll avoid to loose when maintaining or enhancing the feature.

After the sync discussion, the main cons were:

  • Inconsistency with the current API (KQL field syntax, mismatch with SO Schema)
  • Removing the SO 'abstraction' layer by exposing the Persistence schema in the SO find API, with the associated risks (the Persistence schema could become an 'official' part of the SO API as directly exposed, and the impossibility to change the Persistence schema implementation without introducing breaking changes to the API)

If the first point was considered acceptable, the second one is more bothersome.

The main questions at hand are:

  • should all the SO apis always have this field syntax abstraction level, or is it unrealistic / inappropriate when adding some advanced ES features support (such as aggs) to SOs
  • if we decide to 'leak' the Persistence schema in some APIs, is it alright to do so in the current find (or other) APIs, or should we try to create some kind of 'advanced' find API, that would have a lower abstraction level (and would ideally not directly be exposed to the client via HTTP endpoints)

cc @kobelb @XavierM

@pgayvallet pgayvallet added discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:Saved Objects labels Dec 2, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@rudolf
Copy link
Contributor

rudolf commented Dec 2, 2020

I think the goal of providing a separate SO schema as an abstraction over the persistence schema was a good idea in principle. Saved objects are primarily used from the browser and queries are often embedded in UI components. This means the saved object schema abstraction is the only layer between the UI and the persistence mechanism. If we remove this layer it would become very hard to make any changes to how objects are persisted.

However, it requires re-writing fields in every Elasticsearch API that we want to expose. Because of the complexity of doing this, the saved objects API has always been severely limited leading to awkward work arounds or plugins deciding to use their own indices directly #80912

Even with the abstraction layer provided by Saved Objects there are still several problems with this architecture as discussed in #72028. The better alternative would be to stop using Saved Objects from the browser and for plugins to introduce their own HTTP API's. This API layer will provide an abstraction over the persistence and also allow plugins to perform complex business logic on the server instead.

Once there is a well tested HTTP API abstraction layer it would be less difficult and safer to make breaking changes to the persistence mechanism if this ever becomes a problem.

So although it will take time for plugins to migrate over to such an architecture I feel like long term an HTTP API abstraction layer is a better solution than an SO abstraction layer that severely limits the functionality ES provides.

We still have to decide how we will eventually fix the API consistency issues, to make it consistent we will have to remove the SO schema completely which would be a risky breaking change. The alternative is to introduce a new SO API which plugins can migrate over to, but will only be available server-side and this entire API will only expose the persistence schema. This will also provide some motivation for plugins to adopt this new architecture.

To minimize the complexity we could build the SO Schema API on top of the persistence schema API. Perhaps SavedObjectsClient -> SO Schema, SavedObjectsRepository -> Persistence Schema. My only hesitation is introducing additional complexity to saved objects (and spaces/security). Since we probably won't be able to remove the SavedObjectsClient before 9 we should be sure that this complexity is maintainable. But if the SavedObjectsClient just does translation to the SO Schema we have the potential to simplify the repository.

This would also solve the problem of the circular dependency between KQL and Core #55485 as well as the KQL performance issues #78348

@rudolf
Copy link
Contributor

rudolf commented Dec 2, 2020

The other alternative is to add an Elasticsearch alias like attribute.field for every field e.g. dashboard's mappings will become:

mappings: {
    properties: {
      description: { type: 'text' },
      'attributes.description': { type: 'alias', path: 'dashboard.description' },
      ...
    }
}

Apart from the field count increase which will require increasing the default limit there aren't too many disadvantages of this approach. But I feel like it maybe prevents us from pursuing the longer term vision of a multi-layer architecture. Also any persistence vs so schema abstraction we introduce with aliases can just as easily be used to make a breaking change to the persistence schema at a later point in time. So I feel like working around the risk of a breaking persistence change is not worth it.

@kobelb
Copy link
Contributor

kobelb commented Dec 2, 2020

/cc @legrego

@legrego
Copy link
Member

legrego commented Dec 4, 2020

Not to cherry-pick a single sentence, but I think this summarizes my feelings pretty well:

So although it will take time for plugins to migrate over to such an architecture I feel like long term an HTTP API abstraction layer is a better solution than an SO abstraction layer that severely limits the functionality ES provides.


Perhaps SavedObjectsClient -> SO Schema, SavedObjectsRepository -> Persistence Schema. My only hesitation is introducing additional complexity to saved objects (and spaces/security). Since we probably won't be able to remove the SavedObjectsClient before 9 we should be sure that this complexity is maintainable. But if the SavedObjectsClient just does translation to the SO Schema we have the potential to simplify the repository.

Having the SavedObjectsClient perform the translation makes sense to me. This could potentially be a post-processing step that happens after spaces, security, and encrypted saved objects have ran through their logic. That way we could adapt our wrappers to work against the persistence schema directly, and we wouldn't have to maintain two implementations for each of these wrappers.

@pgayvallet
Copy link
Contributor Author

To minimize the complexity we could build the SO Schema API on top of the persistence schema API. Perhaps SavedObjectsClient -> SO Schema, SavedObjectsRepository

I think that would make sense. However, I'm not sure to see how this would solve the issue encounter for aggs for example? We would still need to code / maintain the transformation and validation, it would just move it to the client instead of the repository. Or would the client only have a subset of the features / apis of the repository in that case (like, you can't perform aggs directly using the client?)

@rudolf
Copy link
Contributor

rudolf commented Jan 7, 2021

Or would the client only have a subset of the features / apis of the repository in that case (like, you can't perform aggs directly using the client?)

Yes. I think we would deprecate the SavedObjectsClient and the auto-generated API's. We probably won't be able to remove it before 9.0 or maybe never, but the recommended way to consume saved objects would become to create a custom API and use the SavedObjectsRepository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Feature:Saved Objects Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

No branches or pull requests

5 participants