[PROPOSAL] Evolving the API Spec repo #80

nhtruong · 2023-02-23T21:59:32Z

This repo was created to define accurate specs for OpenSearch APIs, and from there, generate OpenSearch clients in different languages. This will keep all clients in sync, and it will let us deploy changes in the API to all clients. With that in mind, there are 3 goals:

Usability: The resulting spec must have enough information to generate proper OpenSearch clients from scratch, and to incrementally add new features to existing clients. A proper OpenSearch client is NOT simply any client with a method for every OpenSearch endpoint defined in the spec. Such a client must also adhere to the patterns and conventions followed by the existing clients, in order to work in an already-written application.
Backward-Compatibility: When we generate an already-existing endpoint in the client, it should not be a breaking change for the users. The generated methods might have different implementation details but they must retain the same signatures, error messages, and functionality of their hand-written counterparts.
Maintainability: On top of defining new endpoints and operations for OpenSearch, we also need to backfill 308 existing operations across 208 endpoints. This is quite an undertaking, and we aim to make such a job as painless and error-free as possible.

As explained further below, I propose the following four-stage process. It may appear complex on the surface, but it eliminates unnecessary effort and achieves the above goals:

Convert the legacy spec to Smithy
Use mixins to augment the Smithy specs with missing parameter and response definitions
Generate OpenAPI specs from the Smithy specs
Generate OpenSearch clients from the OpenAPI specs using custom generators

The Challenges

The OpenSearch API does not follow REST API constraints and conventions

In a REST API, each endpoint represents a resource in a hierarchy of resources, and the HTTP verbs are mapped to CRUD actions on that resource. Hypothetically, to create or update an OpenSearch document in the classic REST fashion, we would have the following endpoints and operations:

Create: POST /indexes/{index}/docs
Update/Upsert: PUT /indexes/{index}/docs/{id}

However, we currently have the following document operations on OpenSearch:

Create:
- POST /{index}/_doc
- POST /{index}/_create/{id}
- PUT /{index}/_create/{id}
Update:
- POST /{index}/_update/{id}
Upsert:
- POST /{index}/_doc/{id}
- PUT /{index}/_doc/{id}

As you can see, OpenSearch uses different endpoints to interact with the same resource (a document, in this example) instead of one endpoint with different HTTP verbs. Moreover, different HTTP verbs, like PUT and POST, have the exact same meaning on the same endpoint.

This causes issues for off-the-shelf client generators:

They will generate methods with misleading names. POST /{index}/_update/{id}, for example, will be translated to a method named post_index_update_id. That is confusing! Without reading the description, it’s impossible to tell if the method will create a new document (post) or modify an existing document (update).
They will generate duplicate methods that have different names but identical signatures and functionality.
Some generators might attempt to group the methods by the resource they interact with. But in the case of the document methods, they will be put in different groups due to their different URL paths.
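To make the naming problem concrete, here is a minimal sketch of the kind of rule an off-the-shelf generator might apply. The `naive_method_name` helper and its verb-plus-path rule are assumptions made for illustration, not the behavior of any specific generator.

```ruby
# Hypothetical naive naming rule: the HTTP verb followed by the path
# segments, with braces and leading underscores stripped.
def naive_method_name(verb, path)
  segments = path.split('/').reject(&:empty?).map do |segment|
    segment.delete('{}').sub(/\A_/, '')
  end
  ([verb.downcase] + segments).join('_')
end

# The six document operations listed earlier in this proposal.
DOCUMENT_OPERATIONS = [
  ['POST', '/{index}/_doc'],
  ['POST', '/{index}/_create/{id}'],
  ['PUT',  '/{index}/_create/{id}'],
  ['POST', '/{index}/_update/{id}'],
  ['POST', '/{index}/_doc/{id}'],
  ['PUT',  '/{index}/_doc/{id}']
].freeze

NAIVE_NAMES = DOCUMENT_OPERATIONS.map { |verb, path| naive_method_name(verb, path) }
NAIVE_NAMES.each { |name| puts name }
```

Under this rule the six operations become six differently named methods, including pairs such as post_index_create_id and put_index_create_id that do the same thing.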

Namespace and the grouping of operations by functionality

OpenSearch API operations are grouped into API methods by functionality, and these methods are then grouped into namespaces. Consider the following operations:

GET /_cat/shards
GET /_cat/shards/{index}

In OpenSearch clients, these two operations are combined into one method called shards, where {index} is an optional path parameter. The shards method belongs to a namespace called cat. To invoke this method: client.cat.shards(index: 'books')
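A rough sketch of how one method can cover both operations is shown here. `SketchClient` and `CatNamespace` are hypothetical stand-ins, not the real client classes, and the method returns the request it would send instead of performing it.

```ruby
# Hypothetical stand-in for a client's cat namespace.
class CatNamespace
  # Covers both GET /_cat/shards and GET /_cat/shards/{index}:
  # the index segment is appended only when it is given.
  def shards(index: nil)
    path = ['_cat', 'shards', index].compact.join('/')
    { method: 'GET', path: "/#{path}" }
  end
end

# Hypothetical client exposing the namespace as client.cat.
class SketchClient
  def cat
    @cat ||= CatNamespace.new
  end
end

client = SketchClient.new
p client.cat.shards                  # request for all shards
p client.cat.shards(index: 'books')  # request for a single index
```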

This operation grouping is also reflected in Elasticsearch’s legacy spec. Take a look at this excerpt of the legacy index.json spec file:

{
  "index":{
    "documentation":{
      "url":"https://www.elastic.co/guide/en/elasticsearch/reference/master/docs-index_.html",
      "description":"Creates or updates a document in an index."
    },
    "stability":"stable",
    "url":{
      "paths":[
        {
          "path":"/{index}/_doc/{id}",
          "methods":[
            "PUT",
            "POST"
          ],
          "parts":{
            "id":{
              "type":"string",
              "description":"Document ID"
            },
            "index":{
              "type":"string",
              "description":"The name of the index"
            }
          }
        },
        {
          "path":"/{index}/_doc",
          "methods":[
            "POST"
          ],
          "parts":{
            "index":{
              "type":"string",
              "description":"The name of the index"
            }
          }
        }
      ]
    },
    "params":{...},
    "body":{
      "description":"The document",
      "required":true
    }
  }
}

This spec describes 3 different operations on 2 different URL paths, but they are considered one action with a shared description, query parameters, path parameters (where {index} is required but {id} is optional), and request body.
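The counting above can be reproduced with a short sketch. It uses an abbreviated copy of the excerpt (part descriptions dropped), and the helper names `operations` and `path_parts` are made up for illustration. The rule that a part is required only when every path contains it is an assumption that happens to match this example.

```ruby
require 'json'

# Abbreviated copy of the excerpt above: paths, methods, and part names only.
LEGACY_INDEX_SPEC = JSON.parse(<<~JSON)
  {
    "index": {
      "url": {
        "paths": [
          { "path": "/{index}/_doc/{id}", "methods": ["PUT", "POST"],
            "parts": { "id": {}, "index": {} } },
          { "path": "/{index}/_doc", "methods": ["POST"],
            "parts": { "index": {} } }
        ]
      }
    }
  }
JSON

# One operation per (HTTP verb, path) pair.
def operations(action_spec)
  action_spec['url']['paths'].flat_map do |entry|
    entry['methods'].map { |verb| [verb, entry['path']] }
  end
end

# Assumed rule: a part is required only if every path of the action has it.
def path_parts(action_spec)
  part_sets = action_spec['url']['paths'].map { |entry| entry['parts'].keys }
  required = part_sets.reduce(:&)
  { required: required, optional: part_sets.flatten.uniq - required }
end

action = LEGACY_INDEX_SPEC['index']
puts operations(action).length
p path_parts(action)
```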

For more information on this topic, check out this issue.

Solutions

Generate Clients from OpenAPI Spec

We have had a long discussion on whether to generate the clients from OpenAPI or Smithy Spec. The consensus is that OpenAPI is an industry standard. Many developers are familiar with OpenAPI and have experience working with it. OpenAPI is well supported by most IDEs, with a large set of tools, especially Swagger, built for it over the years.

The generators for OpenAPI spec can be reused to generate clients for OpenSearch Extensions. Many of these extensions will be developed by independent teams who will likely use OpenAPI to describe their APIs. This is one reason not to simply use the legacy spec to generate client code, even though the spec was built around unique traits of OpenSearch API.

One thing to keep in mind: Any generator we write must have the ability to combine many operations into one action. It’s not as simple as translating the spec of an operation to an API method. We have already written a prototype generator that can handle a few types of operation combinations (Get-Get, Post-Put, Post-Get, ...) in Ruby.

While both OpenAPI and Smithy offer off-the-shelf generators that can generate brand new clients with the push of a button, they are meant for conventional REST APIs, which is not the case with OpenSearch, as explained above. The API methods generated from such generators will be very confusing to use, and cannot be retrofitted into existing clients. We will have to write our own generators regardless. So, neither Smithy nor OpenAPI has the advantage over the other in this regard.

Use Smithy as a tool to write OpenAPI Spec

Even with tools that help navigate and edit OpenAPI docs, writing specs for over 300 operations in JSON or YAML is still a daunting task. Moreover, since OpenAPI does not allow reference by operation, OpenSearch operations serving the same purpose are scattered in different places and get mixed up with other operations.

Smithy, on the other hand, comes with features that make it easier to maintain an API spec as complex as the OpenSearch API:

The mixin feature makes it easier to reuse assets, especially the set of parameters shared between operations to be combined in the same method.
The lack of restrictions on references allows us to group definitions of similar operations (like the three index operations) into the same location. This will come in handy when we backfill the intricate schemas of request and response bodies by hand later.

With that being said, we will only use Smithy as a tool to write OpenAPI spec. That means:

We won’t use Smithy features that are not translatable to OpenAPI.
Any update to the Smithy spec must always be reflected in the OpenAPI spec.
Smithy can be dropped in the future if necessary, allowing contributors to update the OpenAPI spec directly rather than through the Smithy spec.

Implementation Details

Fill in missing metadata for client generators

Core OpenAPI does not support namespaces or operation grouping. We can address this by adding our own OpenAPI extension to include this info in each API operation. For example:

x-operation-group: cat.shards
x-operation-group: index
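As a sketch of how a generator could consume this extension, the snippet below collects operations by their group and splits the group into a namespace and a method name. `SAMPLE_PATHS` is a hand-written, hypothetical slice of an OpenAPI document, and the convention that the part before the dot is the namespace is an assumption based on the cat.shards example.

```ruby
# Hypothetical slice of an OpenAPI "paths" object; only the
# x-operation-group extension name comes from this proposal.
SAMPLE_PATHS = {
  '/_cat/shards' => { 'get' => { 'x-operation-group' => 'cat.shards' } },
  '/_cat/shards/{index}' => { 'get' => { 'x-operation-group' => 'cat.shards' } },
  '/{index}/_doc' => { 'post' => { 'x-operation-group' => 'index' } },
  '/{index}/_doc/{id}' => {
    'post' => { 'x-operation-group' => 'index' },
    'put' => { 'x-operation-group' => 'index' }
  }
}.freeze

# Collects [verb, path] pairs under their operation group.
def group_operations(paths)
  groups = Hash.new { |hash, key| hash[key] = [] }
  paths.each do |path, verbs|
    verbs.each do |verb, operation|
      groups[operation['x-operation-group']] << [verb.upcase, path]
    end
  end
  groups
end

# Splits "cat.shards" into ["cat", "shards"]; a group without a dot
# (such as "index") has no namespace, so the first element is nil.
def namespace_and_method(group)
  *namespace, method_name = group.split('.')
  [namespace.first, method_name]
end

group_operations(SAMPLE_PATHS).each do |group, ops|
  namespace, method_name = namespace_and_method(group)
  puts "#{namespace || '(root)'}.#{method_name}: #{ops.length} operations"
end
```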

And to support incremental client generation (where we only need the client generators to generate code for new operations), we will add extensions delineating when an operation was added, deprecated, and removed:

x-version-added: 0.7
x-version-deprecated: 1.5
x-version-removed: 2.0
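One possible way for a generator to use these extensions is sketched below. The sample operations, their names, and the `operations_to_generate` helper are hypothetical. The sketch also assumes the version values are kept as strings and compared with Gem::Version, since plain string or float comparison would order versions like 1.10 and 1.9 incorrectly.

```ruby
require 'rubygems' # Gem::Version orders dotted version strings correctly

# Hypothetical operations annotated with the extensions described above.
VERSIONED_OPERATIONS = [
  { 'name' => 'old_op', 'x-version-added' => '0.7',
    'x-version-deprecated' => '1.5', 'x-version-removed' => '2.0' },
  { 'name' => 'stable_op', 'x-version-added' => '1.0' },
  { 'name' => 'new_op', 'x-version-added' => '2.1' }
].freeze

def version(string)
  Gem::Version.new(string)
end

# Operations that exist in `target` but were not yet present in `previous`,
# which are the ones an incremental generator run would need to emit.
def operations_to_generate(operations, previous:, target:)
  operations.select do |op|
    added = version(op['x-version-added'])
    removed = op['x-version-removed'] && version(op['x-version-removed'])
    added > version(previous) && added <= version(target) &&
      (removed.nil? || removed > version(target))
  end
end

new_ops = operations_to_generate(VERSIONED_OPERATIONS, previous: '1.5', target: '2.1')
puts new_ops.map { |op| op['name'] }
```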

Smithy’s custom traits are the equivalent of OpenAPI extensions. At the time of this writing, the native Smithy-to-OpenAPI converter does not translate Smithy traits to OpenAPI extensions. However, we have a solution for this, and we’re working on upstreaming this change to the converter.

Backfill existing operations

Translate the legacy spec to Smithy models: We have already translated the legacy spec to OpenAPI spec. The experience gained from that translation will help us quickly translate it to Smithy, too. This will save us time and reduce the human errors that would come with backfilling 308 operations by hand.
Add schemas for response and request bodies: While most clients treat the bodies as generic JSON objects and do not need this info, the Java and future TypeScript clients require proper schemas for these entities. This is another reason why we chose Smithy to draft the spec over editing OpenAPI directly. Smithy’s cleaner format, and more flexible file structure, along with mixins, will make backfilling these complex data structures by hand less painful.

GitHub Workflows

Translate Smithy Spec to OpenAPI spec for every commit.
Validate the OpenAPI spec to ensure that each operation has the extensions mentioned above.
Validate the OpenAPI spec to ensure that operations that are grouped together have identical description, parameters, and request/response bodies.
Publish a new version of the OpenAPI spec to the repo’s GitHub page when it’s released.
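The two validation steps could look roughly like the sketch below. It is not the actual workflow code. Which extensions are mandatory and which fields are compared are assumptions: the sketch requires only x-operation-group and x-version-added, and compares only description and requestBody, because path parameters necessarily differ between paths such as /_cat/shards and /_cat/shards/{index} and would need more careful handling.

```ruby
# Assumption: only these two extensions are mandatory on every operation.
REQUIRED_EXTENSIONS = %w[x-operation-group x-version-added].freeze
# Assumption: fields compared within a group; a fuller check would also
# cover parameters and response bodies.
SHARED_FIELDS = %w[description requestBody].freeze

# Takes an OpenAPI "paths" object and returns a list of problems;
# an empty list means the spec passed both checks.
def validate_operations(paths)
  errors = []
  groups = Hash.new { |hash, key| hash[key] = [] }

  paths.each do |path, verbs|
    verbs.each do |verb, operation|
      label = "#{verb.upcase} #{path}"
      missing = REQUIRED_EXTENSIONS.reject { |ext| operation.key?(ext) }
      errors << "#{label}: missing #{missing.join(', ')}" unless missing.empty?
      group = operation['x-operation-group']
      groups[group] << operation if group
    end
  end

  groups.each do |group, members|
    SHARED_FIELDS.each do |field|
      values = members.map { |operation| operation[field] }.uniq
      errors << "#{group}: '#{field}' differs between operations" if values.length > 1
    end
  end

  errors
end
```

A workflow step could fail the build whenever the returned list is not empty.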

For more details, check out the related proposal.

Conclusion

The four-stage process addresses the many quirks of the OpenSearch API. It allows us to rapidly flesh out the API definition using the legacy spec. It allows us to fill in the missing pieces efficiently using Smithy, which is a well-factored way of maintaining specs. By converting the Smithy spec to OpenAPI, we can take advantage of the OpenAPI ecosystem and keep the door open to new developers to contribute to clients.

I welcome feedback on this proposal.


dblock · 2023-05-01T17:50:10Z

I'm late to this proposal, would only say that "Translate Smithy Spec to OpenAPI spec for every commit." sounds like a "build and release" to me. Maybe it's helpful to think of it that way?

nhtruong · 2024-04-02T18:49:02Z

With the move toward native OpenAPI, the only relevant piece is validating the OpenAPI spec, which I'm working on right now.

dblock · 2024-04-02T22:00:49Z

Are validating and linting the same thing? If so, this issue can be closed to avoid duplicating #22.

nhtruong · 2024-04-02T22:04:54Z

Yes, they are the same. Closing.

github-actions bot added the untriaged label Feb 23, 2023

nhtruong removed the untriaged label Feb 23, 2023

nhtruong mentioned this issue Mar 16, 2023

Generate missing models from legacy spec #82

Closed

VachaShah mentioned this issue Apr 6, 2023

OpenSearch Spec and Client Generation Roadmap opensearch-project/opensearch-clients#58

Closed

17 tasks

dblock mentioned this issue Apr 6, 2023

[PROPOSAL] Add support for API versioning #84

Open

dblock added the enhancement New feature or request label Dec 20, 2023

nhtruong mentioned this issue Feb 8, 2024

[PROPOSAL] Replace Smithy with a native OpenAPI spec #189

Closed

nhtruong closed this as completed Apr 2, 2024
