
feat: versioned protobufs #32

Merged · 18 commits · Mar 7, 2023
Conversation

tomasciccola (Contributor)

No description provided.

@sethvincent (Contributor)

It looks like the tests are failing because schemasPrefix.js needs to be updated to use the { dataTypeId, schemaVersion } object.
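For reference, a minimal sketch of the shape being asked for (the id and version are copied from the diff later in this thread, so treat them as illustrative):

// schemasPrefix.js — before
Observation: '71f85084cb94',
// schemasPrefix.js — after
Observation: { dataTypeId: '71f85084cb94', schemaVersion: 5 },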

@tomasciccola (Contributor, Author)

> It looks like the tests are failing because schemasPrefix.js needs to be updated to use the { dataTypeId, schemaVersion } object.

ooops, yeah, I forgot to commit that change...

]
.encode(record)
.finish()
const partial = ProtobufSchemas[key].fromPartial(record)
tomasciccola (Contributor, Author)

  • Using fromPartial to initialize lists to an empty list (since optional repeated doesn't exist) means initializing every field that isn't passed to its default value, which means required fields that are missing don't get caught...
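A small sketch of the failure mode, using the fromPartial/encode calls from the snippet above (the record shape and the 'refs' field are hypothetical):

const record = { lon: 0.5 } // suppose 'refs' is required but was omitted
const partial = ProtobufSchemas[key].fromPartial(record)
// fromPartial fills partial.refs with its default ([]), so this encodes
// without complaint and the missing required field is never caught:
const buf = ProtobufSchemas[key].encode(partial).finish()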

Contributor

I think we could catch this before it gets to the encode step. Is this only applicable to required arrays?

* added optional fields to filter and preset protobufs
* added field protobuf and schema
* type is optional on Preset_1 so it needs to be made optional on the base type (and deleted before validation)
* schemaVersion is optional on Field_1 so it needs to be made optional on the base type (and deleted before validation)
* this means utils.js/formatSchema{Key,Type} now takes optional values
}
}]
},
"type": {
tomasciccola (Contributor, Author)

  • in Field_1, type doesn't mean "which type of document am I?" but an enum of possible fields :|. This means Field_1 can't be validated yet (thinking about ways to avoid this issue...)

Contributor

Let's make an issue for this and come back to it. In the short term I think we might not need to validate Field, but we likely will eventually.

The only solution to this I can think of so far is renaming our type field for validation to something like schemaType so that it matches schemaVersion and is less likely to conflict with type used as a property name.
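To illustrate, a sketch of a hypothetical Field document under that rename, with type left free for the field's own enum:

const doc = {
  schemaType: 'Field', // was: type
  schemaVersion: 1,
  type: 'select_one', // Field_1's own enum of possible field kinds
}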

tomasciccola (Contributor, Author)

Yeah, I think we can make that change for new schemas, but for old schemas the type still mostly refers to schemaType, so we will still need to handle that manually.

Tomás Ciccola added 2 commits February 22, 2023 15:24
Observation: '71f85084cb94',
const schemasPrefix = {
Observation: { dataTypeId: '71f85084cb94', schemaVersions: [5] },
observation: { dataTypeId: '70f85084cb94', schemaVersions: [4] },
tomasciccola (Contributor, Author)

Casing on a doc type shouldn't be handled specifically by the code. Nevertheless, some schemas enforce a type to a specific value (type must be e.g. 'observation', which is different from 'Observation'). This means that type "observation" and "Observation" should have different dataTypeIds. That way we know what "type" to fill in when returning the decoded object to be validated...
If we want the same dataTypeId for 'observation' and 'Observation' (since we know we're actually talking about roughly the same type of 'document'), we could format the schemasPrefix object differently, like:
Observation: {dataTypeId: '924892', versions: {'observation': 4, 'Observation': 5}}
or something less awkward...
@sethvincent @gmaclennan
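For illustration, a sketch of that alternative shape and a lookup against it (the id is the placeholder from the comment above):

const schemasPrefix = {
  Observation: {
    dataTypeId: '924892',
    versions: { observation: 4, Observation: 5 },
  },
}

// given the raw type string from a decoded document:
const { versions } = schemasPrefix.Observation
const schemaVersion = versions['observation'] // -> 4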

Contributor

Ah, yeah that's kinda tricky to have to check for both.

I think ideally it would stay lowercase but maybe that's awkward for code generation? I'm not quite familiar enough with the code generation to know if it could be a matter of capitalizing the first letter somewhere in the script.

tomasciccola (Contributor, Author)

I mean, from the point of view of code generation, that would mean changing the schemas and how they are compiled, basically setting all 'type' fields to a capitalized version or a lowercase version. There are some schemas that don't have a type field, or where the type field is used for something completely different, and we would need to handle that manually.
I don't think that would be a good approach.
In terms of complexity, the current way is simpler IMO.
I don't think there's a non-weird approach though...

sethvincent (Contributor) · Mar 7, 2023

I guess I'm not sure why observation changed to Observation. If that's a needed change we should make sure other schemas follow the same pattern: coreOwnership would become CoreOwnership, and filter would become Filter, for example.

What if we had something like this and handled the little o to big O within this module? (For example, transform the lowercase o to a capital and match based on the schema version.)

  Observation: { dataTypeId: '71f85084cb94', schemaVersions: [4, 5] },

I think keeping the same dataTypeId for the two schemas makes sense (in part because v5 is just an improvement on v4).
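Something like this, perhaps (a sketch; the capitalize helper is an assumption about where the normalization would live):

const capitalize = (str) => str[0].toUpperCase() + str.slice(1)

const schemasPrefix = {
  Observation: { dataTypeId: '71f85084cb94', schemaVersions: [4, 5] },
}

// 'observation' (v4) and 'Observation' (v5) resolve to the same entry,
// and the schema version disambiguates which protobuf to use:
const entry = schemasPrefix[capitalize('observation')]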

@sethvincent (Contributor)

This is looking good!

There's a set of potential problems around type naming and exporting that I've been thinking about today.

One issue I'm seeing: in cases where someone imports both the protobuf type and the jsonschema type in one file, there might be naming conflicts.

It might be ideal to have a naming convention like:

  • CoreOwnershipJson
  • CoreOwnershipProtobuf

Kinda related: it looks like the protobufs export names like CoreOwnership_1 rather than CoreOwnership.

I have mixed feelings on whether the public types should include the version in the name. The main place that might matter is in migration code where it's needed to be explicit about versions and use multiple versions in the same code.

Ideally we could import latest version as just the name:

import { ObservationJson, ObservationProtobuf } from 'mapeo-schema/types'

As well as being more specific about the version needed:

import { ObservationJson4, ObservationJson5 } from 'mapeo-schema/types'

A type without a version would be the latest version. I can imagine there's a good argument for always including the version number and not having the convenience of importing the latest version by just the schema name.

Those examples also assume a simplified way of importing the types from mapeo-schema/types.

This would be nice especially for our jsdoc typing where long paths get messy looking:

/**
* @property {import('mapeo-schema/types').ObservationJson} Observation
*/

versus:

/**
* @property {import('mapeo-schema/types/schema/observation/v5').Observation} Observation
*/

Importing types in this way would likely require generating an index.js/index.d.ts in the `types` directory and using the [`exports` field in package.json](https://nodejs.org/api/packages.html#package-entry-points) to map `/types` in the import string to the `./types/index.js` file. (I'm pretty sure this is how the above import examples could be achieved, though I haven't double-checked.)
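As a rough sketch, a generated types/index.js could re-export each version plus the bare name (all paths and names here are hypothetical), with "./types": "./types/index.js" added to the exports field:

// types/index.js (generated)
export { Observation as ObservationJson4 } from './schema/observation/v4.js'
// the bare name points at the latest version:
export {
  Observation as ObservationJson5,
  Observation as ObservationJson,
} from './schema/observation/v5.js'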

I think the feedback in this comment could be broken up into issues that we tackle in follow-up PRs so that the versioned protobufs work can be merged.

@tomasciccola (Contributor, Author)

> Kinda related: it looks like the protobufs export names like CoreOwnership_1 rather than CoreOwnership.

Protobufs are all part of the same package (mapeo), and it doesn't allow two Message types with the same name; that's why the version is encoded in the type name. But we could do something with code generation where we rename the export (maybe with a union of every version, defaulting to the last one?)
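A sketch of what that codegen tweak could emit (the module layout and the second version are invented for illustration):

// types/proto/index.js — hypothetical generated shim
export { CoreOwnership_1 } from './coreOwnership/v1.js'
export { CoreOwnership_2 } from './coreOwnership/v2.js'
// the unversioned name defaults to the latest version:
export { CoreOwnership_2 as CoreOwnership } from './coreOwnership/v2.js'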

@tomasciccola (Contributor, Author)

@sethvincent do you want me to merge this already? I'll create additional issues to handle the pending things from your comments.

@sethvincent (Contributor)

@tomasciccola sounds good!
