-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DRIVERS-2926] [PYTHON-4577] BSON Binary Vector Subtype Support #1813
base: master
Are you sure you want to change the base?
[DRIVERS-2926] [PYTHON-4577] BSON Binary Vector Subtype Support #1813
Conversation
…t.test_bson.TestBSON.test_encode_type_marker to pass
…t.test_bson.TestBSON.test_encode_type_marker to pass
…ector, from_vector
e32d1b8
to
26b8398
Compare
Hi @shane. Do you still have changes that haven't been addressed? |
"dtype_alias": "INT8", | ||
"padding": 0, | ||
"canonical_bson": "1600000005766563746F7200040000000903007F0700", | ||
"canonical_extjson": "{\"vector\": {\"$binary\": {\"base64\": \"AwB/Bw==\", \"subType\": \"09\"}}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to test JSON encoding here since that is well-tested elsewhere in the bson corpus
"vector": [127, 7], | ||
"dtype_hex": "0x03", | ||
"dtype_alias": "INT8", | ||
"padding": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear why padding needs to be specified as an expectation. Isn't that just part of the encoding? Maybe if you add some tests where padding is non-zero it will be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not required. There are now tests that include non-zero padding (both invlaid and valid ones).
"description": "Simple Vector INT8", | ||
"valid": true, | ||
"vector": [127, 7], | ||
"dtype_hex": "0x03", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear why expected hex value of dtype needs to be specified separately from the expected BSON encoding of the vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the dtype in addition to the vector/numbers to encode the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to distinguish, for example, between int8, packed_bit, or even float32.
"tests": [ | ||
{ | ||
"description": "Simple Vector INT8", | ||
"valid": true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the invalid tests are separated in a different array, you can probably remove this key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one's out of date.
{ | ||
"description": "Basic Tests of Binary Vectors, subtype 9", | ||
"test_key": "vector", | ||
"tests": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's follow the rest of the BSON corpus and name this field "valid".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format now follows one of the standards I saw.
"canonical_extjson": "{\"vector\": {\"$binary\": {\"base64\": \"JwAAAID/AAAAAAAAgH8=\", \"subType\": \"09\"}}}" | ||
} | ||
], | ||
"invalid": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will there be different categories of invalid vectors? Note that in the BSON corpus we have both "decodeErrors" and "parseErrors".
Adds support for new subtype of Binary BSON subtype to be used for densely-packed 1d arrays (vectors).
Corresponding changes are in specifications repo PR #1658