Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong parsing of repeated structured properties #757

Open
nikita-davydov opened this issue Feb 18, 2022 · 1 comment
Open

Wrong parsing of repeated structured properties #757

nikita-davydov opened this issue Feb 18, 2022 · 1 comment
Labels
api: datastore Issues related to the googleapis/python-ndb API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@nikita-davydov
Copy link
Contributor

nikita-davydov commented Feb 18, 2022

Short-reproducible example

from google.cloud import ndb


client = ndb.Client()


class ModelD(ndb.Model):
    field_d = ndb.TextProperty()


class ModelC(ndb.Model):
    field_c_1 = ndb.StringProperty()
    field_c_2 = ndb.TextProperty()
    field_c_3 = ndb.TextProperty()
    field_c_4 = ndb.TextProperty()


class ModelB(ndb.Model):
    field_b_1 = ndb.StringProperty()
    field_b_2 = ndb.StructuredProperty(ModelC)


class ModelA(ndb.Model):
    field_a_1 = ndb.StringProperty()
    field_a_2 = ndb.StructuredProperty(ModelD)
    field_a_3 = ndb.StructuredProperty(ModelB, repeated=True)


def load_data():
    with client.context():
        model_Cs = [
            ModelC(
                field_c_1='value_c_1_1',
                field_c_2='value_c_2_1',
                field_c_3='value_c_3_1',
                field_c_4='value_c_4_1'
            ),
            ModelC(
                field_c_1='value_c_1_2',
                field_c_2='value_c_2_2',
                field_c_3='value_c_3_2',
                field_c_4='value_c_4_2'
            )
        ]

        model_Bs = [
            ModelB(field_b_1='b1', field_b_2=model_Cs[0]),
            ModelB(field_b_1='b2'),  # NULL VALUE FOR field_b_2, This is the issue!
            ModelB(field_b_1='b3', field_b_2=model_Cs[0])
        ]

        model_a = ModelA(field_a_1='a1', field_a_2=ModelD(field_d='d1'), field_a_3=model_Bs)
        model_a.put()
    return model_a.key.id()


def assert_bug():
    id_ = load_data()
    number_of_assertions_errors = 0
    number_of_kind_errors = 0
    for i in range(30):
        try:
            with client.context():
                result = ModelA.get_by_id(id_, use_cache=False, use_global_cache=False)
                result1 = ModelA.get_by_id(id_, use_cache=False, use_global_cache=False)
                assert result == result1
        except AssertionError:
            number_of_assertions_errors += 1
        except ndb.KindError:
            number_of_kind_errors += 1

    print(f'Number of assertions errors {number_of_assertions_errors}')
    print(f'Number of kind errors {number_of_kind_errors}')


if __name__ == '__main__':
    assert_bug()

Example of prints, they are different time to time

Number of assertions errors 13
Number of kind errors 16
Number of assertions errors 11
Number of kind errors 17

The case is that in raw datastore response it returns
e.g.
{'field_a_3.field_b_2': [None], 'field_a_3.field_b_2.field_c_1': ['value_c_1_1', 'value_c_1_2']}

And it confuses parser time to time and it raises KindError, sometimes parser could done his job but it parses not all values, so some of fields of ModelC could be with null values, but in the DB they're not null

Seems like the fix is to sort keys that we receive from google cloud datastore API

@product-auto-label product-auto-label bot added the api: datastore Issues related to the googleapis/python-ndb API. label Feb 18, 2022
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Feb 19, 2022
@meredithslota meredithslota added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed triage me I really want to be triaged. 🚨 This issue needs some love. labels Feb 26, 2022
@yoshi-automation yoshi-automation added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels May 27, 2022
@aetherknight
Copy link

aetherknight commented Dec 1, 2022

I have run into this issue too with a similarly complex entity with structured properties. I first ran into it due to the KindError, as well. I patched _entity_from_ds_entity to sort the fields of the dt_entity before iterating over them, and that seemed to prevent the KindError. (the datastore Entity type is based on dict, and in Python 2.7 dict is not order preserving, implying there are some odd cases/bad assumptions within _entity_from_ds_entity or the code it calls).

However, there is a deeper problem with the legacy data format in the case discussed here. In the example above, model_Bs[1] has field_b_2 set to None. This causes the old dot-notation representation to lose information about the correct order of the deeply nested (and repeated) fields.

Taking part of the construction of the example above:

 model_Bs = [  # stored as a repeated StructuredProperty
            ModelB(field_b_1='b1', field_b_2=model_Cs[0]),
            ModelB(field_b_1='b2'),  # NULL VALUE FOR field_b_2, This is the issue!
            ModelB(field_b_1='b3', field_b_2=model_Cs[1])
        ]

When the entity is re-fetch/get later, these will instead be loaded as:

 model_Bs = [  # stored as a repeated StructuredProperty
            ModelB(field_b_1='b1', field_b_2=model_Cs[0]),
            ModelB(field_b_1='b2', field_b_2=model_Cs[1])
            ModelB(field_b_1='b3', field_b_2=None),  # There were only 2 values in the various `field_b_2.*` fields
        ]

I believe this issue existed in the legacy NDB as well.

So far, my only plan to work around this is to reconstruct the original data, but with legacy_data=False, so that NDB will store the StructuredProperties as embeded entities, instead of trying to flatten them into the dot-notation lists: https://cloud.google.com/datastore/docs/concepts/entities#embedded_entity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: datastore Issues related to the googleapis/python-ndb API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

4 participants