Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query] fix shadowing of field names by methods #13498

Merged
merged 8 commits into from
Sep 7, 2023

Conversation

patrick-schultz
Copy link
Collaborator

@patrick-schultz patrick-schultz commented Aug 25, 2023

In particular, struct field names which clash with methods on StructExpression.

close #13495

CHANGELOG: Fix a bug where field names can shadow methods on the StructExpression class, e.g. "items", "keys", "values". Now the only way to access such fields is through the getitem syntax, e.g. "some_struct['items']". It's possible this could break existing code that uses such field names.

In particular, struct field names which clash with methods on `StructExpression`.
@patrick-schultz
Copy link
Collaborator Author

@danking This doesn't forbid clashing field names, but accessing the field must use the struct['items'] syntax. This appears to be what we meant to be doing, but we had a bug. Open to discussion on whether we should just prevent the clashing entirely (but what happens when importing data with a bad field name?).

@danking
Copy link
Contributor

danking commented Aug 25, 2023

Hmm. This also seems fine. I'm gonna assign Chris for a third perspective.

@danking danking assigned chrisvittal and unassigned danking Aug 25, 2023
chrisvittal
chrisvittal previously approved these changes Aug 25, 2023
Copy link
Collaborator

@chrisvittal chrisvittal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just need to fix the parts of the tests where we overwrote items as well.

@patrick-schultz
Copy link
Collaborator Author

Ugh, approx_cdf return a struct with a "values" field. No matter what we do to fix the shadowing issue is going to be a breaking change.

@danking
Copy link
Contributor

danking commented Aug 25, 2023

Does s.values currently give you the sampled values or the .values()?

@patrick-schultz
Copy link
Collaborator Author

The sampled values. The method is shadowed and inaccesible.

Copy link
Contributor

@danking danking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests and a warning

@chrisvittal chrisvittal self-requested a review August 31, 2023 17:40
@chrisvittal chrisvittal dismissed their stale review August 31, 2023 17:41

need new changes

Copy link
Collaborator Author

@patrick-schultz patrick-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this fixes the behavior to aggree with the existing docs on StructExpression:

However, it is recommended to use square brackets to select fields:
>>> hl.eval(struct['a'])
5

The latter syntax is safer, because fields that share their name with
an existing attribute of :class:.StructExpression (keys, values,
annotate, drop, etc.) will only be accessible using the
:meth:.StructExpression.__getitem__ syntax
. This is also the only way
to access fields that are not valid Python identifiers, like fields with
spaces or symbols.

Comment on lines +1746 to 1751
super(StructExpression, s).__init__(x, t, indices, aggregations)
s._warn_on_shadowed_name = set()
s._fields = {}
for k, v in fields.items():
s._set_field(k, v)
super(StructExpression, s).__init__(x, t, indices, aggregations)
return s
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initializing the superclass first lets us catch shadowing superclass attributes

Comment on lines +1798 to +1804
def __getattribute__(self, item):
if item in super().__getattribute__('_warn_on_shadowed_name'):
warning(f'Field {item} is shadowed by another method or attribute. '
f'Use ["{item}"] syntax to access the field.')
self._warn_on_shadowed_name.remove(item)
return super().__getattribute__(item)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__getattribute__ is always called on any attribute lookup, whereas __getattr__ is only called as a last resort when the attribute isn't found in any of the standard places. We need the first to catch uses of an attribute which is both a field name and an existing attribute.

Comment on lines -1792 to -1793
if item in self.__dict__:
return self.__dict__[item]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking in __dict__ was redundant. If the attribute is in __dict__, the interpreter will find it and not call __getattr__.

if key in self.__dict__ or hasattr(super(), key):
self._warn_on_shadowed_name.add(key)
else:
self.__dict__[key] = value
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching shadowed fields would be easier if we didn't add fields to __dict__, but we need to do this to expose fields to IDEs for autocomplete.

@danking danking assigned danking and unassigned chrisvittal Sep 6, 2023
@danking
Copy link
Contributor

danking commented Sep 6, 2023

Chris is out, I'll take this.

danking
danking previously requested changes Sep 6, 2023
Copy link
Contributor

@danking danking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent! Can you edit the PR message to include a CHANGELOG: that describes this change to the scientists, including a note about the breaking change (which is OK b/c bug fix).


def _get_field(self, item):
if item in self._fields:
return self._fields[item]
else:
raise KeyError(get_nice_field_error(self, item))

def __getattribute__(self, item):
if item in super().__getattribute__('_warn_on_shadowed_name'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This avoids infinite recursion, right? And the super's getattribute is able to see the subclasses basic fields?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and yes. The super's getattribute is just what would have been called if we didn't override it (unless there's some special-method magic I don't understand).

@danking danking merged commit c160d3e into hail-is:main Sep 7, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[query] do not allow the creation of a field which classes with a necessary builtin
3 participants