-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[query] fix shadowing of field names by methods #13498
[query] fix shadowing of field names by methods #13498
Conversation
In particular, struct field names which clash with methods on `StructExpression`.
@danking This doesn't forbid clashing field names, but accessing the field must use the |
Hmm. This also seems fine. I'm gonna assign Chris for a third perspective. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just need to fix the parts of the tests where we overwrote items as well.
Ugh, approx_cdf return a struct with a "values" field. No matter what we do to fix the shadowing issue is going to be a breaking change. |
Does |
The sampled values. The method is shadowed and inaccesible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests and a warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this fixes the behavior to aggree with the existing docs on StructExpression
:
However, it is recommended to use square brackets to select fields:
>>> hl.eval(struct['a'])
5The latter syntax is safer, because fields that share their name with
an existing attribute of :class:.StructExpression
(keys
,values
,
annotate
,drop
, etc.) will only be accessible using the
:meth:.StructExpression.__getitem__
syntax. This is also the only way
to access fields that are not valid Python identifiers, like fields with
spaces or symbols.
super(StructExpression, s).__init__(x, t, indices, aggregations) | ||
s._warn_on_shadowed_name = set() | ||
s._fields = {} | ||
for k, v in fields.items(): | ||
s._set_field(k, v) | ||
super(StructExpression, s).__init__(x, t, indices, aggregations) | ||
return s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initializing the superclass first lets us catch shadowing superclass attributes
def __getattribute__(self, item): | ||
if item in super().__getattribute__('_warn_on_shadowed_name'): | ||
warning(f'Field {item} is shadowed by another method or attribute. ' | ||
f'Use ["{item}"] syntax to access the field.') | ||
self._warn_on_shadowed_name.remove(item) | ||
return super().__getattribute__(item) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__getattribute__
is always called on any attribute lookup, whereas __getattr__
is only called as a last resort when the attribute isn't found in any of the standard places. We need the first to catch uses of an attribute which is both a field name and an existing attribute.
if item in self.__dict__: | ||
return self.__dict__[item] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking in __dict__
was redundant. If the attribute is in __dict__
, the interpreter will find it and not call __getattr__
.
if key in self.__dict__ or hasattr(super(), key): | ||
self._warn_on_shadowed_name.add(key) | ||
else: | ||
self.__dict__[key] = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching shadowed fields would be easier if we didn't add fields to __dict__
, but we need to do this to expose fields to IDEs for autocomplete.
Chris is out, I'll take this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is excellent! Can you edit the PR message to include a CHANGELOG:
that describes this change to the scientists, including a note about the breaking change (which is OK b/c bug fix).
|
||
def _get_field(self, item): | ||
if item in self._fields: | ||
return self._fields[item] | ||
else: | ||
raise KeyError(get_nice_field_error(self, item)) | ||
|
||
def __getattribute__(self, item): | ||
if item in super().__getattribute__('_warn_on_shadowed_name'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This avoids infinite recursion, right? And the super's getattribute is able to see the subclasses basic fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and yes. The super's getattribute is just what would have been called if we didn't override it (unless there's some special-method magic I don't understand).
In particular, struct field names which clash with methods on
StructExpression
.close #13495
CHANGELOG: Fix a bug where field names can shadow methods on the StructExpression class, e.g. "items", "keys", "values". Now the only way to access such fields is through the getitem syntax, e.g. "some_struct['items']". It's possible this could break existing code that uses such field names.