[query] fix shadowing of field names by methods #13498

patrick-schultz · 2023-08-25T14:16:22Z

In particular, struct field names which clash with methods on StructExpression.

CHANGELOG: Fix a bug where field names can shadow methods on the StructExpression class, e.g. "items", "keys", "values". Now the only way to access such fields is through the getitem syntax, e.g. "some_struct['items']". It's possible this could break existing code that uses such field names.

patrick-schultz · 2023-08-25T14:18:56Z

@danking This doesn't forbid clashing field names, but accessing the field must use the struct['items'] syntax. This appears to be what we meant to be doing, but we had a bug. Open to discussion on whether we should just prevent the clashing entirely (but what happens when importing data with a bad field name?).
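To make the intended behavior concrete, here is a minimal sketch using a hypothetical `ToyStruct` class (a stand-in, not Hail's `StructExpression`). A field whose name clashes with a method is still stored, but only the bracket syntax reaches it; the attribute name keeps pointing at the method.

```python
class ToyStruct:
    """Hypothetical stand-in for a struct expression; not Hail's class."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        for name, value in fields.items():
            # Expose a field as an attribute only if the class does not
            # already define something with that name.
            if not hasattr(type(self), name):
                self.__dict__[name] = value

    def values(self):
        return list(self._fields.values())

    def __getitem__(self, name):
        return self._fields[name]


s = ToyStruct(a=5, values=[1, 2, 3])
print(s.a)          # 5
print(s['values'])  # [1, 2, 3]  (the field)
print(s.values())   # [5, [1, 2, 3]]  (the method, no longer shadowed)
```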

danking · 2023-08-25T15:37:28Z

Hmm. This also seems fine. I'm gonna assign Chris for a third perspective.

chrisvittal

LGTM. Just need to fix the parts of the tests where we overwrote items as well.

patrick-schultz · 2023-08-25T17:01:33Z

Ugh, approx_cdf returns a struct with a "values" field. Whatever we do to fix the shadowing issue is going to be a breaking change.

danking · 2023-08-25T17:46:28Z

Does s.values currently give you the sampled values or the .values()?

patrick-schultz · 2023-08-25T19:10:35Z

The sampled values. The method is shadowed and inaccessible.

danking

tests and a warning

need new changes

ready

patrick-schultz

Note that this fixes the behavior to agree with the existing docs on StructExpression:

However, it is recommended to use square brackets to select fields:
>>> hl.eval(struct['a'])
5

The latter syntax is safer, because fields that share their name with
an existing attribute of :class:`.StructExpression` (`keys`, `values`,
`annotate`, `drop`, etc.) will only be accessible using the
:meth:`.StructExpression.__getitem__` syntax. This is also the only way
to access fields that are not valid Python identifiers, like fields with
spaces or symbols.

patrick-schultz · 2023-08-31T19:51:49Z

hail/python/hail/expr/expressions/typed_expressions.py

+        super(StructExpression, s).__init__(x, t, indices, aggregations)
+        s._warn_on_shadowed_name = set()
        s._fields = {}
        for k, v in fields.items():
            s._set_field(k, v)
-        super(StructExpression, s).__init__(x, t, indices, aggregations)
        return s


Initializing the superclass first lets us catch fields that shadow superclass attributes.
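A small sketch of why the ordering matters, with hypothetical `Base` and `Child` classes and a made-up `dtype` attribute (none of these are Hail names). A clash with an attribute created by the base initializer can only be detected if that initializer has already run.

```python
class Base:
    def __init__(self):
        self.dtype = 'struct'  # instance attribute created by the base initializer


class Child(Base):
    def __init__(self, field_names, base_first):
        if base_first:
            super().__init__()
        # Record field names that collide with attributes already on the instance.
        clashes = {name for name in field_names if name in self.__dict__}
        if not base_first:
            super().__init__()
        self.clashes = clashes


early = Child(['dtype', 'a'], base_first=True).clashes
late = Child(['dtype', 'a'], base_first=False).clashes
print(early)  # {'dtype'}
print(late)   # set()
```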

patrick-schultz · 2023-08-31T19:54:17Z

hail/python/hail/expr/expressions/typed_expressions.py

+    def __getattribute__(self, item):
+        if item in super().__getattribute__('_warn_on_shadowed_name'):
+            warning(f'Field {item} is shadowed by another method or attribute. '
+                    f'Use ["{item}"] syntax to access the field.')
+            self._warn_on_shadowed_name.remove(item)
+        return super().__getattribute__(item)
+


__getattribute__ is always called on any attribute lookup, whereas __getattr__ is only called as a last resort when the attribute isn't found in any of the standard places. We need the first to catch uses of an attribute which is both a field name and an existing attribute.

patrick-schultz · 2023-08-31T19:55:07Z

hail/python/hail/expr/expressions/typed_expressions.py

-        if item in self.__dict__:
-            return self.__dict__[item]


Looking in __dict__ was redundant. If the attribute is in __dict__, the interpreter will find it and not call __getattr__.
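A quick check of that claim, using a hypothetical `Fallback` class that counts how often `__getattr__` runs: lookups of names already in `__dict__` never reach it.

```python
class Fallback:
    def __init__(self):
        self.x = 1
        self.fallback_calls = 0

    def __getattr__(self, name):
        # Reached only when the normal lookup (including __dict__) fails.
        self.fallback_calls += 1
        return None


f = Fallback()
first = f.x
second = f.x
calls_after_hits = f.fallback_calls   # still 0: __dict__ lookups bypass __getattr__
third = f.y
calls_after_miss = f.fallback_calls   # 1: only the missing name fell through
print(calls_after_hits, calls_after_miss)  # 0 1
```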

patrick-schultz · 2023-08-31T19:56:19Z

hail/python/hail/expr/expressions/typed_expressions.py

+            if key in self.__dict__ or hasattr(super(), key):
+                self._warn_on_shadowed_name.add(key)
+            else:
+                self.__dict__[key] = value


Catching shadowed fields would be easier if we didn't add fields to __dict__, but we need to do this to expose fields to IDEs for autocomplete.
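As a rough illustration of the trade-off, the sketch below compares two hypothetical classes (not Hail code). It assumes the completion tool relies on `dir()`, as introspection-based completers commonly do; static analyzers may behave differently. Fields copied into `__dict__` are discoverable, while fields served only through `__getattr__` are not.

```python
class DictBacked:
    def __init__(self, **fields):
        self._fields = dict(fields)
        self.__dict__.update(fields)  # fields become real instance attributes


class GetattrOnly:
    def __init__(self, **fields):
        self._fields = dict(fields)

    def __getattr__(self, name):
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name) from None


a = DictBacked(gq=1)
b = GetattrOnly(gq=1)
print('gq' in dir(a))  # True
print('gq' in dir(b))  # False
print(a.gq == b.gq)    # True: both resolve the field, only one advertises it
```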

danking · 2023-09-06T15:00:25Z

Chris is out, I'll take this.

danking

This is excellent! Can you edit the PR message to include a CHANGELOG: entry that describes this change to the scientists, including a note about the breaking change (which is OK because it's a bug fix)?

danking · 2023-09-06T16:07:33Z

hail/python/hail/expr/expressions/typed_expressions.py


    def _get_field(self, item):
        if item in self._fields:
            return self._fields[item]
        else:
            raise KeyError(get_nice_field_error(self, item))

+    def __getattribute__(self, item):
+        if item in super().__getattribute__('_warn_on_shadowed_name'):


This avoids infinite recursion, right? And the super's getattribute is able to see the subclass's basic fields?

Yes, and yes. The super's getattribute is just what would have been called if we didn't override it (unless there's some special-method magic I don't understand).
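Both points can be checked with a cut-down sketch. The `Base` and `Struct` classes here are hypothetical, and the standard-library `warnings.warn` stands in for Hail's own `warning` helper. The bookkeeping set is an ordinary instance attribute, yet `super().__getattribute__` finds it without re-entering the override, and the warning fires once per field.

```python
import warnings


class Base:
    def values(self):
        return 'method'


class Struct(Base):
    def __init__(self, **fields):
        self._fields = dict(fields)
        # Field names that clash with something the base class defines.
        self._warn_on_shadowed_name = {k for k in fields if hasattr(Base, k)}

    def __getattribute__(self, item):
        # super().__getattribute__ is the default lookup, so fetching the
        # bookkeeping set here does not call this override again.
        if item in super().__getattribute__('_warn_on_shadowed_name'):
            warnings.warn(f'Field {item} is shadowed by another method or attribute.')
            self._warn_on_shadowed_name.remove(item)
        return super().__getattribute__(item)


s = Struct(values=[1, 2], a=5)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    first = s.values
    second = s.values

print(len(caught))  # 1
print(first())      # method
```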

done

[query] fix shadowing of field names by methods

5200f6c

In particular, struct field names which clash with methods on `StructExpression`.

patrick-schultz assigned danking Aug 25, 2023

danking assigned chrisvittal and unassigned danking Aug 25, 2023

chrisvittal previously approved these changes Aug 25, 2023

danking previously requested changes Aug 31, 2023

chrisvittal self-requested a review August 31, 2023 17:40

patrick-schultz added 7 commits August 31, 2023 13:58

better fix, with warning

4438b2f

fix uses of approx_cdf.values

205ca34

need to put fields in __dict__ for IDEs

dc94806

add test

7b686cb

fix slow struct construction

44d0a30

expand test, fix _from_fields

f23c0e8

track warning per field

deb48c8

patrick-schultz commented Aug 31, 2023

danking assigned danking and unassigned chrisvittal Sep 6, 2023

danking previously requested changes Sep 6, 2023

danking approved these changes Sep 7, 2023

danking merged commit c160d3e into hail-is:main Sep 7, 2023
8 checks passed
