DM-30439: Refactor DataCoordinate #769 (Draft)

timj wants to merge 15 commits into main from tickets/DM-30439
Conversation

@timj (Member) commented on Jan 12, 2023

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@timj (Member Author) commented on Jan 12, 2023

I know this needs work to deal with conflicts; I created it as a draft PR to make it easier to examine the changes.

timj force-pushed the tickets/DM-30439 branch 2 times, most recently from acdf262 to fe6fe36 on January 12, 2023 at 16:26

@TallJimbo (Member) commented:

👍 Note that I definitely want to revisit some of the code here before merging it - my thinking on the custom containers has evolved quite a bit since I wrote those here. RFC-834 is in play for the DataCoordinate containers, and I think storage class conversions sort of blow up the whole concept of name-keyed DatasetRef containers, since we cannot assume that dataset type name equality means dataset type equality in many contexts anymore.

timj force-pushed the tickets/DM-30439 branch 3 times, most recently from ac6121c to ac82d70 on January 12, 2023 at 16:58

Commit messages:
In all of the PEP-544 examples, Protocols appear last, and the original
order confuses pylance.
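
Roughly, the ordering issue is the one sketched below (hypothetical class names, not actual daf_butler interfaces): the Protocol base is listed last in the bases, as in the PEP 544 examples; the reversed order is what confused pylance.

```python
from abc import abstractmethod
from typing import Iterator, Protocol


class HasNames(Protocol):
    """Illustrative protocol; not an actual daf_butler interface."""

    @property
    @abstractmethod
    def names(self) -> frozenset[str]: ...


class ConcreteBase:
    """An ordinary (non-protocol) base class."""

    def __init__(self, names: frozenset[str]):
        self._names = names

    @property
    def names(self) -> frozenset[str]:
        return self._names


# As in the PEP 544 examples, the Protocol is listed last in the bases;
# listing it first is what confused pylance.
class NamedContainer(ConcreteBase, HasNames):
    def __iter__(self) -> Iterator[str]:
        return iter(self._names)
```
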
This starts a new _containers subpackage that will be extended to
include other butler primitives, like DimensionRecords and DatasetRefs.
This renames some existing classes, so it is a breaking API change,
but one whose effects should be minimal to nonexistent outside
daf_butler.  Overall it aligns the hierarchy better with the built-in
typing hierarchy, adding ABCs for some interfaces that previously had
only concrete implementations, and it cleans up the handling of
mutability by separating frozenset-like views from set-like views.
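
As a rough illustration of what "aligning with the built-in typing hierarchy" and "separating frozenset-like from set-like views" can look like (the names below are hypothetical, not the actual classes in the new _containers subpackage):

```python
from collections.abc import Iterable, Iterator, MutableSet, Set


class RecordIterable(Iterable[str]):
    """Read-only iteration; mirrors collections.abc.Iterable."""


class RecordSet(RecordIterable, Set[str]):
    """Frozenset-like view: membership tests and set algebra, no mutation."""


class MutableRecordSet(RecordSet, MutableSet[str]):
    """Set-like view: adds in-place add/discard on top of the immutable interface."""


class FrozenRecordSet(RecordSet):
    """Minimal concrete frozenset-backed implementation."""

    def __init__(self, names: Iterable[str]):
        self._names = frozenset(names)

    def __contains__(self, key: object) -> bool:
        return key in self._names

    def __iter__(self) -> Iterator[str]:
        return iter(self._names)

    def __len__(self) -> int:
        return len(self._names)
```
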
We've historically treated DimensionRecord lookup failures as a
non-error (we just don't return records we didn't find), at least
mostly because we allow implied dimension dependencies to be None
(e.g. an exposure need not have a physical_filter).  But that's
actually broken because of DM-21840, and I now think the right
solution there is to just make those foreign keys NOT NULL after all.

That's a long way of saying that I think we could _probably_ start
treating DimensionRecord lookup failures as errors, and simplify a lot
of code in the process (though a lot of tests would break).  I'm not
confident enough to try that now and I don't want to deal with fixing
the tests, but I am confident enough to say that lookup failures are
so rare that we *can* simplify by not caching them, and that's what
this commit does.

This is relevant because I'm about to reimplement the caching entirely
in terms of the new DimensionRecord containers, and I don't want those
to have to worry about caching failed lookups.  But I wanted to make
the (slight) behavioral change a separate commit (this one) from the
refactoring one.
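
Concretely, the behavioral change amounts to something like the sketch below (a hypothetical cache class, not the actual daf_butler code): successful lookups are cached, misses are not, so a repeated miss simply hits the fetch callback again.

```python
from typing import Callable, Hashable, Optional


class RecordCache:
    """Hypothetical sketch of the caching behavior described above."""

    def __init__(self, fetch: Callable[[Hashable], Optional[object]]):
        self._fetch = fetch
        self._cache: dict[Hashable, object] = {}

    def find(self, key: Hashable) -> Optional[object]:
        try:
            return self._cache[key]
        except KeyError:
            pass
        record = self._fetch(key)
        if record is not None:
            # Only successful lookups are cached; failed lookups are rare
            # enough that re-fetching them keeps the cache logic simple.
            self._cache[key] = record
        return record
```
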
This is now being used as a callback in the new cache object, and for
consistent error handling there we need these to behave more
uniformly.

This removes a test for some cleverness that avoids a copy, but I'm not
sure all of that cleverness is really worth its weight, and I don't
think we should be testing something the user is not supposed to care
about.

This adds a new method to Registry and SqlRegistry without a
RemoteRegistry implementation, and that's why it isn't decorated with
abstractmethod yet.  I'll fix that after I make the DimensionRecord
containers serializable.

This also adds a check to DataCoordinate.standardize that any passed
keys are actually dimension names; this addresses a long-standing
problem where users would get a keyword argument to some higher-level
API wrong, and it would be forwarded as **kwargs down to
DataCoordinate.standardize and then silently ignored.  And it turns
out we were doing that even in our own test utility code!
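
The new check is conceptually just the following (a simplified sketch, not the real DataCoordinate.standardize signature):

```python
from typing import Any


def standardize(known_dimensions: set[str], **kwargs: Any) -> dict[str, Any]:
    # Simplified sketch: reject keyword arguments whose names are not known
    # dimensions instead of silently dropping them.
    unknown = kwargs.keys() - known_dimensions
    if unknown:
        raise KeyError(
            f"Unrecognized dimension name(s) in keyword arguments: {sorted(unknown)}"
        )
    return dict(kwargs)


# A typo forwarded down via **kwargs from a higher-level API now fails loudly:
# standardize({"instrument", "exposure", "detector"}, exposre=903342)  # KeyError
```
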
This reimplements some of the special handling of non-standard keys
from Butler._findDatasetRef in the hopes of being able to move it all
down to Registry (and thus work on many more interfaces).  But it's
just a start at that; I realized while trying to make
Butler._findDatasetRef use the new code that we really need to make
queryDatasets work on CALIBRATION collections first.  But I think what
I've done so far will still be useful eventually, so I'm keeping it.

@timj (Member Author) commented on Jan 12, 2023

Rebasing to w.2021.46 was relatively easy (although I'm sure tests don't pass). The first major problem comes with #601, which is where @andy-slac reorganized how dimension record query results are returned so that order_by/limit can be supported; that introduces a completely different return type, with different parameters, from the HomogeneousDimensionRecordIterable added on this ticket.

Not sure where to take this now.
