DM-30439: Refactor DataCoordinate #769 (Draft)

timj wants to merge 15 commits into main from tickets/DM-30439
Conversation

@timj (Member) commented on Jan 12, 2023

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@timj (Member Author) commented on Jan 12, 2023

I know this needs work to deal with conflicts; I created it as a draft PR to make it easier to examine the changes.

timj force-pushed the tickets/DM-30439 branch 2 times, most recently from acdf262 to fe6fe36 on January 12, 2023 at 16:26

@TallJimbo (Member) commented:

👍 Note that I definitely want to revisit some of the code here before merging it - my thinking on the custom containers has evolved quite a bit since I wrote those here. RFC-834 is in play for the DataCoordinate containers, and I think storage class conversions sort of blow up the whole concept of name-keyed DatasetRef containers, since we cannot assume that dataset type name equality means dataset type equality in many contexts anymore.

timj force-pushed the tickets/DM-30439 branch 3 times, most recently from ac6121c to ac82d70 on January 12, 2023 at 16:58

Commit messages:
In all of the PEP-544 examples, Protocols appear last, and the original
order confuses pylance.
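
Roughly, the ordering issue is the one sketched below (hypothetical class names, not actual daf_butler interfaces): the Protocol base is listed last in the bases, as in the PEP 544 examples; the reversed order is what confused pylance.

```python
from abc import abstractmethod
from typing import Iterator, Protocol


class HasNames(Protocol):
    """Illustrative protocol; not an actual daf_butler interface."""

    @property
    @abstractmethod
    def names(self) -> frozenset[str]: ...


class ConcreteBase:
    """An ordinary (non-protocol) base class."""

    def __init__(self, names: frozenset[str]):
        self._names = names

    @property
    def names(self) -> frozenset[str]:
        return self._names


# As in the PEP 544 examples, the Protocol is listed last in the bases;
# listing it first is what confused pylance.
class NamedContainer(ConcreteBase, HasNames):
    def __iter__(self) -> Iterator[str]:
        return iter(self._names)
```
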
This starts a new _containers subpackage that will be extended to
include other butler primitives, like DimensionRecords and DatasetRefs.
This renames some existing classes, so it is a breaking API change,
but one whose effects should be minimal to nonexistent outside
daf_butler.  Overall it aligns the hierarchy better with the built-in
typing hierarchy, adding ABCs for some interfaces that previously had
only concrete implementations, and it cleans up the handling of
mutability by separating frozenset-like views from set-like views.
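
As a rough illustration of what "aligning with the built-in typing hierarchy" and "separating frozenset-like from set-like views" can look like (the names below are hypothetical, not the actual classes in the new _containers subpackage):

```python
from collections.abc import Iterable, Iterator, MutableSet, Set


class RecordIterable(Iterable[str]):
    """Read-only iteration; mirrors collections.abc.Iterable."""


class RecordSet(RecordIterable, Set[str]):
    """Frozenset-like view: membership tests and set algebra, no mutation."""


class MutableRecordSet(RecordSet, MutableSet[str]):
    """Set-like view: adds in-place add/discard on top of the immutable interface."""


class FrozenRecordSet(RecordSet):
    """Minimal concrete frozenset-backed implementation."""

    def __init__(self, names: Iterable[str]):
        self._names = frozenset(names)

    def __contains__(self, key: object) -> bool:
        return key in self._names

    def __iter__(self) -> Iterator[str]:
        return iter(self._names)

    def __len__(self) -> int:
        return len(self._names)
```
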
We've historically treated DimensionRecord lookup failures as a
non-error (we just don't return records we didn't find), at least
mostly because we allow implied dimension dependencies to be None
(e.g. an exposure need not have a physical_filter).  But that's
actually broken because of DM-21840, and I now think the right
solution there is to just make those foreign keys NOT NULL after all.

That's a long way of saying that I think we could _probably_ start
treating DimensionRecord lookup failures as errors, and simplify a lot
of code in the process (though a lot of tests would break).  I'm not
confident enough to try that now and I don't want to deal with fixing
the tests, but I am confident enough to say that lookup failures are
so rare that we *can* simplify by not caching them, and that's what
this commit does.

This is relevant because I'm about to reimplement the caching entirely
in terms of the new DimensionRecord containers, and I don't want those
to have to worry about caching failed lookups.  But I wanted to make
the (slight) behavioral change a separate commit (this one) from the
refactoring one.
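
Concretely, the behavioral change amounts to something like the sketch below (a hypothetical cache class, not the actual daf_butler code): successful lookups are cached, misses are not, so a repeated miss simply hits the fetch callback again.

```python
from typing import Callable, Hashable, Optional


class RecordCache:
    """Hypothetical sketch of the caching behavior described above."""

    def __init__(self, fetch: Callable[[Hashable], Optional[object]]):
        self._fetch = fetch
        self._cache: dict[Hashable, object] = {}

    def find(self, key: Hashable) -> Optional[object]:
        try:
            return self._cache[key]
        except KeyError:
            pass
        record = self._fetch(key)
        if record is not None:
            # Only successful lookups are cached; failed lookups are rare
            # enough that re-fetching them keeps the cache logic simple.
            self._cache[key] = record
        return record
```
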
This is now being used as a callback in the new cache object, and for
consistent error handling there we need these to behave more
uniformly.

This removes a test for some cleverness that avoids a copy, but I'm not
sure all of that cleverness is really worth its weight, and I don't
think we should be testing something the user is not supposed to care
about.

This adds a new method to Registry and SqlRegistry without a
RemoteRegistry implementation, and that's why it isn't decorated with
abstractmethod yet.  I'll fix that after I make the DimensionRecord
containers serializable.

This also adds a check to DataCoordinate.standardize that any passed
keys are actually dimension names; this addresses a long-standing
problem where users would get a keyword argument to some higher-level
API wrong, and it would be forwarded as **kwargs down to
DataCoordinate.standardize and then silently ignored.  And it turns
out we were doing that even in our own test utility code!
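
The new check is conceptually just the following (a simplified sketch, not the real DataCoordinate.standardize signature):

```python
from typing import Any


def standardize(known_dimensions: set[str], **kwargs: Any) -> dict[str, Any]:
    # Simplified sketch: reject keyword arguments whose names are not known
    # dimensions instead of silently dropping them.
    unknown = kwargs.keys() - known_dimensions
    if unknown:
        raise KeyError(
            f"Unrecognized dimension name(s) in keyword arguments: {sorted(unknown)}"
        )
    return dict(kwargs)


# A typo forwarded down via **kwargs from a higher-level API now fails loudly:
# standardize({"instrument", "exposure", "detector"}, exposre=903342)  # KeyError
```
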
This reimplements some of the special handling of non-standard keys
from Butler._findDatasetRef in the hopes of being able to move it all
down to Registry (and thus work on many more interfaces).  But it's
just a start at that; I realized while trying to make
Butler._findDatasetRef use the new code that we really need to make
queryDatasets work on CALIBRATION collections first.  But I think what
I've done so far will still be useful eventually, so I'm keeping it.

@timj (Member Author) commented on Jan 12, 2023

Rebasing to w.2021.46 was relatively easy (although I'm sure tests don't pass). The first major problem comes with #601, which is where @andy-slac reorganized how dimension record query results are returned so that order_by/limit can be supported; that introduces a completely different return type, with different parameters, from the HomogeneousDimensionRecordIterable added on this ticket.

Not sure where to take this now.
