-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Total axes #118
Total axes #118
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #118 +/- ##
===========================================
+ Coverage 77.25% 88.72% +11.46%
===========================================
Files 8 8
Lines 554 541 -13
===========================================
+ Hits 428 480 +52
+ Misses 126 61 -65
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only actually blocking thing is the question about whether dropna()
is doing what we want it to do. But I would feel warm and fuzzy inside if you engaged with my various suggestions, too!
Broader questions I wanted to reiterate at the top level:
- do we need to filter by context dimensions in
get_facts
? it might make sense to just grab all the facts. then we can filtercontexts
by whether or not each context fits with theschema.primary_key
, and avoid passing extra state around. - should we consider not smearing the concept names into their own columns? I'm sure there was some sort of decision made about this in the ancient times, but what was the rationale there?
- should we have the output dataframes be indexed by
schema.primary_key
?
Finally, when I was testing the code I'm suggesting, I found a couple weird things that I needed to handle:
- empty files, obviously
- filings with duplicated ("c_id", "name") values in facts... that seemed surprising.
This PR changes the extractor behavior so missing
Axes
will be treated as Totals.