
Cache dimension snakecase #181

Merged: 2 commits merged into main from cache-snakecase on Jan 19, 2024
Conversation

@rousik (Collaborator) commented Dec 29, 2023

It turns out that this piece of code gets run very frequently and accounts for a large share of the runtime of XBRL extraction. In the test scenario, this improvement seems to produce a ~15% reduction in extraction time.
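The diff itself isn't shown in the thread, but the pattern the PR title and description point at — memoizing a hot CamelCase-to-snake_case conversion so repeated dimension names become a dict lookup after the first call — can be sketched like this. The function and regex names below are illustrative, not the extractor's actual identifiers:

```python
from functools import lru_cache
import re

# Matches the position between a lowercase letter/digit and an uppercase
# letter, i.e. a CamelCase word boundary.
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


@lru_cache(maxsize=None)
def to_snake_case(name: str) -> str:
    """Convert a CamelCase dimension name to snake_case, caching results.

    Dimension names repeat heavily across facts, so after the first
    conversion every subsequent call is a cache hit.
    """
    return CAMEL_BOUNDARY.sub("_", name).lower()
```

Because `lru_cache` keys on the string argument, the regex substitution runs only once per distinct name, which is where a ~15% speedup on a conversion-heavy workload could plausibly come from.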

rousik and others added 2 commits December 26, 2023 11:37
It turns out that this piece of code gets run very frequently
and accounts for quite a lot of runtime for xbrl extraction.
In the test scenario, this improvement seems to produce a ~15%
reduction in runtime.

codecov bot commented Dec 29, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (7c2e2f6) at 93.43% vs. head (e724aa2) at 93.44%.
The report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #181      +/-   ##
==========================================
+ Coverage   93.43%   93.44%   +0.01%     
==========================================
  Files           8        8              
  Lines         609      610       +1     
==========================================
+ Hits          569      570       +1     
  Misses         40       40              


@zaneselvans (Member) left a comment

Cool! I bet there are other speedups lurking in here too. Thanks for finding this.

@rousik (Collaborator, Author) commented Dec 29, 2023

> Cool! I bet there are other speedups lurking in here too. Thanks for finding this.

Yep. I would like to test this in the full PUDL ETL before committing this PR, so I'm trying to figure out the right (and low-effort) way to go about that, since I'm assuming this won't be the last such change.

I should probably also write up a wiki doc on how to profile the code to find things like this. It turns out that multi-processing is kind of annoying here: the use of process executors definitely got in the way of getting clean profiles.
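The profiling write-up mentioned here isn't in the thread, but a minimal single-process recipe with the standard library's `cProfile`/`pstats` is one common starting point (the `extract_all` workload below is a stand-in, not the extractor's real entry point):

```python
import cProfile
import pstats


def extract_all(items):
    # Stand-in for the extraction workload being profiled; the real
    # entry point in ferc-xbrl-extractor differs.
    return [s.lower().replace(" ", "_") for s in items]


profiler = cProfile.Profile()
profiler.enable()
extract_all(["Some Dimension Name"] * 10_000)
profiler.disable()

# Print the five functions with the highest cumulative time.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```

Running the workload in a single process like this sidesteps the process-executor problem described above, since `cProfile` only sees the interpreter it runs in.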

@zaneselvans (Member) commented Dec 29, 2023

For testing the altered ferc-xbrl-extractor, is there a reason not to just check it out alongside the main pudl repo and then install it within your pudl-dev environment?

```shell
pip install --no-deps --no-cache-dir --editable ../ferc-xbrl-extractor
```

That should replace the version of ferc-xbrl-extractor in your local pudl-dev environment with whichever version you have checked out. Then you can edit either repository as needed and re-run the tests or the full ETL with Dagster, and they'll reflect changes in both pudl and ferc-xbrl-extractor.

Once you're satisfied with the changes to ferc-xbrl-extractor, you can commit them, we can cut a new release by pushing a version tag, and you can update the required version in PUDL's pyproject.toml.
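The tag-push release flow described above might look roughly like this; the version number `v1.2.3` is illustrative, and the sketch runs in a throwaway repo so the commands are self-contained:

```shell
# Create a scratch repo so this sketch is runnable anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev
git commit -q --allow-empty -m "initial"

# Cut a release by creating an annotated version tag.
git tag -a v1.2.3 -m "Release v1.2.3"
git tag --list

# In the real workflow you'd then push the tag to trigger the release:
#   git push origin v1.2.3
```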

@rousik (Collaborator, Author) commented Dec 29, 2023

Thanks, that definitely sounds like the easiest way to go about testing this.

@rousik (Collaborator, Author) commented Jan 18, 2024

I have tested this in the context of ferc_to_sqlite; sadly, the benefits are minimal at that point. I suppose the multi-processing must be making the caches less effective, or the bottlenecks are different in that scenario. I avoided multi-processing for CPU profiling because it was more difficult to obtain profiles across process boundaries. I will try to look into this some more. We could submit this, but the benefits are likely not going to be as good as initially promised :-/

@zaneselvans (Member) left a comment

Seems like we might as well merge it in. It doesn't really add complexity and it'll improve performance in some computational contexts even if it doesn't right now, so why not?

@rousik (Collaborator, Author) commented Jan 18, 2024

> Seems like we might as well merge it in. It doesn't really add complexity and it'll improve performance in some computational contexts even if it doesn't right now, so why not?

I would like to re-run the analysis, and that would be harder once this is merged in. In general, I'm a bit wary of optimizations that don't actually help :-/

@zaneselvans (Member) commented
Haha, okay, well I'm fine with just closing this PR too. Splitting off the FERC extractions in our CI reorganization will be a much, much larger performance boost.

@rousik (Collaborator, Author) commented Jan 19, 2024

Let's merge. It may not help, but it won't hurt, and realistically I don't have enough cycles to dig into this before I clear other things off my plate. You're right that more significant speedups will likely come from the sharding of ferc_to_sqlite.

@zaneselvans zaneselvans added the performance Resource consumption like memory or CPU intensity label Jan 19, 2024
@zaneselvans zaneselvans merged commit 226bc86 into main Jan 19, 2024
16 checks passed
@zaneselvans zaneselvans deleted the cache-snakecase branch January 19, 2024 16:02
Labels
performance Resource consumption like memory or CPU intensity
2 participants