
ci: try out the sccache for caching rust code #10445

Merged: 1 commit merged into master from nagisa/tests-sccache on Jan 30, 2024
Conversation

@nagisa (Collaborator) commented Jan 16, 2024

No description provided.

@nagisa requested a review from a team as a code owner on January 16, 2024 22:26
@nagisa force-pushed the nagisa/tests-sccache branch 2 times, most recently from e287d9f to e91041a, on January 16, 2024 22:45
@nagisa marked this pull request as draft on January 16, 2024 22:55
@nagisa removed the request for review from Longarithm on January 16, 2024 22:55
@nagisa force-pushed the nagisa/tests-sccache branch 3 times, most recently from 23e7b18 to 6fbf8e4, on January 16, 2024 23:11
@nagisa (Collaborator, Author) commented Jan 16, 2024

cc @Ekleog

It looks like we’re hitting GitHub’s rate limits (HTTP 429) when interacting with the GH cache this way :(

This significantly impacts cache utilization (note the cache write errors below):

Compile requests                   1262
Compile requests executed           968
Cache hits                          174
Cache hits (C/C++)                   22
Cache hits (Rust)                   152
Cache misses                        793
Cache misses (C/C++)                389
Cache misses (Rust)                 404
Cache timeouts                        0
Cache read errors                     0
Forced recaches                       0
Cache write errors                  792
Compilation failures                  0
Cache errors                          1
Cache errors (Rust)                   1
Non-cacheable compilations            0
Non-cacheable calls                 289
Non-compilation calls                 5
Unsupported compiler calls            0
Average cache write               0.451 s
Average compiler                  1.996 s
Average cache read hit            0.317 s
Failed distributed compilations       0

Non-cacheable reasons:
crate-type                          189
incremental                          70
unknown source language              17
-                                    12
-E                                    1

On the other hand, a big benefit of this approach is that the cache is shared between parallel workflows, so if one job gets to a particular crate and finishes compiling it first, all follow-up jobs can immediately reuse the artifact.

Relevant reading: mozilla/sccache#1485
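
For context, the GitHub-Actions-cache flavour of sccache being trialled here is usually wired up along the following lines. This is a minimal sketch rather than the exact diff in this PR; the sccache-action version pin is illustrative.

- uses: mozilla-actions/sccache-action@v0.0.3            # installs sccache on the runner (version pin illustrative)
- run: echo "SCCACHE_GHA_ENABLED=true" >> "$GITHUB_ENV"  # use the GitHub Actions cache service as sccache's storage backend
- run: echo "RUSTC_WRAPPER=sccache" >> "$GITHUB_ENV"     # make cargo route every rustc invocation through sccache

With this in place, every rustc invocation turns into one or more cache-service requests, which is what makes the rate limits above bite.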

codecov bot commented Jan 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (abb2b28) 71.98% vs. head (381a5bf) 71.94%.
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10445      +/-   ##
==========================================
- Coverage   71.98%   71.94%   -0.05%     
==========================================
  Files         720      720              
  Lines      146559   146747     +188     
  Branches   146559   146747     +188     
==========================================
+ Hits       105507   105570      +63     
- Misses      36181    36307     +126     
+ Partials     4871     4870       -1     
Flag Coverage Δ
backward-compatibility 0.08% <ø> (-0.01%) ⬇️
db-migration 0.08% <ø> (-0.01%) ⬇️
genesis-check 1.25% <ø> (-0.01%) ⬇️
integration-tests 36.91% <ø> (-0.04%) ⬇️
linux 71.13% <ø> (-0.06%) ⬇️
linux-nightly 71.31% <ø> (-0.05%) ⬇️
macos 55.06% <ø> (-0.07%) ⬇️
pytests 1.47% <ø> (-0.01%) ⬇️
sanity-checks 1.27% <ø> (-0.01%) ⬇️
unittests 67.91% <ø> (-0.04%) ⬇️
upgradability 0.13% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@nagisa (Collaborator, Author) commented Jan 17, 2024

So I fiddled with this a bit more, hoping to improve my local setup, and I'm finding that even locally sccache might not be a win over a fresh build, even with a hot cache! Details for the curious.

@nagisa (Collaborator, Author) commented Jan 24, 2024

Never mind the previous message: sccache does in fact work, but the gains in my environment leave a lot to be desired. We should definitely still set it up for CI, though that will require a GCS bucket.
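
A GCS-backed setup would only swap sccache's storage backend; a rough sketch, where the bucket name and key path are placeholders rather than anything decided here:

- run: echo "RUSTC_WRAPPER=sccache" >> "$GITHUB_ENV"                   # unchanged: cargo still wraps rustc with sccache
- run: echo "SCCACHE_GCS_BUCKET=example-ci-sccache" >> "$GITHUB_ENV"   # hypothetical bucket name
- run: echo "SCCACHE_GCS_RW_MODE=READ_WRITE" >> "$GITHUB_ENV"          # CI jobs both read and write the cache
- run: echo "SCCACHE_GCS_KEY_PATH=$HOME/gcs-key.json" >> "$GITHUB_ENV" # service-account key provisioned out of band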

@Ekleog-NEAR (Collaborator) commented:

> It looks like we’re hitting GitHub’s rate limits (HTTP 429) when interacting with the GH cache this way :(

Hmm… is this actually something that’d go away after running 3-4 builds, once the cache is hot? Once everything is cached there should be far fewer cache writes (because reads would be cache hits), so hopefully we could just ignore this and not have to go the full-blown GCS cache route?

@nagisa (Collaborator, Author) commented Jan 29, 2024

I believe both reads and writes are rate limited.

@Ekleog-NEAR (Collaborator) commented Jan 29, 2024

https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28#primary-rate-limit-for-github_token-in-github-actions seems to say that if github_token is passed, then we should have a limit of 15,000 requests per hour, which should be more than enough. Maybe the issue came from sccache not passing it?

Edit: Actually, looking at the paragraph just below the one I linked, the issue probably comes from the "Create too much content on GitHub in a short amount of time" secondary rate limit. So I’d guess we still have a chance of being fine with just a hot cache, because cache writes are more heavily rate-limited than cache reads.

@nagisa (Collaborator, Author) commented Jan 29, 2024

15k requests doesn't sound nearly enough to me (if that is indeed the limit): a single fresh cargo build --workspace will already trigger ~2k compiler invocations on its own. Then there are about 3 invocations of this entire thing across the nextest jobs alone, and we build neard a few times in some other integration tests too.

By my estimation, 15k cache lookups would be enough for maybe 2 builds, even on a fully primed cache without any writes.
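
Spelling that estimate out (using the rough figures above, so purely illustrative):

\[
\frac{15{,}000 \ \text{requests/hour}}{\approx 2{,}000 \ \text{lookups/build} \times 3\text{--}4 \ \text{builds per CI run}} \approx 2 \ \text{fully cached CI runs per hour}
\]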

@Ekleog-NEAR (Collaborator) commented:

Hmm, WDYT about just testing it out, given how unclear the GitHub documentation is? I’d suggest we land this, wait for a few days, look at what happened, and roll back if it didn’t actually work out; in that case we can then work on a proper GCS backend.

@nagisa (Collaborator, Author) commented Jan 30, 2024

I don’t have a good reason to oppose that.

@nagisa marked this pull request as ready for review on January 30, 2024 12:56
@nagisa enabled auto-merge on January 30, 2024 12:56
- run: pip3 install --user -r pytest/requirements.txt
- run: cargo llvm-cov show-env | grep -v RUSTFLAGS | tr -d "'" >> "$GITHUB_ENV"
- run: echo "RUSTC_WORKSPACE_WRAPPER=$PWD/scripts/rustc-coverage-wrapper.sh" >> "$GITHUB_ENV"
- run: echo "RUSTC_WORKSPACE_WRAPPER=$PWD/scripts/coverage-wrapper-rustc" >> "$GITHUB_ENV"
- run: echo "CARGO=1" >> "$GITHUB_ENV"
Collaborator:

Just for my understanding, what’s the CARGO=1 variable about?

Collaborator Author:

mozilla/sccache#2040 is the issue to read.

Collaborator:

Thanks!

@nagisa added this pull request to the merge queue on Jan 30, 2024
Merged via the queue into master with commit d3ea7ed Jan 30, 2024
26 checks passed
@nagisa deleted the nagisa/tests-sccache branch on January 30, 2024 20:18
@nagisa (Collaborator, Author) commented Feb 1, 2024

Looks like we're consistently taking about 5 minutes longer overall since this landed. Unfortunately, the cache failure statistics do not appear to have improved over time, though there are no "read failures" that I can see.

@Ekleog-NEAR (Collaborator) commented:

Yup, that’s what it looks like to me too :’( Want to make the rollback PR or should I?
