Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse rust_version as constraints from crate datasource #26315

Closed
rarkins opened this issue Dec 15, 2023 · 14 comments · Fixed by #27731
Closed

Parse rust_version as constraints from crate datasource #26315

rarkins opened this issue Dec 15, 2023 · 14 comments · Fixed by #27731
Assignees
Labels
datasource:crate Crate datasource (Rust) priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others type:feature Feature (new functionality)

Comments

@rarkins
Copy link
Collaborator

rarkins commented Dec 15, 2023

Describe the proposed change(s).

Parse rust_version per-version from the crate datasource and return it as constraints.rust.

Example package: https://crates.io/api/v1/crates/log

@rarkins rarkins added type:feature Feature (new functionality) priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others datasource:crate Crate datasource (Rust) labels Dec 15, 2023
@zharinov
Copy link
Collaborator

zharinov commented Mar 1, 2024

Currently, we're using URLs like https://crates.io/api/v1/crates/log?include=, in order to make response shorter.

However, seems like now we have to use https://crates.io/api/v1/crates/log?include=versions.

Once we do that, I wonder if we still need fetchCrateRecordsPayload() method for version values and isDeprecated fields. Whereas having versions data, we also will be able to return releaseTimestamp.

@rarkins
Copy link
Collaborator Author

rarkins commented Mar 1, 2024

I recall that there was a problem with the v1 API load for crates.io so therefore we want to hit the v1 API only if we know there's new versions

@zharinov
Copy link
Collaborator

zharinov commented Mar 1, 2024

Do you mean, if we do this constantly, the server load is too high?

@rarkins
Copy link
Collaborator Author

rarkins commented Mar 1, 2024

We should probably await advice from the crates team to make sure this versions filter won't cause load problems

@zharinov
Copy link
Collaborator

zharinov commented Mar 5, 2024

Actually, there is a comment in our code:

The ?include= suffix is required to avoid unnecessary database queries on the crates.io server. This lets us work around the regular request throttling of one request per second.

@zharinov
Copy link
Collaborator

zharinov commented Mar 5, 2024

However, it's not obvious why one request per second. I could curl more frequently in my local setup.

@rarkins
Copy link
Collaborator Author

rarkins commented Mar 5, 2024

Hi @epage @Turbo87, do either of you have information on what rate limits of crates.io we should abide by default? As mentioned above, today we use queries like https://crates.io/api/v1/crates/log?include= to get package metadata such as source URLs, which I think is possibly even optimized for purposes like Renovate so as not to strain the DB. Meanwhile if we are to fetch "metadata" about supported Rust versions per release, we then need to a query like https://crates.io/api/v1/crates/log?include=versions which may be "heavier". We can reduce this load by:

  • Only querying this URL if it's a new package, or we discover there's a new version of an existing package, and
  • Rate limiting requests per second to crates.io

The above will work quite well with our shared Redis cache in production, however we can't ensure that self-hosted users have a persistent cache so they may query such URLs once per dependency per run. In that case crates.io would be protectd by our default rate limiting per second.

@epage
Copy link

epage commented Mar 5, 2024

You should probably be using the Sparse Index. It is what cargo downloads and uses for resolving dependencies to then know what .crate file to download and is designed for heavy traffic.

@Turbo87
Copy link
Contributor

Turbo87 commented Mar 5, 2024

yep, the sparse index would be the best option. you can see e.g. on https://index.crates.io/se/rd/serde that it includes rust_version fields for all versions that have the field specified in their Cargo.toml files.

https://crates.io/api/v1/crates/log?include=versions (currently) includes all versions without pagination from the live database. that makes the responses somewhat slow for crates with thousands of versions (e.g. swc). the sparse index however is served from S3 and cached on the CDN level, so, even though it is not paginated either, it doesn't hit the database and is quite a bit faster and more reliable.

@zharinov
Copy link
Collaborator

zharinov commented Mar 5, 2024

Regarding this issue, we could achieve its objective with sparse index, which is great news.

However, for most of the package managers, we're also trying to fetch timestamps of each individual release, in order to implement policies like "wait 7 days after new release, before creating PR for it". If I understand correctly, the only place we could get these timestamps is the versions field (that's why it was so appealing to me at the start).

Probably we could use Last-Modified header for that purpose, because it seems to quite closely match the timestamp of the latest release. I'm not 100% sure, but maybe the timestamp of just latest release could be enough for Renovate to do its job in that regard.

@epage
Copy link

epage commented Mar 5, 2024

I'd love to have a timestamp in the Index so we can do time traveling resolving of dependencies in Cargo (rust-lang/cargo#5221). The main thing holding that back is the question of what to do about yanked packages because that is a mutable field that can change multiple times so you can't actually resolve to a specific time if yanked packages are involved.

In the mean time, could both approaches work?

  • Use https://crates.io/api/v1/crates/log?include= for timestamps, if needed
  • Use https://index.crates.io/se/rd/serde for rust-version

This is assuming https://crates.io/api/v1/crates/log?include= is lighter on the database than https://crates.io/api/v1/crates/log?include=versions

@rarkins
Copy link
Collaborator Author

rarkins commented Mar 5, 2024

Having the timestamp per version/release is really good to have, and it seems like the versions endpoint is the only one which has that now?

@zharinov
Copy link
Collaborator

zharinov commented Mar 5, 2024

By the way, I think we could reach ...?include=versions, but only if (1) we see new versions and (2) we're sure Renovate instance has caching enabled.

@renovate-release
Copy link
Collaborator

🎉 This issue has been resolved in version 37.229.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 5, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
datasource:crate Crate datasource (Rust) priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others type:feature Feature (new functionality)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants