RFC: make Cargo embed dependency versions in the compiled binary #2801

Shnatsel · 2019-11-03T17:23:09Z

This is the first step towards a proper security update story for Rust. You might also want to check out the comments on a prototype implementation that informed this RFC.

2023 update: this has been implemented out-of-tree as cargo auditable, which has garnered significant adoption. This RFC has been updated and expanded based on the lessons learned from it.

text/0000-cargo-embed-dependency-versions.md

igor-petruk · 2019-11-03T17:45:26Z

One idea is to wrap Cargo.lock into another trivial format: yaml, json, custom text with two keys: ContentType and Body and you can version your format if Cargo.lock becomes cumbersome (for some yet unknown reasons) and will have to be replaced with something else.

Alex-Addy · 2019-11-03T19:48:26Z

I would request that this is not enabled by default. As noted by RustDragon on reddit, this could leak potentially sensitive information through the embedded paths.

Shnatsel · 2019-11-03T19:58:10Z

As described in the RFC, not having this enabled by default pretty much defeats the point.

Would disclosing that the dependency originated from a local copy, but not disclosing the directory path be an acceptable solution? This would protect information such as usernames and paths.

text/0000-cargo-embed-dependency-versions.md

Alex-Addy · 2019-11-03T20:07:42Z

I'm not sure that not having it enabled by default would defeat the point. Having this information available to be enabled would still be immensely valuable to production users. My main concern is users accidentally having their information leaked just by handing out a binary.

Alex-Addy · 2019-11-03T21:12:17Z

Would disclosing that the dependency originated from a local copy, but not disclosing the directory path be an acceptable solution? This would protect information such as usernames and paths.

Having thought about this a bit more I think that this would be fine. It might be desirable to be able to turn path embedding back on (for full auditing), with it being disabled by default (for safety).

Lokathor · 2019-11-03T21:14:46Z

Yeah this should definitely be off by default, even if the info is scrubbed of paths and stuff.

Shnatsel · 2019-11-03T21:16:56Z

@Lokathor could you provide a rationale? I think the RFC makes a pretty strong case for the on-by-default approach. Prior art from Go and (as I've just learned) Ruby supports this as well.

Shnatsel · 2019-11-03T21:22:10Z

It has also been pointed out that this may interact with reproducible builds in interesting ways - I'm not sure if Cargo.lock is guaranteed to have a stable sorting order.

Diggsey · 2019-11-03T21:30:46Z

I don't necessarily think this should be off by default if the size increase is indeed as minimal as claimed: even in release mode some debug information is included already. I do think it should be removed when the binary is stripped though.

The RFC claims a sub 1% increase in size: I would like to see some more evidence to back that up; the "hello world" example is not enough. The number of dependencies increases the lock file size linearly, but it does not increase the size of the binary linearly due to unused code being removed at link time, so the relative size could go up. Furthermore, platform-specific dependencies are tracked in the lock file but are not compiled into the binary at all on other platforms.

Regarding reproducible builds: as long as the information in the Cargo.lock is encoded in a deterministic way, it shouldn't be a problem. If cargo picks a dependency version non-deterministically then your deterministic build is broken regardless of whether you incorporate the lock file directly.

One thing not covered by this RFC is what crate types will embed this information. Should the "cdylib" type embed the information? On Windows, DLLs are just executables with a different extension so you could reasonably argue that the same version information should be included.

Shnatsel · 2019-11-03T21:39:27Z

The RFC claims a sub 1% increase in size: I would like to see some more evidence to back that up; the "hello world" example is not enough.

Good point, will do.

Regarding reproducible builds: as long as the information in the Cargo.lock is encoded in a deterministic way, it shouldn't be a problem.

I don't think Cargo.lock has any guarantees regarding stable sorting order for dependencies, for example.

One thing not covered by this RFC is what crate types will embed this information.

It's under "unresolved questions" since I don't have a strong stance on this, but I am leaning towards "yes" for the same reasons you've described.

Lokathor · 2019-11-03T22:08:52Z

Bonus data point: A minimal hello world on Windows is ~3.5k, and the Cargo.lock for that is ~1.9k.

The ratio generally gets better as your program grows. The amount of code that usage of a dependency puts into your binary is usually a lot more than the line or two it takes to list the dependency.

Dependencies that hurt the ratio (adding a dependency entries but not adding code into the final binary) would likely be things like proc-macro crates, macro_rules!-only crates (eg: cfg-if), and bindings crates (eg: winapi).

That said, I still don't want this in my programs because I just don't want extra stuff going in my files.

Shnatsel · 2019-11-03T22:09:59Z

Size comparisons on binaries of varying sizes, as measured on Linux with ThinLTO:

exa is 2.6Mb stripped, 1.5Mb unstripped, 25Kb Cargo.lock. <1% vs unstripped, <2% vs stripped
cargo-audit is 6.5Mb unstripped, 3.6Mb stripped, Cargo.lock is 41Kb. <1% vs unstripped, <2% vs stripped
cargo-geiger is 17Mb unstripped, 12Mb stripped, Cargo.lock is 95Kb. <1% even vs stripped
ripgrep is 32Mb unstripped, 4.5Mb stripped. Its Cargo.lock is 24Kb, <1% even vs stripped

We can do a Crater run to be sure.

gmorenz · 2019-11-03T22:35:13Z

reproducible builds

It seems like we should aim to be reproducible even if the crates came from different sources. I.e. downloading something from a registry vs using a local path shouldn't change things.

Maybe we could use a format this is just a sorted sequence of (crate-name, version, hash) instead of directly embedding Cargo.toml? This would also resolve almost all of the information leak concerns.

Shnatsel · 2019-11-03T22:53:42Z

I fear omitting the registry where a certain dependency came from would mean losing too much information. This defeats use cases such as only allowing dependencies from a private registry, for example.

tarcieri · 2019-11-03T23:00:23Z

reproducible builds

You can't reproduce a build without Cargo.lock, so if anything, including it in the binary is a helpful forensic artifact for reproducing its build.

Ixrec · 2019-11-03T23:12:52Z

Since build system and package management system are usually decoupled, most other languages did not have the opportunity to implement anything like this.

For prior art and motivation completeness: Where I work at Bloomberg, our C++ codebase has enough control over such things that we rolled our own version embedding tricks many years ago. On top of the security benefits, this has been utterly invaluable for diagnosing many bugs in production, and ensuring that fixes have been fully released. The specific format it uses doesn't make much sense for an open source ecosystem in a language with a proper module/package system, but I thought it was worth mentioning this sort of thing has precedent even in C++.

rotty · 2019-11-03T23:33:04Z

As suggested by @Shnatsel, I'll drop a mention of my cargo-audit-tags Python script. It allows a similar check based on the source code, checking the historical Cargo.lock files in git, based on git tags. This allows checking released artifacts based on those tags for known vulnerabilities in dependencies. At work, we have this running as part of a daily CI job, which will send an email if vulnerabilities are detected.

Of course, this does not cover the use case of the RFC, but it is a related tool that has some overlap with the goals of the RFC.

joshtriplett · 2019-11-04T00:18:03Z

First of all: thank you for working on the problems of dependency tracking and security analysis! Please don't take any of the feedback below as suggesting that I disagree with the goals.

I agree with the various other posts giving a rationale of not wanting to disclose excessive information, especially about local paths and registries. You need to scrub all of the potentially sensitive information, such as local registries, paths, and repository URLs.

A quick check on my system turns up Cargo.lock files upwards of 100k-140k.

I'm also concerned about exposing the potentially evolving Cargo.lock format (as mentioned in your unresolved questions section); that format has changed substantially over time.

I think we could reduce that format substantially to serve this particular purpose. But at the very least, it seems worth compressing it before embedding it. A quick check suggests that basic zlib compression reduces the size of Cargo.lock files by 75-85%.

You also need to consider the size of the delta from one version of a binary to another, as many distribution mechanisms want to distribute deltas. I can easily imagine the Cargo.lock delta taking up more space than the delta to the code itself. (Note that deltas interact with the issue of compression above; tools to compute deltas need to know about use of compression.)

You should discuss the issue of format versioning. You could theoretically evolve this format by using a new section name if the version changes.

Given that this uses a completely Rust-specific (and cargo-specific) format, the section name should not use something generic like .dep-list.

A "Hello World" on x86 Linux compiles into a ~1Mb file in the best case (recent Rust without jemalloc, LTO enabled).

That's a problem that people are trying to fix. And that's with std. It's possible to build binaries less than 1kB today.

WASM, asm.js and embedded platforms are excempt from this mechanism since they have very strict code size requirements.

(Typo: s/excempt/exempt/)

People care about code size on other platforms, too, and the same target may get used for large applications and embedded development. You can't reliably classify every target into "embedded" and "not embedded".

Also, this isn't just about code size; some use cases need to carefully construct binaries and care about any "unusual" sections that get generated in the binary. Some binaries will get built with linker scripts.

You don't specify any mechanism to enable/disable this. You need to provide and specify appropriate configuration for that.

For a first pass, I would propose addressing the issues above, and then proposing a version that's initially disabled by default. This will let people experiment with it, optimize it, audit the information it includes, and so on. We can always revisit questions of defaults and their interaction with various use cases in the future.

vincentdephily · 2019-11-04T13:26:31Z

Regarding the "leaking sensitive information" issue, beyond the obvious "strip local paths and private urls" mitigation, it's useful to think in terms of threat model. How does embedding version info affect the balance between attacker and defender ?

The version info is only beneficial if the system is dilligently updated.
A a system that can't be or isn't regularly updated could instead argue for security by obscurity.
An attacker typically has more practical ways to test for a vuln than reading the binary.
When dealing with zero-day vulns, the version info only benefits the attacker.

For most cases I think the version info is a net gain for the defender (and therefore it should eventually be on by default once it matures), but there'll always be cases when it isn't.

It's not just about security: as a user I may want to recompile my binaries when a performance fix is out for example.

tarcieri · 2019-11-04T15:48:53Z

When I worked on an infosec team at a large payments company, we built a system based on Ruby's Gemfile.lock, which I think is quite similar to the goals of this proposal (and also as mentioned in the proposal).

It scanned filesystems for these files, audited them against a similar vulnerability database (RubySec), opened tickets for vulnerable apps, and auto-closed them when they were updated to non-vulnerable versions.

It seems there's a lot of concerns about potential leakage of internal repositories. However for the above feature to work, we needed that information (which we also used to file internal vulnerabilities against internal packages). Far from that information being a concern, it was something we actually leveraged in an enterprise context.

For enterprises deploying in-house applications, I don't think leaking this information is any concern whatsoever.

For crates published on crates.io, it also isn't a concern as these crates, by definition, can't refer to internal repositories.

So the real information leakage concern is: publicly published binaries (e.g. through a vendor's web site or an app store) leaking this information accidentally because the publishers did not opt out of removing it. To me this is a somewhat valid concern: it's information exposure and a potential loss in defense-in-depth.

But I do just want to make clear that I think, with that acknowledged, this is also a feature that enterprise users will want for internal binaries, to allow internal auditing of deployed application binaries.

epage · 2023-10-04T16:00:47Z

We talked about this in a recent cargo team meeting and there seemed to be interest in first focusing on identifying the needs and seeing what part cargo should be playing in solving the larger SBOM situation. If there is a built-in solution, I assume the extent of the information will mean it would be large and not reproducible, so we'd like want to start with the separate artifact. From there, we can further explore the aims of embedding a subset of the SBOM in the binary.

@Shnatsel would you be interested in pivoting this RFC in that direction and leading or taking part in that effort?

Shnatsel · 2023-10-04T17:01:54Z

I just got contracted to advance cargo-cyclonedx. So I am getting quite involved in the broader issue of SBOM generation, and I've done a bit of research on what is required for it to work well.

There are some gaps in the information that cargo metadata exposes, most notably the dependency hashes not supporting the feature resolver v2; but I do not see any fundamental reason for SBOM generation not to be built on top of stable APIs of Cargo, such as cargo metadata, once those issues are addressed. The fact that we are dealing with generating a separate SBOM file as opposed to injecting something into the build process also makes things easier. So I don't think it makes sense to build generic SBOM handling into Cargo when external tools can do it equally well (and potentially move faster).

I am planning to migrate cargo cyclonedx to cargo metadata as the backend as part of this contract work, so in a month or two we are going to see if cargo metadata is actually suitable for SBOM use cases, or is there something I'm overlooking.

epage · 2023-10-04T17:16:37Z

I am planning to migrate cargo cyclonedx to cargo metadata as the backend as part of this contract work, so in a month or two we are going to see if cargo metadata is actually suitable for SBOM use cases, or is there something I'm overlooking.

For some of the people I talked to about SBOMs, cargo metadata sounded insufficient:

For both cases, we'd need all build inputs (RUSTFLAGS, enabled features, etc) as well as decisions made by -sys build scripts, etc

Shnatsel · 2023-10-04T17:37:35Z

For both cases, we'd need all build inputs (RUSTFLAGS, enabled features, etc) as well as decisions made by -sys build scripts, etc

All of this can be captured with the machinery already used by cargo auditable. It relies solely on the stable interfaces to Cargo and rustc. Enabled features are already collected, and RUSTFLAGS aren't but trivially could be, and "decisions made by -sys build scripts" are a whole separate problem.

Generic SBOM generation is a much bigger scope with a much bigger design space, so I am a bit puzzled how that would make a reasonable starting point. The scope of this RFC is intentionally very narrow, and the format is also designed so that we can easily extend it later if other compelling use cases emerge. This is done specifically to make the change as straightforward and uncontroversial as possible.

tarcieri · 2023-10-05T12:18:03Z

@epage reading previous comments, it sounds like your concern regarding missing SBOM data is about false positive elimination, however this is a known and longstanding issue with e.g. cargo audit in general rustsec/rustsec#21

It's something we've long been meaning to address, but also if we held off on shipping anything until it was solved we wouldn't have anything to show for it.

Even if it were implemented immediately, it can't yet be consumed by cargo audit, so is there really a good reason to make it a blocker for the implementation of this feature?

At any rate, it's no worse than the current state of affairs with Cargo.lock auditing using cargo audit.

epage · 2023-10-05T13:21:30Z

Maybe put a different way, this solves one problem (identifying vulnerabilities in released binaries that lack a chain of custody) but, for the limited bandwidth of the cargo team, we are interested in putting our efforts in solving a different problem (SBOMs). They are related (cargo-audit has a subset of the needed dependency information). Independent of our general focus, there is the question of designing the manifest configuration for this and I suspect looking at SBOMs first has us looking at the bigger picture and will help us have more of the information we need for the manifest configuration.

tarcieri · 2023-10-05T13:29:52Z

But if this feature used an SBOM format like CycloneDX, it could be an incremental first step towards full SBOM support that itself is independently useful for vulnerability auditing.

Someone else just needs to, as an additive second step, fill in the missing gaps you've highlighted.

Shnatsel · 2023-10-05T13:41:43Z

This RFC is deliberately scoped so narrowly that we wouldn't need to design manifest configuration beyond an on/off switch.

I still believe that SBOM generation is better solved outside Cargo. The SBOM formats are still evolving, as is the ecosystem around them. New use cases, requirements and regulations pop up all the time. This rapidly evolving landscape would conflict with Cargo's stability guarantees, and the limited reviewer bandwidth for a rapidly changing implementation would be a hindrance.

I expect to produce proof of this belief in the next couple of months in the form of cargo cyclonedx.

Shnatsel · 2023-10-05T14:11:21Z

It is clear that the RFC needs major revisions. At least:

The details of how this is configured should be specified
The Cargo team has expressed a desire in having this off by default, at least initially, which should be reflected in the text
I want to experiment with using an industry standard format, CycloneDX, and see if compression will compensate for its verbosity
Interactions with other scanning approaches and ways to reduce false positives, such as parsing debug info, should be documented

So I am converting this PR to draft to clearly indicate its status.

epage · 2024-05-21T15:22:30Z

@rfcbot postpone

I propose we finish up and stabilize #3553 before evaluating next steps for tracking of artifacts.

rfcbot · 2024-05-21T15:22:32Z

Team member @epage has proposed to postpone this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot · 2024-06-11T13:56:13Z

🔔 This is now entering its final comment period, as per the review above. 🔔

rfcbot · 2024-06-21T13:57:12Z

The final comment period, with a disposition to postpone, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This is now postponed.

Shnatsel · 2024-08-15T19:25:05Z

Note to future self: configuration not being sufficiently defined was brought up as a blocker for this RFC.

rust-lang/cargo#14222 has documented the configuration considerations for Cargo. This document should be used as a starting point later.

Shnatsel added 4 commits September 8, 2019 11:27

Create 0000-cargo-embed-dependency-versions.md

f3cb798

Write more of the RFC

6c32f8b

Finishing touches on the RFC before opening a PR

dcd7cd7

Fill in date and PR #

f77698a

alex reviewed Nov 3, 2019

View reviewed changes

text/0000-cargo-embed-dependency-versions.md Outdated Show resolved Hide resolved

alex reviewed Nov 3, 2019

View reviewed changes

text/0000-cargo-embed-dependency-versions.md Outdated Show resolved Hide resolved

Add prior art from Go. Thanks @alex!

53d4fbd

joshlf reviewed Nov 3, 2019

View reviewed changes

text/0000-cargo-embed-dependency-versions.md Outdated Show resolved Hide resolved

Include prior art from Ruby

e63c925

Link to more info on binary sizes

5aed819

This comment was marked as resolved.

Sign in to view

Shnatsel marked this pull request as draft October 5, 2023 14:11

epage mentioned this pull request Jan 12, 2024

RFC: cargo-sbom #3553

Open

rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels May 21, 2024

rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Jun 11, 2024

rfcbot added finished-final-comment-period The final comment period is finished for this RFC. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Jun 21, 2024

rfcbot added to-announce postponed RFCs that have been postponed and may be revisited at a later time. and removed disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels Jun 21, 2024

rfcbot closed this Jun 21, 2024

Shnatsel mentioned this pull request Jun 21, 2024

Can't build recent gitoxide versions rust-secure-code/cargo-auditable#124

Open

RFC: make Cargo embed dependency versions in the compiled binary #2801

RFC: make Cargo embed dependency versions in the compiled binary #2801

Conversation

Shnatsel commented Nov 3, 2019 • edited Loading

igor-petruk commented Nov 3, 2019

Alex-Addy commented Nov 3, 2019

Shnatsel commented Nov 3, 2019

Alex-Addy commented Nov 3, 2019

Alex-Addy commented Nov 3, 2019

Lokathor commented Nov 3, 2019

Shnatsel commented Nov 3, 2019 • edited Loading

Shnatsel commented Nov 3, 2019

Diggsey commented Nov 3, 2019

Shnatsel commented Nov 3, 2019

Lokathor commented Nov 3, 2019

Shnatsel commented Nov 3, 2019 • edited Loading

gmorenz commented Nov 3, 2019 • edited Loading

Shnatsel commented Nov 3, 2019

tarcieri commented Nov 3, 2019

Ixrec commented Nov 3, 2019 • edited Loading

rotty commented Nov 3, 2019

joshtriplett commented Nov 4, 2019

vincentdephily commented Nov 4, 2019

tarcieri commented Nov 4, 2019 • edited Loading

epage commented Oct 4, 2023

Shnatsel commented Oct 4, 2023

This comment was marked as resolved.

epage commented Oct 4, 2023

This comment was marked as resolved.

This comment was marked as resolved.

Shnatsel commented Oct 4, 2023

tarcieri commented Oct 5, 2023

epage commented Oct 5, 2023

tarcieri commented Oct 5, 2023

Shnatsel commented Oct 5, 2023

Shnatsel commented Oct 5, 2023 • edited Loading

epage commented May 21, 2024

rfcbot commented May 21, 2024 • edited by arlosi Loading

rfcbot commented Jun 11, 2024

rfcbot commented Jun 21, 2024

Shnatsel commented Aug 15, 2024

Shnatsel commented Nov 3, 2019 •

edited

Loading

Shnatsel commented Nov 3, 2019 •

edited

Loading

Shnatsel commented Nov 3, 2019 •

edited

Loading

gmorenz commented Nov 3, 2019 •

edited

Loading

Ixrec commented Nov 3, 2019 •

edited

Loading

tarcieri commented Nov 4, 2019 •

edited

Loading

Shnatsel commented Oct 5, 2023 •

edited

Loading

rfcbot commented May 21, 2024 •

edited by arlosi

Loading