Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check --config for dotted keys only #10176

Merged
merged 9 commits into from
Jan 25, 2022
Merged

Conversation

jonhoo
Copy link
Contributor

@jonhoo jonhoo commented Dec 6, 2021

This addresses the remaining unresolved issue for config-cli (#7722).

@rust-highfive
Copy link

r? @Eh2406

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 6, 2021
@jonhoo jonhoo mentioned this pull request Dec 6, 2021
2 tasks
This addresses the remaining unresolved issue for `config-cli` (rust-lang#7722).
@Eh2406
Copy link
Contributor

Eh2406 commented Dec 14, 2021

The code looks good to me, but @ehuss knows the desired behavior much better than I do.

@bors r? @ehuss

@rust-highfive rust-highfive assigned ehuss and unassigned Eh2406 Dec 14, 2021
@alexcrichton
Copy link
Member

Personally I don't think that this is a great solution for the problem which is to write an inline toml parser for the keys here. I think it would be better to expose more APIs from the toml crate itself (or something like that) to avoid duplicating all the code here.

@jonhoo
Copy link
Contributor Author

jonhoo commented Dec 14, 2021

@alexcrichton I personally disagree here — it seems odd to me for toml to specifically expose parsers for sub-parts of the spec. Feels like it would expose too much of the implementation details of the library unnecessarily. Even making Deserializer::key_value or just Deserializer::dotted_key leaks details like Line and Span, neither of which currently need to be public. It's true that we're re-implementing part of the parser here, but it's a very small part, and it's simplified by the fact that we only need to do validation and not actual deserialization into a value. Ultimately it's up to you all though of course. I'd be happy to start pushing on a contribution to toml instead to expose the necessary APIs if you have an idea of what that might look like, and then follow up on using that here.

@ehuss
Copy link
Contributor

ehuss commented Dec 14, 2021

I also would prefer to not have a partial TOML parser here.

Unfortunately at the moment the TOML story is in flux. My original intent was to expose a raw interface in the toml crate to make this easy. That interface was intended to then be the basis of editing support (the low-level interface looked like this). However, we're now looking to instead use toml_edit entirely. Perhaps you could discuss with @epage the feasibility of exposing something that could provide lower-level access to the tokenizer? I'm not sure what that would look like or if they would be willing to provide something like that, but I think it would be good to discuss first.

@jonhoo
Copy link
Contributor Author

jonhoo commented Dec 14, 2021

Would love to hear your thoughts @epage!

A secondary observation here is that we could always swap out the custom validation logic here for something provided by toml or toml_edit down the line, assuming the current validation is pessimistic in what it accepts (which I believe it is). The motivation being that it would unblock --config in the meantime if we expect the necessary changes to toml/toml_edit to be on a relatively long time horizon.

epage added a commit to epage/toml_edit that referenced this pull request Dec 14, 2021
I remember wanting to remove this before and had problems.  It looks
like those have since been addressed (I think with changes to repr).

Before, we were taking any string and trying to infer what its repr
would be.

Now we only parse actualy dotted keys.

This will hopefully help with rust-lang/cargo#10176
@epage
Copy link
Contributor

epage commented Dec 14, 2021

Let me see if I understand the problem being solved.

We are adding a --config flag that works as either

  • --config <path>
  • --config KEY=VALUE

In doing so, we are trying to parse KEY as toml key and VALUE as toml value

In using toml, we want to restrict the syntax so someone can't inject more complex toml expressions (tables, etc).

Is that the gist?

We have a impl FromStr for Key which might do the trick. It had some unnecessary additional logic. toml-rs/toml#273 is removing that so it more directly handles keys as toml would expect (and people can build their own logic on top).

epage added a commit to epage/toml_edit that referenced this pull request Dec 14, 2021
I remember wanting to remove this before and had problems.  It looks
like those have since been addressed (I think with changes to repr).

Before, we were taking any string and trying to infer what its repr
would be.

Now we only parse actualy dotted keys.

This will hopefully help with rust-lang/cargo#10176

BREAKING CHANGE: `Key::from_str` will accept fewer values and will
return slightly different errors.
@jonhoo
Copy link
Contributor Author

jonhoo commented Dec 14, 2021

I think we specifically want to support any single TOML dotted-key expression in --config, but yes, that's the gist of it. I don't think it's actually important that we parse the full thing in Cargo — the current approach of checking that the argument to --config is a dotted key followed by =, and if it is parsing it with the full parser seems totally fine as a (perhaps simpler) alternative.

From the docs, it looks like Key is just a single key, and not a dotted key, is that right? Is there an type equivalent to a dotted key?

epage added a commit to toml-rs/toml that referenced this pull request Dec 14, 2021
I remember wanting to remove this before and had problems.  It looks
like those have since been addressed (I think with changes to repr).

Before, we were taking any string and trying to infer what its repr
would be.

Now we only parse actualy dotted keys.

This will hopefully help with rust-lang/cargo#10176

BREAKING CHANGE: `Key::from_str` will accept fewer values and will
return slightly different errors.
@jonhoo
Copy link
Contributor Author

jonhoo commented Dec 14, 2021

Oh, and for the value I think we want to support any value that's not an inline table. Arrays, for example, are fine.

epage added a commit to epage/toml_edit that referenced this pull request Dec 14, 2021
We support parsing a lot of expressions to help users to edit their toml
document from real world code.  One thing that was missing was dotted
key parsing.

Also, this will hopefully help with rust-lang/cargo#10176
@epage
Copy link
Contributor

epage commented Dec 14, 2021

From the docs, it looks like Key is just a single key, and not a dotted key, is that right? Is there an type equivalent to a dotted key?

You are right, I forgot that it was only a simple key and not a full key path. I think its reasonable for us to extend this so people can parse keys, see toml-rs/toml#274.

I think we specifically want to support any single TOML dotted-key expression in --config, but yes, that's the gist of it. I don't think it's actually important that we parse the full thing in Cargo — the current approach of checking that the argument to --config is a dotted key followed by =, and if it is parsing it with the full parser seems totally fine as a (perhaps simpler) alternative.

This is making me wonder, why not parse the key=value as a TOML document? That seems the simplest. You can do whatever error handling on top to restrict what is supported. Granted, if you are concerned about whitespace and comments, that can be annoying to detect though it will be easier when switching to toml_edit since that is preserved and errors can be generated just like with table handling.

@jonhoo
Copy link
Contributor Author

jonhoo commented Dec 14, 2021

This is making me wonder, why not parse the key=value as a TOML document?

I don't immediately see a way to inspect a Document to see that it only uses dotted keys for example? It just parses the whole thing and produces a Map for you, which means I can't actually check the input syntax.

I'm working on an update to this PR now that moves to toml_edit's parsing, and I noticed https://docs.rs/toml_edit/latest/src/toml_edit/de/mod.rs.html#70-76 which I believe should be using serde::DeserializedOwned as a bound rather than Deserialize<'static>. That will at least make them much more useful 😅

epage added a commit to toml-rs/toml that referenced this pull request Dec 14, 2021
We support parsing a lot of expressions to help users to edit their toml
document from real world code.  One thing that was missing was dotted
key parsing.

Also, this will hopefully help with rust-lang/cargo#10176
@jonhoo
Copy link
Contributor Author

jonhoo commented Dec 14, 2021

My current draft looks something like this:

// We only want to allow "dotted key" (see https://toml.io/en/v1.0.0#keys)
// expressions followed by a value that's not an "inline table"
// (https://toml.io/en/v1.0.0#inline-table). There isn't a way to constrain the
// toml_edit parser to _just_ an expression of that form, but we can get pretty far
// by checking if any `=`-separated prefix parses as the `toml_edit::Key` type. If
// there are multiple `=` in there, we need to split at each one until parsing
// succeeds, because the first `=` could be in a "-quoted string for example.
let mut separator = None;
for (ind, c) in arg.char_indices() {
    if c != '=' {
        continue;
    }
    if let Ok(_) = arg[..ind].parse::<toml_edit::Key>() {
        separator = Some(ind);
        break;
    }
}
let separator = if let Some(s) = separator {
    s
} else {
    bail!(
        "--config argument `{}` was not a TOML dotted key expression (a.b.c = _)",
        arg
    );
};
let toml_v = arg[(separator + 1)..]
    .trim()
    .parse::<toml_edit::Value>()
    .with_context(|| {
        format!("failed to parse value from --config argument `{}`", arg)
    })?;
if let toml_edit::Value::InlineTable(_) = toml_v {
    bail!(
        "--config argument `{}` uses inline table values, which are not accepted",
        arg,
    );
}

// Rather than trying to piece the key/value back into a toml::Value, we just parse
// the whole argument again now that we know it has the right format.
let toml_v = toml::from_str(arg).with_context(|| {
    format!("failed to parse value from --config argument `{}`", arg)
})?;

Though it fails on dotted key expressions as expected.

@epage
Copy link
Contributor

epage commented Dec 14, 2021

Though it fails on dotted key expressions as expected.

Release is now out with Key::parse

I don't immediately see a way to inspect a Document to see that it only uses dotted keys for example? It just parses the whole thing and produces a Map for you, which means I can't actually check the input syntax.

For toml_edit, you can check TableLike::is_dotted() to make sure a table only exists via dotted keys. You can also check either table type's length to ensure its one and the value and table's Decor for comments.

I'm working on an update to this PR now that moves to toml_edit's parsing, and I noticed https://docs.rs/toml_edit/latest/src/toml_edit/de/mod.rs.html#70-76 which I believe should be using serde::DeserializedOwned as a bound rather than Deserialize<'static>. That will at least make them much more useful sweat_smile

Haven't messed with DeserializedOwned, I'll take a look

@jonhoo
Copy link
Contributor Author

jonhoo commented Dec 14, 2021

I'm fiddling with getting to use TableLike::is_dotted now, because it seems like a good path to go down, but realized that Item::as_table doesn't actually recurse into inline tables (through Value), just through top-level tables. Is that intentional?

Edit: ah, I want as_table_like!

@jonhoo
Copy link
Contributor Author

jonhoo commented Jan 20, 2022

Fixed up following the merge of #10086.

src/cargo/util/config/mod.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@ehuss ehuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Would it be possible to also reject comments or other decoration?

src/cargo/util/config/mod.rs Outdated Show resolved Hide resolved
@epage
Copy link
Contributor

epage commented Jan 24, 2022

Would it be possible to also reject comments or other decoration?

Parts to check

Alternatively,. this could be implemented via visit. Looks like we don't expose whats needed for dealing with keys. I might get a change out real quick for that.

@jonhoo
Copy link
Contributor Author

jonhoo commented Jan 24, 2022

Working on a fix to check for non-empty decorators (whitespace should be allowed I think), but am running into the fact that Formatted<T> only has .decor() when T: toml_edit::repr::ValueRepr, but while ValueRepr is pub, it's in a non-pub module and not re-exported, so I can't write a generic wrapper around it. Probably worth fixing @epage.

@epage
Copy link
Contributor

epage commented Jan 24, 2022

Working on a fix to check for non-empty decorators (whitespace should be allowed I think), but am running into the fact that Formatted<T> only has .decor() when T: toml_edit::repr::ValueRepr, but while ValueRepr is pub, it's in a non-pub module and not re-exported, so I can't write a generic wrapper around it. Probably worth fixing @epage.

How come you are dealing with Formatted rather than Value::decor?

@jonhoo
Copy link
Contributor Author

jonhoo commented Jan 24, 2022

I wanted to write a helper to deal with the non-recursive variants Value. I've since ended up writing it differently (using Value::decor), but figured I'd point out this "private type in public interface" since I spotted it :)

@jonhoo jonhoo requested a review from ehuss January 24, 2022 20:30
@jonhoo
Copy link
Contributor Author

jonhoo commented Jan 24, 2022

Would love if you have the time to look over the way I implemented this too @epage and see if it matches what you expected to see?

Copy link
Contributor

@epage epage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decor handling looks good.

@ehuss
Copy link
Contributor

ehuss commented Jan 25, 2022

Thanks! ❤️

@bors r+

@bors
Copy link
Collaborator

bors commented Jan 25, 2022

📌 Commit 02e071c has been approved by ehuss

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 25, 2022
@bors
Copy link
Collaborator

bors commented Jan 25, 2022

⌛ Testing commit 02e071c with merge 018acfd...

@bors
Copy link
Collaborator

bors commented Jan 25, 2022

☀️ Test successful - checks-actions
Approved by: ehuss
Pushing 018acfd to master...

@bors bors merged commit 018acfd into rust-lang:master Jan 25, 2022
@jonhoo jonhoo deleted the config-cli-simple branch January 25, 2022 19:06
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 26, 2022
Update cargo

9 commits in 95bb3c92bf516017e812e7f1c14c2dea3845b30e..1c034752de0df744fcd7788fcbca158830b8bf85
2022-01-18 17:39:35 +0000 to 2022-01-25 22:36:53 +0000
- Sync toml_edit versions (rust-lang/cargo#10329)
- Check --config for dotted keys only (rust-lang/cargo#10176)
- Remove deprecated --host arg for search and publish cmds (rust-lang/cargo#10327)
- doc: it's valid to use OUT_DIR for intermediate artifacts (rust-lang/cargo#10326)
- Use local git info for version. (rust-lang/cargo#10323)
- Fix documenting with undocumented dependencies. (rust-lang/cargo#10324)
- do not compile test for bins flagged as `test = false` (rust-lang/cargo#10305)
- Port cargo from toml-rs to toml_edit (rust-lang/cargo#10086)
- Fix new::git_default_branch with different default (rust-lang/cargo#10306)
@ehuss ehuss added this to the 1.60.0 milestone Feb 14, 2022
sunshowers added a commit to sunshowers/nextest that referenced this pull request Aug 12, 2022
sunshowers added a commit to nextest-rs/nextest that referenced this pull request Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants