Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store Ident in DefPathData instead of Symbol #69906

Closed

Conversation

Aaron1011
Copy link
Member

This allows us to recover span information when emitting cross crate
errors. A number of test cases benefit from this change, since we were
previously not emitting messages due to invalid spans.

There are four main parts to this commit:

  1. Adding Ident to DefPathData, and updating the affected code. This
    mostly consists of mechanical changes.

  2. Updating how we determine the disambiguator for a DefPath. Since
    DefPathData now stores a Span (inside the Ident), we need to
    make sure we ignore this span we determining if a path needs to be
    disambiguated. This ensure that two paths with the same symbols but
    different spans are considered equal (this can occur when a macro is
    repeatedly expanded to a definition).

  3. Ensuring that we are able to decode Spans when decoding the
    DefPathTable. Since the DefPathTable is stored in CrateMetadata, we
    must decode it before we have a CrateMetadata available. Since
    decoding a Span requires access to several fields from
    CrateMetadata, this commit adds a new struct InitialCrateMetadata,
    which implements Metadata and stores all of the necessary fields.

  4. The specialized metadata encoder/decoder impls for Ident are
    removed. This causes us to fall back to the default encoder/decoder
    implementations for Ident, which simply serializes and deserializes
    the fields of Ident. This is strictly an improvement - we still don't
    have any hygiene information, but we now have a non-dummy Span.

This should hopefully allow us to test PR #68941, since we will now use
cross-crate spans in more places.

@rust-highfive
Copy link
Collaborator

r? @cramertj

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 11, 2020
| ----
| |
| not covered
| not covered
Copy link
Member Author

@Aaron1011 Aaron1011 Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little confusing, but it also occurs for enums defined in the same crate, so it isn't an issue with this PR.

@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-7 of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2020-03-11T03:57:43.2549139Z ========================== Starting Command Output ===========================
2020-03-11T03:57:43.2552243Z [command]/bin/bash --noprofile --norc /home/vsts/work/_temp/e3bdf546-eb15-4320-b146-9fd51a42df4a.sh
2020-03-11T03:57:43.2552657Z 
2020-03-11T03:57:43.2556578Z ##[section]Finishing: Disable git automatic line ending conversion
2020-03-11T03:57:43.2575922Z ##[section]Starting: Checkout rust-lang/rust@refs/pull/69906/merge to s
2020-03-11T03:57:43.2579080Z Task         : Get sources
2020-03-11T03:57:43.2579369Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
2020-03-11T03:57:43.2579669Z Version      : 1.0.0
2020-03-11T03:57:43.2579861Z Author       : Microsoft
---
2020-03-11T03:57:44.5259409Z ##[command]git remote add origin https://github.com/rust-lang/rust
2020-03-11T03:57:44.5267784Z ##[command]git config gc.auto 0
2020-03-11T03:57:44.5273703Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2020-03-11T03:57:44.5278510Z ##[command]git config --get-all http.proxy
2020-03-11T03:57:44.5286559Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/69906/merge:refs/remotes/pull/69906/merge
---
2020-03-11T04:53:21.8835755Z .................................................................................................... 1700/9755
2020-03-11T04:53:26.5510868Z .................................................................................................... 1800/9755
2020-03-11T04:53:37.8368035Z .............................................................i...................................... 1900/9755
2020-03-11T04:53:45.1978748Z .................................................................................................... 2000/9755
2020-03-11T04:53:59.0619735Z ...................................................iiiii............................................ 2100/9755
2020-03-11T04:54:09.1202175Z .................................................................................................... 2300/9755
2020-03-11T04:54:11.4179962Z .................................................................................................... 2400/9755
2020-03-11T04:54:14.4949627Z .................................................................................................... 2500/9755
2020-03-11T04:54:35.1645008Z .................................................................................................... 2600/9755
---
2020-03-11T04:57:11.9522370Z .....................i...............i.............................................................. 5000/9755
2020-03-11T04:57:21.5884725Z .................................................................................................... 5100/9755
2020-03-11T04:57:26.7186467Z ................................................................i................................... 5200/9755
2020-03-11T04:57:32.7210339Z .................................................................................................... 5300/9755
2020-03-11T04:57:41.5744269Z .............................................ii.ii........i...i..................................... 5400/9755
2020-03-11T04:57:49.4978179Z .................................................................................................... 5600/9755
2020-03-11T04:57:58.6831060Z .................................................................................................... 5700/9755
2020-03-11T04:58:05.2172829Z ....................................i............................................................... 5800/9755
2020-03-11T04:58:11.0024563Z .................................................................................................... 5900/9755
2020-03-11T04:58:11.0024563Z .................................................................................................... 5900/9755
2020-03-11T04:58:21.1179960Z .................................................................................................... 6000/9755
2020-03-11T04:58:30.2896253Z .............................ii...i..ii...........i................................................. 6100/9755
2020-03-11T04:58:47.7267109Z .................................................................................................... 6300/9755
2020-03-11T04:58:52.1351880Z .................................................................................................... 6400/9755
2020-03-11T04:58:52.1351880Z .................................................................................................... 6400/9755
2020-03-11T04:58:58.3299744Z ............................................................i..ii................................... 6500/9755
2020-03-11T04:59:23.0052529Z .................................................................................................... 6700/9755
2020-03-11T04:59:26.6899815Z ......................................................i............................................. 6800/9755
2020-03-11T04:59:28.7606437Z .................................................................................................... 6900/9755
2020-03-11T04:59:30.8395362Z .....................................................................................i.............. 7000/9755
---
2020-03-11T05:01:04.3613122Z .................................................................................................... 7700/9755
2020-03-11T05:01:08.3968310Z .................................................................................................... 7800/9755
2020-03-11T05:01:14.1976361Z .................................................................................................... 7900/9755
2020-03-11T05:01:21.0302828Z ...................................i................................................................ 8000/9755
2020-03-11T05:01:29.6713178Z ....................................................................................iiiiiiiiii.i.... 8100/9755
2020-03-11T05:01:44.9292639Z ............................i......i................................................................ 8300/9755
2020-03-11T05:01:49.6928025Z .................................................................................................... 8400/9755
2020-03-11T05:02:00.4728240Z .................................................................................................... 8500/9755
2020-03-11T05:02:11.8362835Z .................................................................................................... 8600/9755
---
2020-03-11T05:04:31.4524118Z  finished in 7.079
2020-03-11T05:04:31.4718789Z Check compiletest suite=codegen mode=codegen (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2020-03-11T05:04:31.6685523Z 
2020-03-11T05:04:31.6685947Z running 179 tests
2020-03-11T05:04:34.6093821Z iiii......i...........ii..iiii....i....i...........i............i..i..................i....i........ 100/179
2020-03-11T05:04:36.9138857Z ....i.i.i...iii..iiiiiiiiiiiiiiii.......................iii............ii......
2020-03-11T05:04:36.9144602Z 
2020-03-11T05:04:36.9146528Z  finished in 5.442
2020-03-11T05:04:36.9329081Z Check compiletest suite=codegen-units mode=codegen-units (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2020-03-11T05:04:37.1016888Z 
---
2020-03-11T05:04:39.0506529Z  finished in 2.116
2020-03-11T05:04:39.0702808Z Check compiletest suite=assembly mode=assembly (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2020-03-11T05:04:39.2237272Z 
2020-03-11T05:04:39.2239295Z running 9 tests
2020-03-11T05:04:39.2240846Z iiiiiiiii
2020-03-11T05:04:39.2242461Z 
2020-03-11T05:04:39.2242707Z  finished in 0.153
2020-03-11T05:04:39.2456348Z Check compiletest suite=incremental mode=incremental (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2020-03-11T05:04:39.4314593Z 
---
2020-03-11T05:04:58.7630435Z  finished in 19.517
2020-03-11T05:04:59.3513568Z Check compiletest suite=debuginfo mode=debuginfo (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2020-03-11T05:04:59.7683504Z 
2020-03-11T05:04:59.7728301Z running 115 tests
2020-03-11T05:05:12.6571065Z iiiii..i......i.i...i..i.i.i..i..i..ii....i.i....ii..........iiii.........i.....i..i.......ii.i.ii.. 100/115
2020-03-11T05:05:14.1860073Z ...iiii.....ii.
2020-03-11T05:05:14.1865719Z 
2020-03-11T05:05:14.1865974Z  finished in 14.834
2020-03-11T05:05:14.1866904Z Uplifting stage1 rustc (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2020-03-11T05:05:14.1867963Z Copying stage2 rustc from stage1 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
---
2020-03-11T05:17:02.3009219Z 
2020-03-11T05:17:02.3010053Z    Doc-tests core
2020-03-11T05:17:06.7375108Z 
2020-03-11T05:17:06.7422392Z running 2480 tests
2020-03-11T05:17:15.3040246Z ......iiiii......................................................................................... 100/2480
2020-03-11T05:17:23.7223954Z ....................................................................................ii.............. 200/2480
2020-03-11T05:17:42.9181143Z ...................i................................................................................ 400/2480
2020-03-11T05:17:42.9181143Z ...................i................................................................................ 400/2480
2020-03-11T05:17:51.9582408Z ........................................................................i..i..................iiii.. 500/2480
2020-03-11T05:18:07.2393231Z .................................................................................................... 700/2480
2020-03-11T05:18:15.0569493Z .................................................................................................... 800/2480
2020-03-11T05:18:23.0304130Z .................................................................................................... 900/2480
2020-03-11T05:18:31.0330871Z .................................................................................................... 1000/2480
---
2020-03-11T05:21:50.5030980Z 
2020-03-11T05:21:50.5031319Z running 1010 tests
2020-03-11T05:22:06.6665940Z i................................................................................................... 100/1010
2020-03-11T05:22:16.0680740Z .................................................................................................... 200/1010
2020-03-11T05:22:22.6370721Z ..................iii......i......i...i......i...................................................... 300/1010
2020-03-11T05:22:27.3924428Z .................................................................................................... 400/1010
2020-03-11T05:22:33.6510495Z ............................................i..i......................................ii............ 500/1010
2020-03-11T05:22:45.3228053Z .................................................................................................... 700/1010
2020-03-11T05:22:45.3228053Z .................................................................................................... 700/1010
2020-03-11T05:22:51.6884304Z ....................................iiii............................................................ 800/1010
2020-03-11T05:23:04.3792581Z .................................................................................................... 900/1010
2020-03-11T05:23:10.7045602Z ..........................................................iiii...................................... 1000/1010
2020-03-11T05:23:11.1209550Z test result: ok. 990 passed; 0 failed; 20 ignored; 0 measured; 0 filtered out
2020-03-11T05:23:11.1209809Z 
2020-03-11T05:23:11.1312790Z  finished in 152.959
2020-03-11T05:23:11.1326219Z Testing term stage1 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
---
2020-03-11T05:38:55.7218569Z Rustbook (x86_64-unknown-linux-gnu) - edition-guide
2020-03-11T05:38:56.1267570Z Building stage0 tool linkchecker (x86_64-unknown-linux-gnu)
2020-03-11T05:38:56.2899421Z    Compiling linkchecker v0.1.0 (/checkout/src/tools/linkchecker)
2020-03-11T05:38:57.6451696Z     Finished release [optimized] target(s) in 1.51s
2020-03-11T05:39:02.1104656Z proc_macro/token_stream/struct.IntoIter.html:40: broken link - core/ops/r
2020-03-11T05:39:02.1108423Z proc_macro/token_stream/struct.IntoIter.html:41: broken link - core/ops/r
2020-03-11T05:39:02.1109824Z proc_macro/token_stream/struct.IntoIter.html:47: broken link - core/ops/r
2020-03-11T05:39:04.4940978Z thread 'main' panicked at 'found some broken links', src/tools/linkchecker/main.rs:43:9
2020-03-11T05:39:04.4969350Z 
2020-03-11T05:39:04.4975795Z 
2020-03-11T05:39:04.4976906Z command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/linkchecker" "/checkout/obj/build/x86_64-unknown-linux-gnu/doc"
2020-03-11T05:39:04.4977711Z expected success, got: exit code: 101
---
2020-03-11T05:39:04.5078319Z   local time: Wed Mar 11 05:39:04 UTC 2020
2020-03-11T05:39:05.6186932Z   network time: Wed, 11 Mar 2020 05:39:05 GMT
2020-03-11T05:39:05.6187448Z == end clock drift check ==
2020-03-11T05:39:07.2586461Z 
2020-03-11T05:39:07.2635935Z ##[error]Bash exited with code '1'.
2020-03-11T05:39:07.2651280Z ##[section]Finishing: Run build
2020-03-11T05:39:07.2703958Z ##[section]Starting: Checkout rust-lang/rust@refs/pull/69906/merge to s
2020-03-11T05:39:07.2709137Z Task         : Get sources
2020-03-11T05:39:07.2709512Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
2020-03-11T05:39:07.2709835Z Version      : 1.0.0
2020-03-11T05:39:07.2710063Z Author       : Microsoft
2020-03-11T05:39:07.2710063Z Author       : Microsoft
2020-03-11T05:39:07.2710439Z Help         : [More Information](https://go.microsoft.com/fwlink/?LinkId=798199)
2020-03-11T05:39:07.2710854Z ==============================================================================
2020-03-11T05:39:07.6841204Z Cleaning any cached credential from repository: rust-lang/rust (GitHub)
2020-03-11T05:39:07.6878446Z ##[section]Finishing: Checkout rust-lang/rust@refs/pull/69906/merge to s
2020-03-11T05:39:07.6973282Z Cleaning up task key
2020-03-11T05:39:07.6974529Z Start cleaning up orphan processes.
2020-03-11T05:39:07.7197686Z Terminate orphan process: pid (3874) (python)
2020-03-11T05:39:07.7779352Z ##[section]Finishing: Finalize Job

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

@petrochenkov petrochenkov self-assigned this Mar 11, 2020
@bjorn3
Copy link
Member

bjorn3 commented Mar 11, 2020

Won't this regress incremental compilation, as changing the span of a definition will change it's DefPath, thus producing a definition that is different from the point of view of the incremental system?

@Aaron1011
Copy link
Member Author

Aaron1011 commented Mar 11, 2020

@bjorn3: The DefPathHash still only uses symbol names, not spans (hence the need to explicitly ignore spans when determining the didambiguator, as the paths could otherwise be identical).

@Aaron1011
Copy link
Member Author

@bjorn3: Note that we already associate a Span with each definition - this just adds a Span that specifically points to the identifier. I don't think this will make incremental compilation any more dependent on Spans than it already is - but it might be good to get a perf run to check.

@petrochenkov
Copy link
Contributor

@Aaron1011

Note that we already associate a Span with each definition - this just adds a Span that specifically points to the identifier.

Can this new span be kept somewhere near that old span instead of what this PR does?

The purpose of DefPathData is to serve as a key in some table, and expanding it with some data that has to be specifically excluded before that key can be compared or hashed doesn't look like a good idea.

@petrochenkov
Copy link
Contributor

r? @eddyb

@rust-highfive rust-highfive assigned eddyb and unassigned petrochenkov and cramertj Mar 11, 2020
@Aaron1011
Copy link
Member Author

Can this new span be kept somewhere near that old span instead of what this PR does?

The span is logically associated with the Symbol stored in the DefPathData variant - we're essentially just holding on to an Ident that we were previously throwing away when constructing a DefPathData.

Storing the span somewhere else would either require us to duplicate the variants of DefPathData, or create an invalid span for variants that don't have a Symbol (e.g. ClosureExpr). Both seem kind of hacky.

The purpose of DefPathData is to serve as a key in some table, and expanding it with some data that has to be specifically excluded before that key can be compared or hashed doesn't look like a good idea.

I agree that this isn't great. I'm going to see how much fallout there is from removing the Hash, Eq, and PartialEq impls, and adjusting the next_disambiguator map.

@@ -537,22 +560,22 @@ impl DefPathData {
}
}

pub fn as_symbol(&self) -> Symbol {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the direction with keeping the span in DefPathData is pursued, this method can be kept for compatibility and used in places where full identifier is not needed. That will significantly reduce the churn.

src/librustc_resolve/lib.rs Outdated Show resolved Hide resolved
@eddyb
Copy link
Member

eddyb commented Mar 11, 2020

@Aaron1011
Copy link
Member Author

Aaron1011 commented Mar 11, 2020

@petrochenkov: I removed the Eq, PartialEq, and Hash impls from DefPathData. This results in a small amount of boilerplate within definitions.rs (I added a PlainDefPathData enum that's essentially a copy of DefPathData). However, it allows the rest of the codebase to deal with Idents (which is the proper type to be storing), instead of needing to combine a Symbol with a Span retrieved from elsewhere.

There is no longer any possibility of accidentally comparing two DefPathDatas based on their Spans.

If we ever need to compare two arbitrary DefPathData instances, then the lack of Eq/PartialEq will be a problem. However, this seems like a really weird thing to do - I think you'd always want to compare two DefIds/HirIds instead.

@Zoxc
Copy link
Contributor

Zoxc commented Mar 14, 2020

This is problematic for incremental compilation. You can't include Span here without also hashing it (since hashing is used to approximate equality).

@Aaron1011
Copy link
Member Author

@Zoxc: What would be the consequences of not hashing the Span? The Span should only be used when printing error messages - how does that interact with incremental compilation?

@bors
Copy link
Contributor

bors commented Mar 15, 2020

☔ The latest upstream changes (presumably #70024) made this pull request unmergeable. Please resolve the merge conflicts.

@Aaron1011
Copy link
Member Author

Note that storing these Idents in some form is necessary for cross-crate hygiene to work properly. When we encode something like:

#![feature(decl_macro)]
macro x() { struct X; }

x!();
x!();

we need to encode the full Ident, not just the Symbol, for both definitions of X (so that we encode the different SyntaxContexts).

This means that we actually do care about the Ident's Span (at least the SyntaxContext) in places other than diagnostics.

Does storing the Span in a side table vs using an Ident make a difference in terms of incremental compilation?

This allows us to recover span information when emitting cross crate
errors. A number of test cases benefit from this change, since we were
previously not emitting messages due to invalid spans.

There are four main parts to this commit:

1. Adding `Ident` to `DefPathData`, and updating the affected code. This
mostly consists of mechanical changes.

2. Updating how we determine the disambiguator for a `DefPath`. Since
`DefPathData` now stores a `Span` (inside the `Ident`), we need to
make sure we ignore this span we determining if a path needs to be
disambiguated. This ensure that two paths with the same symbols but
different spans are considered equal (this can occur when a macro is
repeatedly expanded to a definition).

3. Ensuring that we are able to decode `Spans` when decoding the
`DefPathTable`. Since the `DefPathTable` is stored in `CrateMetadata`, we
must decode it before we have a `CrateMetadata` available. Since
decoding a `Span` requires access to several fields from
`CrateMetadata`, this commit adds a new struct `InitialCrateMetadata`,
which implements `Metadata` and stores all of the necessary fields.

4. The specialized metadata encoder/decoder impls for `Ident` are
removed. This causes us to fall back to the default encoder/decoder
implementations for `Ident`, which simply serializes and deserializes
the fields of `Ident`. This is strictly an improvement - we still don't
have any hygiene information, but we now have a non-dummy Span.

This should hopefully allow us to test PR rust-lang#68941, since we will now use
cross-crate spans in more places.
Converting an `Ident` to a string will cause us to try to guess if the
identifier is 'raw', potentially inserting a 'r#' prefix. However, all
of the existing code that string-ifys `DefPathData` is expecting to be
working with a `Symbol`, which never has a `r#` suffix.

This was causing Rustdoc to generate links containing `r#` for
structs/traits with raw identifiers (e.g. `r#struct`), leading to broken
links when the actual html files were generated without the `r#` suffix.
This will cause an error if we ever add/change variants of
`DefPathData`, as we need to ensure we strip spans from any `Ident`s
These are a footgun, since they will compare/hash based on the `Span` of
any `Ident` in the `DefPathData`.

We now use a private `PlainDefPathData` enum as the key for the
`next_disambiguator` `HashMap`. This enum is the same as `DefPathData`,
but it stores `Symbol` instead of `Ident` (e.g. what `DefPathData` used
to store).

This create a small amount of boilerplate, but allows the rest of the
codebase to work exclusively with the normal `DefPathData` enum (which
uses `Ident`s).

All of the non-`HashMap` uses of `PartialEq` turned out to be
comparisons against a constant, so `matches!` can be used instead of
`==`.
While the fields are equivalent, we we constructing `Ident`s for things
that weren't really identifiers (e.g `kw::Invalid` + the span of one of
the types in a tuple struct).
This was pointless churn
@bjorn3
Copy link
Member

bjorn3 commented Mar 17, 2020

Storing a Span in a side table automatically does the correct thing: Only add a dependency when it is actually used. Storing an Ident in the DefPath requires hashing to ignore it to avoid a dependency everywhere and that also means that there is no dependency when there should be one. If the macro x were defined in a different module and contained a function and you moved it, the cgu containing the function declared by x!() should be recompiled to make the debuginfo match.

@eddyb
Copy link
Member

eddyb commented Mar 17, 2020

Funnily enough, there's already a def_id_to_span (or I guess def_index_to_span pre-#66131) in Definitions, but I suppose that has the entire definition Span, not just the name Span.

@Zoxc
Copy link
Contributor

Zoxc commented Mar 17, 2020

@Aaron1011 Does your example result in two distinct DefIds today? Or is that something that needs fixing.

@Aaron1011
Copy link
Member Author

@Zoxc: My example does create two distinct DefIds. However, when we serializes the crate metadata, both structs end up with SyntaxContext::root() in their identifier span. This causes a resolution error when we try to import these items from a different crate, since both items end up with identical Idents.

@Aaron1011
Copy link
Member Author

Closing in favor of #70077, which uses the side table approach.

@Aaron1011 Aaron1011 closed this Mar 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants