search on https://doc.rust-lang.org/ not doing well #103357

Closed
szabgab opened this issue Oct 21, 2022 · 5 comments · Fixed by #107141
Labels
A-rustdoc-search Area: Rustdoc's search feature C-bug Category: This is a bug. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@szabgab
Contributor

szabgab commented Oct 21, 2022

I could not find a better place to report this issue with https://doc.rust-lang.org/ (the lack of a link from the docs to their source code might be an issue of its own).

The problem is that the search on https://doc.rust-lang.org/ does not seem to function well:

e.g. https://doc.rust-lang.org/std/index.html?search=regex returns lots of hits, none of them related to regex.

OTOH https://doc.rust-lang.org/std/index.html?search=println works quite well.

https://doc.rust-lang.org/std/index.html?search=print starts out quite well, but then also includes things that end with ::pin and ::hint that seem to be unrelated.

@szabgab szabgab added the C-bug Category: This is a bug. label Oct 21, 2022
@Manishearth Manishearth transferred this issue from rust-lang/www.rust-lang.org Oct 21, 2022
@Manishearth
Member

cc @rust-lang/rustdoc

This is a rustdoc search issue

In general, for the regex query there's not much to be done, since there's no regex in the stdlib (use docs.rs/regex). I suspect the unrelated results come up due to Levenshtein distance searching (the same goes for pin).

Basically, rustdoc search attempts to catch typos, and that's what you're experiencing.

@notriddle notriddle added T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. A-rustdoc-search Area: Rustdoc's search feature labels Oct 21, 2022
@jsha
Contributor

jsha commented Oct 21, 2022

One problem here is that our acceptable Levenshtein distance is too high: we allow results with up to 3 edits (character deletion, insertion, or substitution). For a query like regex, that means we allow deleting up to 3 characters to match an unrelated name like "ge". Reducing the max Levenshtein distance to 2 would go a long way towards improving these no-match cases. I think it's exceedingly rare for something to be a real match at a distance of 3.
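To make the numbers concrete, here is a minimal sketch of a textbook Levenshtein distance in Rust, checked against a fixed cap of 3. It is not rustdoc's actual search code (which runs in the browser); the names `levenshtein`, `within_fixed_cap`, and `MAX_DISTANCE` are made up for illustration.

```rust
/// Levenshtein distance: the minimum number of single-character insertions,
/// deletions, or substitutions needed to turn `a` into `b`.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the processed prefix of `a`
    // and the first j characters of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Hypothetical fixed cap, matching the "up to 3 edits" described above.
const MAX_DISTANCE: usize = 3;

/// Hypothetical helper: would `name` be accepted for `query` under the fixed cap?
fn within_fixed_cap(query: &str, name: &str) -> bool {
    levenshtein(query, name) <= MAX_DISTANCE
}

fn main() {
    for (query, name) in [("regex", "ge"), ("print", "pin"), ("print", "hint")] {
        println!(
            "{} -> {}: distance {}, accepted: {}",
            query,
            name,
            levenshtein(query, name),
            within_fixed_cap(query, name)
        );
    }
}
```

With a cap of 3, all three pairs are accepted: "regex" to "ge" is 3 deletions, and "print" is 2 edits away from both "pin" and "hint", which is consistent with the unrelated results reported at the top of the thread.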

Also, we should do a better job of explaining to the user what the scope of the search is: the standard library; or a single crate; or a group of crates. Right now on no-results we show the same output, regardless of whether we're on doc.rust-lang.org or docs.rs or something else.

No results :(
Try on DuckDuckGo?

Or try looking in one of these:
The Rust Reference for technical details about the language.
Rust By Example for expository code examples.
The Rust Book for introductions to language features and the language itself.
Docs.rs for documentation of crates released on crates.io.

@notriddle
Contributor

notriddle commented Oct 21, 2022

Reducing the max Levenshtein distance to 2 would go a long way towards improving these no-match cases. I think it's exceedingly rare for something to be a real match at a distance of 3.

I think the acceptable Levenshtein distance really depends on the query length.

For example, a single-character query https://doc.rust-lang.org/nightly/std/?search=x returns a bunch of results that don't have an "x" in them at all. For queries of length 1, the obviously correct max distance is 0. Anything else would completely ignore what you wrote. Also, https://doc.rust-lang.org/nightly/std/?search=fn contains "as" in the list of results, which contains neither an "f" nor an "n".

Similarly, if your "a" key is broken, https://doc.rust-lang.org/nightly/std/?search=std%3A%3Ahshmp%3A%3Ahshmp currently matches nothing. If your query is 17 characters long, it can probably tolerate a distance higher than 3, because there's a lot more redundancy in what you typed.

@szabgab
Contributor Author

szabgab commented Oct 22, 2022

Thanks for all the responses so far!

Related to the source of the results:

Definitely as a newbie, but also as an advanced user, I'd probably prefer a dedicated search engine that searches both std and crates (and maybe also the reference and the books).
After all, I don't know whether a certain feature (e.g. regexes in my case) is part of what comes with Rust or of a 3rd party crate.

It might be a good idea to give preference to results from std over the crates and over the books. It would probably also be good to let the user filter the search based on the source.

@fmease
Member

fmease commented Oct 22, 2022

I think the acceptable Levenshtein distance really depends on the query length.

That's what the rustc name resolver does (last time I checked) when it encounters unresolved identifiers to suggest similarly named bindings. Namely, it checks if the following holds:

strsim::levenshtein(other_identifier, identifier) <= std::cmp::max(identifier.len(), 3) / 3
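As a rough illustration of how that bound scales, the sketch below evaluates the right-hand side of the expression for a few identifier lengths. The expression is taken as quoted in the comment, not checked against the rustc source, and the function names are hypothetical. The edit distance is passed in as a plain number so the example does not depend on the `strsim` crate.

```rust
use std::cmp::max;

/// Right-hand side of the quoted check: max(len, 3) / 3, using integer division.
fn name_resolver_bound(ident_len: usize) -> usize {
    max(ident_len, 3) / 3
}

/// Hypothetical helper: is a candidate `distance` edits away close enough
/// to an identifier of length `ident_len`?
fn is_suggestable(distance: usize, ident_len: usize) -> bool {
    distance <= name_resolver_bound(ident_len)
}

fn main() {
    for len in [1, 2, 3, 5, 6, 17] {
        println!(
            "identifier length {:>2}: up to {} edit(s)",
            len,
            name_resolver_bound(len)
        );
    }
    // "regex" (5 characters) and "ge" are 3 edits apart (three deletions),
    // so "ge" falls outside the bound of 1.
    println!("regex -> ge suggested: {}", is_suggestable(3, "regex".len()));
}
```

Because of the `max(…, 3)`, the bound never drops below 1, so even a one-character identifier still tolerates a single edit under this rule.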

notriddle added a commit to notriddle/rust that referenced this issue Oct 29, 2022
The heuristic is pretty close to the name resolver.

Fixes rust-lang#103357
notriddle added a commit to notriddle/rust that referenced this issue Jan 24, 2023
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 6, 2023
…-2023, r=GuillaumeGomez

rustdoc: compute maximum Levenshtein distance based on the query

Preview: https://notriddle.com/notriddle-rustdoc-demos/search-lev-distance-2023/std/index.html?search=regex

The heuristic is pretty close to the name resolver, maxLevDistance = `Math.floor(queryLen / 3)`.

Fixes rust-lang#103357
Fixes rust-lang#82131

Similar to rust-lang#103710, but following the suggestion in rust-lang#103710 (comment) to use `floor` instead of `ceil`, and unblocked now that rust-lang#105796 made it so that setting the max lev distance to `0` doesn't cause substring matches to be removed.
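For a sense of what the `floor` choice means for the queries discussed in this thread, here is a small Rust transcription of the stated heuristic next to a `ceil` variant. rustdoc's search itself is JavaScript; the function names are made up, and the `ceil` form is an assumption about what the earlier proposal computed (`Math.ceil(queryLen / 3)`).

```rust
/// Transcription of `Math.floor(queryLen / 3)`; integer division already floors.
fn max_lev_distance_floor(query_len: usize) -> usize {
    query_len / 3
}

/// Assumed `ceil` variant, i.e. `Math.ceil(queryLen / 3)`.
fn max_lev_distance_ceil(query_len: usize) -> usize {
    (query_len + 2) / 3
}

fn main() {
    for query in ["x", "fn", "regex", "std::hshmp::hshmp"] {
        let len = query.chars().count();
        println!(
            "{:<18} len {:>2}: floor allows {}, ceil would allow {}",
            query,
            len,
            max_lev_distance_floor(len),
            max_lev_distance_ceil(len)
        );
    }
}
```

Under `floor`, the one- and two-character queries "x" and "fn" get a maximum distance of 0, "regex" gets 1, and the 17-character query gets 5. Under the assumed `ceil` variant, "x" and "fn" would still tolerate one edit.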
@bors bors closed this as completed in e09e6df Feb 6, 2023