Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not warn about similar ASCII-only idents #2923

Closed
wants to merge 1 commit into from
Closed

Do not warn about similar ASCII-only idents #2923

wants to merge 1 commit into from

Conversation

pyfisch
Copy link
Contributor

@pyfisch pyfisch commented May 8, 2020

See discussion at rust-lang/rust#55467 (comment) ff.

Rename less_used_codepoints to uncommon_codepoints (rust-lang/rust#55467 (comment))

cc @crlf0710

See discussion at rust-lang/rust#55467 (comment) ff.

Rename `less_used_codepoints` to `uncommon_codepoints` (rust-lang/rust#55467 (comment))
@pyfisch
Copy link
Contributor Author

pyfisch commented May 8, 2020

One caveat of ignoring ASCII-only identifiers is that the compiler still warns about mixed script identifiers with a potentially confusable ASCII character. By this metric person_e1 and person_el are not confusable but 人_e1 and 人_el are confusable.

@crlf0710
Copy link
Member

crlf0710 commented May 8, 2020

Looks great.

By the way, another unsolved question:

In mixed_script_confusables, do we actually need to make an exception for Latin identifiers?

Reading the actual generated table, the answer is also clearly 'yes', i think.

The mixed script confusable alphabet for ascii latin is:
ABCEHIJKMNOPSTUVWXYZ
abcefghijlnopqrsuvwxy

and a single lib.rs with pub fn foo() {} will trigger the mixed_script_confusable lint if the exception is not made.

cc @Manishearth

@Manishearth
Copy link
Member

Manishearth commented May 8, 2020

@crlf0710 that table seems incorrectly generated.

@Manishearth
Copy link
Member

oh, never mind

@Manishearth
Copy link
Member

Yes, we need a Latin exception, because the stdlib is Latin. The RFC states this outright.

@pyfisch
Copy link
Contributor Author

pyfisch commented May 9, 2020

Should I just remove the (un)resolved question about mixed_script_confusables?

@Manishearth
Copy link
Member

No, that's not how unresolved questions work, we resolve them in the tracking issue if necessary.

@crlf0710
Copy link
Member

Implementation adjustment is ready at rust-lang/rust#72069

@joshtriplett
Copy link
Member

One caveat of ignoring ASCII-only identifiers is that the compiler still warns about mixed script identifiers with a potentially confusable ASCII character. By this metric person_e1 and person_el are not confusable but 人_e1 and 人_el are confusable.

Is there some way we can implement the notion that the actual confusable characters must be from different scripts?

@Manishearth
Copy link
Member

That's already in the RFC in the heuristics for mixed_script_confusables

@pickfire
Copy link
Contributor

One caveat of ignoring ASCII-only identifiers is that the compiler still warns about mixed script identifiers with a potentially confusable ASCII character. By this metric person_e1 and person_el are not confusable but 人_e1 and 人_el are confusable.

Aren't them the same? Person and 人 have the same meaning but there is no easy way to write el or e1 in Chinese or Japanese. So a mix of characters is still likely needed, such as 人1 or 人2, it would be weird to see 人一 or 人二.

I don't know if writing variables in non-latin would be helpful but a mix for numeric might be needed.

@Manishearth
Copy link
Member

Aren't them the same?

No, one is rén + the letter e + the number 1, the other is rén + the letter e + the lowercase letter l

@Manishearth
Copy link
Member

I would rather use @joshtriplett's heuristic here. mixed_script_confusables already uses this, we just need to reuse this: we lint on different strings that map to each other via mixed-script-confusables only.

@pickfire
Copy link
Contributor

@Manishearth Ah, you are talking about the e1 and el. But isn't that up to the font used by the developer, should we even limit that?

By the way, how did you type rén?

@Manishearth
Copy link
Member

Manishearth commented May 13, 2020

@pickfire sigh the point is that the current patch catches that case incorrectly because it contains a non-confusable non-ascii character as well.

I have a compose key on my keyboard. I also have a chinese input method set up, but I wanted to specifically list out the word there.

[unicode-set-allowed]: https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%3AIdentifier_Status%3DAllowed%3A%5D%26%5B%3AXID_Continue%3DYes%3A%5D%5D&g=&i=Script_Extensions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this link correct? Why do I get 404?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tool is down right now, unicode is having IT issues

@pyfisch
Copy link
Contributor Author

pyfisch commented Jul 2, 2020

I don't think this is considered anymore.

@pyfisch pyfisch closed this Jul 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants