Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scanner: Generate error on inbalanced RLO/LRO/PDF override markers. #10326

Merged
merged 1 commit into from
Dec 15, 2020

Conversation

christianparpart
Copy link
Member

@christianparpart christianparpart commented Nov 18, 2020

Implements nested counting on RLO/LRO/PDF

Refs #10254

TODO:

  • changelog entry
  • tests for single line comments, multiline comments, too many pushes, and too many pops.
  • validate RLO/RLO/PDO (including RLE/LRE)
  • validate for RLI/LRI/PDI
  • use validators in string literals and have tests for them.

@stackenbotten

This comment has been minimized.

@stackenbotten

This comment has been minimized.

@stackenbotten

This comment has been minimized.

@christianparpart christianparpart marked this pull request as ready for review November 20, 2020 13:26
@christianparpart christianparpart force-pushed the unicode-directional-marks branch 3 times, most recently from ee33371 to fe19966 Compare November 20, 2020 15:02
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
Copy link
Member

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not finished yet but the call is coming up so here's the first batch.

Also, some general remarks:

  • Isn't this a breaking change?
  • What about PDI and LRI/RLI? Do we want to handle these too?

Changelog.md Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
liblangutil/Scanner.h Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
Copy link
Member

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of my review.

liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
scripts/test_antlr_grammar.sh Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
@christianparpart christianparpart force-pushed the unicode-directional-marks branch 2 times, most recently from 400f27a to 8f428cf Compare November 25, 2020 13:13
@stackenbotten
Copy link

There was an error when running chk_coding_style for commit 8f428cf37843addce4ae668071c411d30b832e4f:

Coding style error:
 liblangutil/Scanner.cpp:306: for (auto const& seq : boost::make_iterator_range(m_pushSequenceBegin, m_pushSequencesEnd))
 liblangutil/Scanner.cpp:292: UnicodeDirectionProcessor(Scanner& _scanner, string_view _pop, string_view const* _pushBegin, string_view const*_pushEnd):

Please check that your changes are working as intended.

@stackenbotten
Copy link

There was an error when running chk_coding_style for commit 54e215f4b95c026262fa45bd0a77c30d60b5bb5f:

Error: Trailing whitespace found:
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:1:contract TimelockUpgrade {^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:2: uint256 public upgradeM;^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:3: uint256 public upgradeD;^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:4: address public masterCopy;^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:5: address public proposedMaster;^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:6: event UpgradeProposed(address proposed, uint256 month, uint256 day);^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:7: event UpgradeConfirmed(address newMaster);^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:8:^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:9: function proposeUpgrade(address _upgradeAddress) external onlyOwner^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:10: {^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:11: uint256 m; ^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:12: uint256 d;^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:13: (/*year*/, /*month*/ m, /*day*/ d, /*hour*/, /*minute*/, /*second*/) = BokkyDateTime.timestampToDateTime(block.timestamp);^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:14:^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:15: upgradeM = (m % 12) + 1; // If it's Dec, advance to Jan.^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:16: upgradeD = d > 28 ? 28 : d; // Protect for February.^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:17:^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:18: proposedMaster = _upgradeAddress;^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:19: }^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:20:^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:21: function confirmUpgrade() external^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:22: {^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:23: uint256 m;^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:24: uint256 d;^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:25: (/*year*/, /*monthM-bM-^@M-.*/ ,d /*yad*/ ,m /*M-bM-^@M-,M-bM-^@M-,hour*/, /*minute*/, /*second*/) = BokkyDateTime.timestampToDateTime(block.timestamp);^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:26:^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:27: if (m == upgradeM && d == upgradeD) {^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:28: masterCopy = proposedMaster;^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:29: emit UpgradeConfirmed(masterCopy);^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:30: }^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:31: }^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:32:}^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:33:^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:34:// ----^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:35:// ParserError 8936: (1253-1264): Mismatching directional override markers in comment or string literal.^M

Please check that your changes are working as intended.

@stackenbotten
Copy link

There was an error when running chk_coding_style for commit cfee4a3bedc5644a50e27fff09874c6233b32695:

Error: Trailing whitespace found:
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:1:contract TimelockUpgrade {^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:2: function confirmUpgrade() external {^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:3: uint256 m;^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:4: uint256 d;^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:5: (/*year*/,/*monthM-bM-^@M-.*/,d/*yad*/,m/*M-bM-^@M-,M-bM-^@M-,hour*/,/*minute*/,/*second*/) = BokkyDateTime.timestampToDateTime(block.timestamp);^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:6: }^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:7:}^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:8:^M

Please check that your changes are working as intended.

@stackenbotten
Copy link

There was an error when running chk_coding_style for commit 89d09cd2a563466495dc0cdd668b5181530054e5:

Error: Trailing whitespace found:
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:1:contract TimelockUpgrade {^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:2: function confirmUpgrade() external {^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:3: uint256 m;^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:4: uint256 d;^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:5: (/*year*/,/*monthM-bM-^@M-.*/,d/*yad*/,m/*M-bM-^@M-,M-bM-^@M-,hour*/,/*minute*/,/*second*/) = BokkyDateTime.timestampToDateTime(block.timestamp);^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:6: }^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:7:}^M
 test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol:8:^M

Please check that your changes are working as intended.

@mijovic mijovic force-pushed the unicode-directional-marks branch 2 times, most recently from 8ba0d9f to 01a465c Compare December 15, 2020 15:19
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
liblangutil/Scanner.cpp Outdated Show resolved Hide resolved
Changelog.md Outdated Show resolved Hide resolved
let s := unicode"underflow ‬";
}
// ----
// ParserError 1856: (35-47): Literal or identifier expected.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like AsmParser doesn't accept unicode directional sequences.
It fails as currentToken() returns 'Token::Illegal'. So, I would say it is fine for now, so it is not even possible to get this problem in yul, but we would need to decide how to proceed on this in yul...

I created issue so we can discuss this topic #10622

@axic
Copy link
Member

axic commented Dec 15, 2020

From #10326 (comment)

Legitimacy is an arguable point, but real Arabic/Hebrew/etc keyboards can't type BiDi control characters, so you are not likely to come across them in non-malicious user input.

If this is true, why do we bother? Why don't we reject them for 0.8.0 and consult more with relevant people? (could have asked about language use in comments/literals on the user survey -- cc @franzihei)

@mijovic
Copy link
Contributor

mijovic commented Dec 15, 2020

I added few more tests, hope it is good coverage now

@cameel
Copy link
Member

cameel commented Dec 15, 2020

The biggest problem with this PR is basically that it's hard to know for sure so we try to be careful not to accidentally forbid something that turns out to be common. Asking people is definitely a good idea but it will not be the ultimate answer. Even if typical keyboards do not have this, some IMEs could. And AFAIK these features in Unicode are (relatively) new, especially LRI/RLI/PDI (that we don't even support here) so they could just not be in use yet. At this point I'm more or less convinced that it's rarely used in practice but this is still a conclusion supported mostly by lack of information :)

Also, to be honest, just attempting to push this through forced us (at least me) to explore the topic and form an opinion. Now I think that @ekpyron's proposal from #10607 in some form is the way to go but it's a bit too late to switch to that and still get it included in 0.8.0. I think that this PR is good enough if we want to have at least something to protect against the trick from the underhanded contest. We can switch to something better thought out in 0.9.0.

@axic
Copy link
Member

axic commented Dec 15, 2020

I was always on the opinion to just outright reject most of this complexity: #10254 (comment)

But as discussed in #10607 unicode might be introducing new feature we can not prepare for, so there is always a risk.

@mijovic mijovic force-pushed the unicode-directional-marks branch 2 times, most recently from a743337 to 460ed18 Compare December 15, 2020 16:42
Copy link
Member

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems mostly fine to me. The only remarks I have are minor text/naming tweaks (sorry for the one the changelog; seems like it's the tenth time or so you're tweaking it :P).

I wonder if a test for using that in imported file name or multisource-test file name would be useful. I guess it's still just a string literal so probably not...

pair<string_view, int>{"\xE2\x80\xAE", 1}, // U+202E (RLO - Right-to-Left Override)
pair<string_view, int>{"\xE2\x80\xAA", 1}, // U+202A (LRE - Left-to-Right Embedding)
pair<string_view, int>{"\xE2\x80\xAB", 1}, // U+202B (RLE - Right-to-Left Embedding)
pair<string_view, int>{"\xE2\x80\xAC", -1} // PDF - Pop Directional Formatting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pair<string_view, int>{"\xE2\x80\xAC", -1} // PDF - Pop Directional Formatting
pair<string_view, int>{"\xE2\x80\xAC", -1} // U+202C (PDF - Pop Directional Formatting)

WHITESPACE=$(git grep -n -I -E "^.*[[:space:]]+$" | grep -v "test/libsolidity/ASTJSON\|test/libsolidity/ASTRecoveryTests\|test/compilationTests/zeppelin/LICENSE")
WHITESPACE=$(git grep -n -I -E "^.*[[:space:]]+$" |
grep -v "test/libsolidity/ASTJSON\|test/libsolidity/ASTRecoveryTests\|test/compilationTests/zeppelin/LICENSE" |
grep -v -E "test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment saying why stuff like this is disabled always helps when you have to refactor it later.

Comment on lines 83 to 84
case ScannerError::DirectionalOverrideUnderflowInComment: return "Unicode direction override underflow in comment or string literal.";
case ScannerError::DirectionalOverrideMismatchInComment: return "Mismatching directional override markers in comment or string literal.";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that it's not just the error message that was mentioning only comments:

Suggested change
case ScannerError::DirectionalOverrideUnderflowInComment: return "Unicode direction override underflow in comment or string literal.";
case ScannerError::DirectionalOverrideMismatchInComment: return "Mismatching directional override markers in comment or string literal.";
case ScannerError::DirectionalOverrideUnderflow: return "Unicode direction override underflow in comment or string literal.";
case ScannerError::DirectionalOverrideMismatch: return "Mismatching directional override markers in comment or string literal.";

Changelog.md Outdated Show resolved Hide resolved
@@ -0,0 +1,7 @@
contract C {
function f() public pure {
/* LRO‭ LRE‪ RLE ‫ PDF‬ RLO‮ PDF ‬ PDF‬
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to point out the FDP and only then realized what's going on. It works :) Lost opportunity for O RLY though :)

@cameel
Copy link
Member

cameel commented Dec 15, 2020

@axic Personally I'm fine either way. Not including this in 0.8.0 and doing it properly for the next breaking release would be best but I'm also assuming that we really want something against that attack. This PR on its own does not get that deep into Unicode yet and is small so should be easy to revert if we want to change the direction. Most of it are tests actually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants