diff --git a/src/tokens.md b/src/tokens.md
index 3cea93381..dd3688995 100644
--- a/src/tokens.md
+++ b/src/tokens.md
@@ -337,9 +337,9 @@ b"\\x52"; br"\x52"; // \x52
 > **Lexer**\
 > C_STRING_LITERAL :\
 >    `c"` (\
->       ~\[`"` `\` _IsolatedCR_]\
->       | BYTE_ESCAPE\
->       | UNICODE_ESCAPE\
+>       ~\[`"` `\` _IsolatedCR_ _ZeroByte_]\
+>       | BYTE_ESCAPE _except `\0` or `\x00`_\
+>       | UNICODE_ESCAPE _except `\u{0}`, `\u{00}`, …, `\u{000000}`_\
 >       | STRING_CONTINUE\
 >    )\* `"` SUFFIX?
 
@@ -372,10 +372,6 @@ starts with a `U+005C` (`\`) and continues with one of the following forms:
 * The _backslash escape_ is the character `U+005C` (`\`) which must be
   escaped in order to denote its ASCII encoding `0x5C`.
 
-The escape sequences `\0`, `\x00`, and `\u{0000}` are permitted within the token
-but will be rejected as invalid, as C strings may not contain byte `0x00` except
-as the implicit terminator.
-
 A C string represents bytes with no defined encoding, but a C string literal
 may contain Unicode characters above `U+007F`. Such characters will be replaced
 with the bytes of that character's UTF-8 representation.
@@ -398,16 +394,16 @@ c"\xC3\xA6";
 >    `cr` RAW_C_STRING_CONTENT SUFFIX?
 >
 > RAW_C_STRING_CONTENT :\
->       `"` ( ~ _IsolatedCR_ )* (non-greedy) `"`\
+>       `"` ( ~ _IsolatedCR_ _ZeroByte_ )* (non-greedy) `"`\
 >    | `#` RAW_C_STRING_CONTENT `#`
 
 Raw C string literals do not process any escapes. They start with the character
 `U+0063` (`c`), followed by `U+0072` (`r`), followed by fewer than 256 of the
 character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
-_raw C string body_ can contain any sequence of Unicode characters and is
-terminated only by another `U+0022` (double-quote) character, followed by the
-same number of `U+0023` (`#`) characters that preceded the opening `U+0022`
-(double-quote) character.
+_raw C string body_ can contain any sequence of Unicode characters (other than
+`U+0000`) and is terminated only by another `U+0022` (double-quote) character,
+followed by the same number of `U+0023` (`#`) characters that preceded the
+opening `U+0022` (double-quote) character.
 
 All characters contained in the raw C string body represent themselves in UTF-8
 encoding. The characters `U+0022` (double-quote) (except when followed by at