PHP 8.1: fix retokenization of "&" character
In PHP < 8.1, the ampersand was tokenized as a simple token, basically just a plain string "&".
As of PHP 8.1, due to the introduction of intersection types, PHP has two new dedicated tokens for the ampersand: `T_AMPERSAND_FOLLOWED_BY_VAR_OR_VARARG` and `T_AMPERSAND_NOT_FOLLOWED_BY_VAR_OR_VARARG`.
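
The difference can be observed directly with PHP's native tokenizer. The following is a minimal sketch, not part of the commit; `describeAmpersands()` is a hypothetical helper name, and the sample snippet is made up for illustration. It reports, for each `&` in a piece of code, whether PHP returned it as a plain string or as a named token.

```php
<?php
// Hypothetical helper (not part of PHPCS): report how the native
// tokenizer represents each "&" character in a snippet.
function describeAmpersands(string $code): array
{
    $found = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token) === true && $token[1] === '&') {
            // PHP 8.1+: an array token carrying a dedicated token id.
            $found[] = token_name($token[0]);
        } else if ($token === '&') {
            // PHP < 8.1: a simple one-character string token.
            $found[] = 'plain string';
        }
    }

    return $found;
}

// Four ampersands: return by reference, pass by reference,
// an intersection type, and a bitwise "and". The snippet is only
// tokenized, never executed.
$code = '<?php function &foo(array &$a, A&B $c) { return $a & 1; }';
print_r(describeAmpersands($code));
```

The labels differ between PHP versions, but the number of ampersands found is the same on both sides of the 8.1 boundary.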

This PR proposes to "undo" the new PHP 8.1 tokenization of the ampersand and to retokenize both new tokens to `T_BITWISE_AND`, which is how PHPCS has always tokenized the character.
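
As a rough, standalone illustration of that idea (a sketch under stated assumptions, not the PHPCS implementation): `normalizeAmpersands()` below is a hypothetical helper, the `'simple'` type label is invented for the example, and the `define()` calls are stand-ins for constants that PHPCS defines itself in `src/Util/Tokens.php`.

```php
<?php
// Stand-ins for constants PHPCS defines itself (values assumed for illustration).
if (defined('T_BITWISE_AND') === false) {
    define('T_BITWISE_AND', 'PHPCS_T_BITWISE_AND');
}

foreach (['T_AMPERSAND_FOLLOWED_BY_VAR_OR_VARARG', 'T_AMPERSAND_NOT_FOLLOWED_BY_VAR_OR_VARARG'] as $name) {
    if (defined($name) === false) {
        // On PHP < 8.1 this is a string, so it never strictly equals a native integer token id.
        define($name, 'PHPCS_'.$name);
    }
}

// Hypothetical helper: turn native tokens into simplified PHPCS-style entries
// in which every "&" is reported as T_BITWISE_AND, whatever the PHP version.
function normalizeAmpersands(string $code): array
{
    $final = [];
    foreach (token_get_all($code) as $token) {
        $isArray = is_array($token);

        $isNewAmpersand = $isArray === true
            && ($token[0] === T_AMPERSAND_FOLLOWED_BY_VAR_OR_VARARG
            || $token[0] === T_AMPERSAND_NOT_FOLLOWED_BY_VAR_OR_VARARG);

        if ($isNewAmpersand === true || ($isArray === false && $token === '&')) {
            $final[] = [
                'code'    => T_BITWISE_AND,
                'type'    => 'T_BITWISE_AND',
                'content' => '&',
            ];
            continue;
        }

        $final[] = [
            'code'    => ($isArray === true) ? $token[0] : null,
            'type'    => ($isArray === true) ? token_name($token[0]) : 'simple',
            'content' => ($isArray === true) ? $token[1] : $token,
        ];
    }

    return $final;
}

$tokens = normalizeAmpersands('<?php function &f(&$a) { return $a & 1; }');
```

Because the result no longer depends on which PHP version did the tokenizing, code that looks for `T_BITWISE_AND` keeps working unchanged.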

This includes taking the new tokens into account in the "next token after a function keyword token should be a `T_STRING`" logic.

This change is already covered extensively by the tests for the `File::isReference()` method, though that method will need updating for PHP 8.1 intersection types, just like the `File::getMethodParameters()` method will need adjusting too.

This PR, in combination with PR 3400, fixes all current test failures on PHP 8.1.

We may want to consider adding an extra `'is_reference'` key to the token array for these tokens, which would allow the `File::isReference()` method to resolve tokens on PHP 8.1 more quickly and easily.

We also may want to have a think about whether we want to move to the PHP 8.1 tokenization in PHPCS 4.x. All the same, this PR should not be held back by a decision like that as, for now, it just needs to be fixed for PHPCS 3.x.
jrfnl committed Aug 17, 2021
1 parent 5be0b00 commit 7c5b4e6
Showing 2 changed files with 30 additions and 1 deletion.
22 changes: 21 additions & 1 deletion src/Tokenizers/PHP.php
@@ -646,6 +646,25 @@ protected function tokenize($string)
                 }//end if
             }//end if
 
+            /*
+                PHP 8.1 introduced two dedicated tokens for the & character.
+                Retokenizing both of these to T_BITWISE_AND, which is the
+                token PHPCS already tokenized them as.
+            */
+
+            if ($tokenIsArray === true
+                && ($token[0] === T_AMPERSAND_FOLLOWED_BY_VAR_OR_VARARG
+                || $token[0] === T_AMPERSAND_NOT_FOLLOWED_BY_VAR_OR_VARARG)
+            ) {
+                $finalTokens[$newStackPtr] = [
+                    'code'    => T_BITWISE_AND,
+                    'type'    => 'T_BITWISE_AND',
+                    'content' => $token[1],
+                ];
+                $newStackPtr++;
+                continue;
+            }
+
             /*
                 If this is a double quoted string, PHP will tokenize the whole
                 thing which causes problems with the scope map when braces are
@@ -1667,7 +1686,8 @@ protected function tokenize($string)
             if ($token[0] === T_FUNCTION) {
                 for ($x = ($stackPtr + 1); $x < $numTokens; $x++) {
                     if (is_array($tokens[$x]) === false
-                        || isset(Util\Tokens::$emptyTokens[$tokens[$x][0]]) === false
+                        || (isset(Util\Tokens::$emptyTokens[$tokens[$x][0]]) === false
+                        && $tokens[$x][1] !== '&')
                     ) {
                         // Non-empty content.
                         break;
9 changes: 9 additions & 0 deletions src/Util/Tokens.php
@@ -154,6 +154,15 @@
     define('T_ATTRIBUTE', 'PHPCS_T_ATTRIBUTE');
 }
 
+// Some PHP 8.1 tokens, replicated for lower versions.
+if (defined('T_AMPERSAND_FOLLOWED_BY_VAR_OR_VARARG') === false) {
+    define('T_AMPERSAND_FOLLOWED_BY_VAR_OR_VARARG', 'PHPCS_T_AMPERSAND_FOLLOWED_BY_VAR_OR_VARARG');
+}
+
+if (defined('T_AMPERSAND_NOT_FOLLOWED_BY_VAR_OR_VARARG') === false) {
+    define('T_AMPERSAND_NOT_FOLLOWED_BY_VAR_OR_VARARG', 'PHPCS_T_AMPERSAND_NOT_FOLLOWED_BY_VAR_OR_VARARG');
+}
+
 // Tokens used for parsing doc blocks.
 define('T_DOC_COMMENT_STAR', 'PHPCS_T_DOC_COMMENT_STAR');
 define('T_DOC_COMMENT_WHITESPACE', 'PHPCS_T_DOC_COMMENT_WHITESPACE');