Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process inner html of blocks when escaping text content #719

Open
wants to merge 11 commits into
base: trunk
Choose a base branch
from
66 changes: 66 additions & 0 deletions includes/create-theme/theme-locale.php
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,72 @@ private static function escape_text_content( $string ) {

$string = addcslashes( $string, "'" );

jffng marked this conversation as resolved.
Show resolved Hide resolved
// Process the string to avoid escaping inner HTML markup.
$p = new WP_HTML_Tag_Processor( $string );

$text = '';
$tokens = array();
while ( $p->next_token() ) {
$token_type = $p->get_token_type();
$token_name = strtolower( $p->get_token_name() );
$is_tag_closer = $p->is_tag_closer();

if ( '#tag' === $token_type ) {
// Add a placeholder for the token.
$text .= '%s';
if ( $is_tag_closer ) {
$tokens[] = "</{$token_name}>";
} else {
// Depending on the HTML tag, we may need to process attributes so they are correctly added to the placeholder.
switch ( $token_name ) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I would like feedback on is whether this is too fragile, complicated, or tedious an approach to maintain.

I plan to add a few more unit tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're good to rely on the output of get_token_name(), so based on that, I believe this is a fine way to parse these tags. We have a limited scope since we're currently only dealing with a, img and mark, and otherwise we default to the original tag.

That's only my opinion though 😅 Was there anything you were specifically concerned about with this approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just needing to add cases for specific tags and if the attributes changes, it may break.

But the options for adding inline HTML / formatting inside blocks are rather limited, so maybe it's okay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to deconstruct the attributes and reassemble them? I'm worried about dropping attributes or other parts of the markup that might be added by plugins or filters.

For example, could we replace <a href="https://wordpress.org">WordPress</a> with something like

<a href="<?php echo esc_url( 'https://wordpress.org' ) >?"><?php echo esc_html__( 'WordPress' ) ?></a>

but maintain any other attributes added to the <a> tag?

I'm not really worried about escaping every possible attribute, I think the main thing is to make sure the translated strings are escaped as a form of user input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about dropping attributes or other parts of the markup that might be added by plugins or filters.

That's a good point. I could not find a way with the HTML Tag Processor to just carry over the whole tag token and all its attributes. But I refactored the approach so that all the attributes should be carried over, rather than processing the attributes based on the type of tag: 678a8ef#diff-fc11ad66397ed68725a8fe74f3031f104368f850b5b94370123482ec9e719e1bR54-R70

// Handle links.
case 'a':
$href = esc_url( $p->get_attribute( 'href' ) );
$target = empty( esc_attr( $p->get_attribute( 'target' ) ) ) ? '' : ' target="_blank"';
$rel = empty( esc_attr( $p->get_attribute( 'rel' ) ) ) ? '' : ' rel="nofollow noopener noreferrer"';
$tokens[] = "<a href=\"{$href}\"{$target}{$rel}>";
break;
// Handle inline images.
case 'img':
$src = esc_url( $p->get_attribute( 'src' ) );
$style = esc_attr( $p->get_attribute( 'style' ) );
$alt = esc_attr( $p->get_attribute( 'alt' ) );

CBT_Theme_Media::add_media_to_local( array( $src ) );
$relative_src = CBT_Theme_Media::get_media_folder_path_from_url( $src ) . basename( $src );
$tokens[] = "<img style=\"{$style}\" src=\"' . esc_url( get_stylesheet_directory_uri() ) . '{$relative_src}\" alt=\"{$alt}\">";
break;
// Handle highlights.
case 'mark':
$style = esc_attr( $p->get_attribute( 'style' ) );
$class = esc_attr( $p->get_attribute( 'class' ) );
$tokens[] = "<mark style=\"{$style}\" class=\"{$class}\">";
break;
jffng marked this conversation as resolved.
Show resolved Hide resolved
// Otherwise, just add the tag opener.
default:
$tokens[] = "<{$token_name}>";
break;
}
}
} else {
// If it's not a tag, just add the text content.
$text .= esc_html( $p->get_modifiable_text() );
}
}
// If tokens is not empty, format the string using sprintf.
if ( ! empty( $tokens ) ) {
// Format the string, replacing the placeholders with the formatted tokens.
return "<?php /* Translators: %s are html tags */ echo sprintf( esc_html__( '$text', '" . wp_get_theme()->get( 'TextDomain' ) . "' ), " . implode(
jffng marked this conversation as resolved.
Show resolved Hide resolved
', ',
array_map(
function( $token ) {
return "'$token'";
},
$tokens
)
) . ' ); ?>';
}
creativecoder marked this conversation as resolved.
Show resolved Hide resolved

return "<?php esc_html_e('" . $string . "', '" . wp_get_theme()->get( 'TextDomain' ) . "');?>";
}

Expand Down
7 changes: 4 additions & 3 deletions tests/CbtThemeLocale/escapeTextContent.php
Original file line number Diff line number Diff line change
Expand Up @@ -37,9 +37,10 @@ public function test_escape_text_content_with_double_quote() {
}

public function test_escape_text_content_with_html() {
$string = '<p>This is a test text with HTML.</p>';
$escaped_string = $this->call_private_method( 'escape_text_content', array( $string ) );
$this->assertEquals( "<?php esc_html_e('<p>This is a test text with HTML.</p>', 'test-locale-theme');?>", $escaped_string );
$string = '<p>This is a test text with HTML.</p>';
$escaped_string = $this->call_private_method( 'escape_text_content', array( $string ) );
$expected_output = "<?php /* Translators: %s are html tags */ echo sprintf( esc_html__( '%sThis is a test text with HTML.%s', 'test-locale-theme' ), '<p>', '</p>' ); ?>";
$this->assertEquals( $expected_output, $escaped_string );
}

public function test_escape_text_content_with_already_escaped_string() {
Expand Down
2 changes: 1 addition & 1 deletion tests/CbtThemeLocale/escapeTextContentOfBlocks.php
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,7 @@ public function data_test_escape_text_content_of_blocks() {
<!-- /wp:verse -->',
'expected_markup' =>
'<!-- wp:verse {"style":{"layout":{"selfStretch":"fit","flexSize":null}}} -->
<pre class="wp-block-verse"><?php esc_html_e(\'Ya somos el olvido que seremos.<br>El polvo elemental que nos ignora<br>y que fue el rojo Adán y que es ahora<br>todos los hombres, y que no veremos.\', \'test-locale-theme\');?></pre>
<pre class="wp-block-verse"><?php /* Translators: %s are html tags */ echo sprintf( esc_html__( \'Ya somos el olvido que seremos.%sEl polvo elemental que nos ignora%sy que fue el rojo Adán y que es ahora%stodos los hombres, y que no veremos.\', \'test-locale-theme\' ), \'<br>\', \'<br>\', \'<br>\' ); ?></pre>
<!-- /wp:verse -->',
),

Expand Down
15 changes: 10 additions & 5 deletions tests/test-theme-templates.php
Original file line number Diff line number Diff line change
Expand Up @@ -159,12 +159,12 @@ public function test_properly_encode_lessthan_and_greaterthan() {

public function test_properly_encode_html_markup() {
$template = new stdClass();
$template->content = '<!-- wp:paragraph -->
<p><strong>Bold</strong> text has feelings &lt;&gt; TOO</p>
<!-- /wp:paragraph -->';
$template->content = '<!-- wp:paragraph --><p><strong>Bold</strong> text has feelings &lt;&gt; TOO</p><!-- /wp:paragraph -->';
$escaped_template = CBT_Theme_Templates::escape_text_in_template( $template );

$this->assertStringContainsString( "<?php esc_html_e('<strong>Bold</strong> text has feelings &lt;&gt; TOO', '');?>", $escaped_template->content );
$expected_output = '<!-- wp:paragraph --><p><?php /* Translators: %s are html tags */ echo sprintf( esc_html__( \'%sBold%s text has feelings &lt;&gt; TOO\', \'\' ), \'<strong>\', \'</strong>\' ); ?></p><!-- /wp:paragraph -->';

$this->assertStringContainsString( $expected_output, $escaped_template->content );
}

public function test_empty_alt_text_is_not_localized() {
Expand Down Expand Up @@ -262,7 +262,12 @@ public function test_localize_verse() {
<pre class="wp-block-verse">Here is some <strong>verse</strong> to localize</pre>
<!-- /wp:verse -->';
$new_template = CBT_Theme_Templates::escape_text_in_template( $template );
$this->assertStringContainsString( "<?php esc_html_e('Here is some <strong>verse</strong> to localize', '');?>", $new_template->content );

$expected_output = '<!-- wp:verse -->
<pre class="wp-block-verse"><?php /* Translators: %s are html tags */ echo sprintf( esc_html__( \'Here is some %sverse%s to localize\', \'\' ), \'<strong>\', \'</strong>\' ); ?></pre>
<!-- /wp:verse -->';

$this->assertStringContainsString( $expected_output, $new_template->content );
}
jffng marked this conversation as resolved.
Show resolved Hide resolved

public function test_localize_table() {
Expand Down
Loading