Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i18n: Implement a solution for interpolation of React elements in strings #9846

Closed
aduth opened this issue Sep 12, 2018 · 23 comments
Closed
Assignees
Labels
Internationalization (i18n) Issues or PRs related to internationalization efforts [Type] Task Issues or PRs that have been broken down into an individual action to take

Comments

@aduth
Copy link
Member

aduth commented Sep 12, 2018

Problem: Most any sprintf implementation, including that of @wordpress/i18n, is expected to return a string value. In a React application, this can be problematic if the string contains HTML markup, as the HTML will be escaped, at least without using dangerouslySetInnerHTML, which is discouraged. This can sometimes lead to developers implementing workarounds using concatenation of partial localized strings (e.g. #5767 (comment)), which is not advisable as it cannot be translated accurately in many languages. This is also noted in the i18n for WordPress Developers guide:

Use format strings instead of string concatenation—sprintf(__('Replace %1$s with %2$s'), $a, $b); is always better than __('Replace ').$a.__(' with ').$b; .

https://codex.wordpress.org/I18n_for_WordPress_Developers#Best_Practices

Task: Find a way to allow for React elements to be inserted within localized strings safely.

Prior Art:

Additional Considerations:

  • We still need to be able to extract the string which would be surfaced to translators, presumably including placeholders where a JSXExpressionContainer occurs
  • There should be a reasonably-ergonomic offering for those developers who choose not to use Babel tooling

Brainstorming:

It would be nice if a developer could simply wrap their string and elements in a component which performs the localization:

<Interpolate>
	Check out this link to my <a href={ url }>website</a>.
</Interpolate>

There are, of course, complications:

  • How do we extract the string?
    • Presumably it would take the form Check out this link to my <a href="%s">website</a>. We'd need to find a way to associate the JSX expression to the placeholder (and vice-versa, inject the variable value where the placeholder occurs when rendering into the UI).
  • Pluralization / Context
  • Nested custom component elements
    • For example, what if the website link was a <WebsiteLink /> element?
  • Encouraging best practices to avoid extracted strings being indecipherable

It might be simpler to associate placeholders via a known limited subset of expression values allowed. This could also be used for linting enforcement of best-practices:

<Interpolate>
	{ ( getPlaceholder ) => (
		<Fragment>Check out this link to my <a href={ getPlaceholder( '%s' ) }>website</a>.</Fragment>
	) }
</Interpolate>

Aside: It is be feasible for a custom Babel plugin to transform an element like as written in the first snippet (which has arguably nicer semantics) to this latter format.

To address pluralization, perhaps having a special child component type mapping to the index of the plural form corresponding to the count (an advantage here is supporting >2 plural forms (correction below)):

<Interpolate count={ 2 }>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>website</a>.
	</Interpolate.PluralForm>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>websites</a>.
	</Interpolate.PluralForm>
</Interpolate>

cc @mcsf @jorgefilipecosta @swissspidy

@aduth aduth added [Type] Task Issues or PRs that have been broken down into an individual action to take Internationalization (i18n) Issues or PRs related to internationalization efforts labels Sep 12, 2018
@mcsf
Copy link
Contributor

mcsf commented Sep 13, 2018

We've talked about this in private, but noting that auditing our current strings would help us prioritize this task.

@ockham
Copy link
Contributor

ockham commented Sep 18, 2018

It would be nice if a developer could simply wrap their string and elements in a component which performs the localization:

<Interpolate>
	Check out this link to my <a href={ url }>website</a>.
</Interpolate>

Agree 👍

I've also been thinking about a syntax like this for a while. Another benefit is that we could connect an Interpolate component to React context (for on-the-fly locale switching at last). (This however raises a question about how to deal with strings outside components.)

There are, of course, complications:

  • How do we extract the string?

    • Presumably it would take the form Check out this link to my <a href="%s">website</a>. We'd need to find a way to associate the JSX expression to the placeholder (and vice-versa, inject the variable value where the placeholder occurs when rendering into the UI).
  • Pluralization / Context

  • Nested custom component elements

    • For example, what if the website link was a <WebsiteLink /> element?
  • Encouraging best practices to avoid extracted strings being indecipherable

Shouldn't a regular JSX parser (as used by Babel and friends) be helpful here (and thus be feasible by a Babel plugin)? In particular, the task at hand shares some traits with shallow rendering, as provided by Enzyme or react-test-renderer, doesn't it? Would parsing the "top level" (no components expansion) be sufficient here, followed by concatenation of all text nodes' strings, with placeholders inserted for non-text nodes?

It might be simpler to associate placeholders via a known limited subset of expression values allowed. This could also be used for linting enforcement of best-practices:

<Interpolate>
	{ ( getPlaceholder ) => (
		<Fragment>Check out this link to my <a href={ getPlaceholder( '%s' ) }>website</a>.</Fragment>
	) }
</Interpolate>

Aside: It is be feasible for a custom Babel plugin to transform an element like as written in the first snippet (which has arguably nicer semantics) to this latter format.

That's reassuring, since the render prop syntax isn't exactly nice 😆

To address pluralization, perhaps having a special child component type mapping to the index of the plural form corresponding to the count (an advantage here is supporting >2 plural forms):

<Interpolate count={ 2 }>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>website</a>.
	</Interpolate.PluralForm>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>websites</a>.
	</Interpolate.PluralForm>
</Interpolate>

Bit verbose 🙁 but yeah, might be the tersest we can do.

@nerrad
Copy link
Contributor

nerrad commented Sep 18, 2018

This is a bit confusing for me:

<Interpolate count={ 2 }>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>website</a>.
	</Interpolate.PluralForm>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>websites</a>.
	</Interpolate.PluralForm>
</Interpolate>

I'm likely just misunderstanding things, but something like this might be a bit clearer?

<Interpolate count={ 2 }>
	<Interpolate.SingularForm>
		Check out this link to my <a href={ url }>website</a>.
	</Interpolate.SingularForm>
	<Interpolate.PluralForm>
		Check out this link to my <a href={ url }>websites</a>.
	</Interpolate.PluralForm>
</Interpolate>

@aduth
Copy link
Member Author

aduth commented Sep 18, 2018

@nerrad I think this may have been a misunderstanding on my part around >2 plural forms. I believe you're correct that, at least as how we express in English, there would be only the singular and plural form to include in the code (corresponding to msgid and msgid_plural in gettext).

References:

@swissspidy
Copy link
Member

Yeah just singular + plural in the code needed..

@nerrad
Copy link
Contributor

nerrad commented Sep 18, 2018

I think we should also shoot for a goal where there is no html in the translation string (how feasible that is I don't know yet). Otherwise the string translation is inherently fragile.

@aduth
Copy link
Member Author

aduth commented Sep 18, 2018

The tools for including markup in localized string exist in WordPress PHP, so it's assumed there will be a similar expectation in JavaScript.

Our usage in Gutenberg thus far has been extremely limited, which is why I've personally temporarily shelved this as an issue.

As one data point, @blowery had done a brief audit of component interpolation in Calypso and had found at least 800+ uses in the codebase. Embedding a, em, and strong tags are the most common amongst these.

@iandunn
Copy link
Member

iandunn commented May 2, 2019

More prior art: WordPress/wordcamp.org@bbe3ce8

cc @coreymckrill

@coreymckrill
Copy link
Contributor

@tn3rb
Copy link

tn3rb commented May 2, 2019

re: singular && plural

please keep in mind that some languages use different rules and forms for pluralization, such as:

  • a rule for 0 things
  • a rule for 1 thing
  • a rule for 2 things
  • a rule for 3 things
  • a rule for everything else

Here's a page that discusses this: https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_and_Plurals

An example from that list is for Icelandic and Macedonian where there is one rule for any number that ends in 1, except for 11 (1, 21, 31, 41, 51, 61...) and another rule for everything else (0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...).

So a simple singular vs plural system is not going to work.

Ultimately it would be better to simply use the count to pick the corresponding translation and then have a default to cover all other cases.

I can't give a good real world example because I'm a dumb one language kinda guy, so please just use your imagination and pretend that these were valid in something other than english

<Interpolate count={ 2 }>
	<PluralRule count={ 0 }>
		no things for you
	</PluralRule>
	<PluralRule count={ 1 }>
		there is one thing
	</PluralRule>
	<PluralRule count={ 2 }>
		(in Yoda voice) two things there are
	</PluralRule>
	<PluralRule>
		there are %s things
	</PluralRule>
</Interpolate>

For the Icelandic and Macedonian example from the MDN docs, the Interpolate component would have to utilize some kind of rule callback to decide what translation to return.

(EDIT)
my apologies, I missed that >2 plural forms had been mentioned

@swissspidy
Copy link
Member

@iandunn @coreymckrill Very interesting, thanks for sharing!

Sounds like we should be able to provide a function like arrayTokenReplace to developers, and do the tokenSplit part automatically behind the scenes. In this case, usage would not be much different from sprintf() (yay for developers) but still allow using HTML for placeholder substitution.

@aduth What do you think of that? Would this something that could be even added to Tannin directly?

@aduth
Copy link
Member Author

aduth commented May 3, 2019

Incoming stream of consciousness:

I like that the approach is not specifically tailored to the React, but rather to the generic problem of implementing substitution of non-string values. It does impose some challenge on the implementer to handle an array of parts; it just-so-happens to work pretty well† out of the box as a React children value, though if the goal is to be more generic, it's not immediately obvious what this might look like in other settings.

† Pretty well: React may complain about missing keys, which might need to be included as part of the abstraction.

Sounds like we should be able to provide a function like arrayTokenReplace to developers, and do the tokenSplit part automatically behind the scenes. [...] Would this something that could be even added to Tannin directly?

I'm not clear what's proposed to be included where. Functionally this sounds similar to a utility function like sprintf, where the return value may be an array instead of a string. In which case, something like tokenSplit is a common lower-level function, one which may be reasonable to include in tannin, or certainly @wordpress/i18n.

At which point, you could have (pending beautification):

sprintf = ( str, ...args ) => tokenSplit( str ).reduce( ( result, part, i ) => result + ( i % 2 ? args[ ( i - 1 ) / 2 ] : part ), '' );
Interpolate = ( props ) => tokenSplit( Children.toArray( props.children ).map( ( child, i ) => typeof child === 'string' ? child : '%' + i + '$s' ).join( '' ) ).reduce( ( result, part, i ) => [ ...result, ( i % 2 ? cloneElement( Children.toArray( props.children )[ ( i - 1 ) / 2 ], { key: i } ) : part ) ], [] );

Two things I note:

  • It requires some abuse of types, likely in using "string" type for elements
  • While this satisfies the runtime implementation, it still requires enhancements to string extraction to handle "Interpolate".

At this point, I re-read your comment and perhaps you're suggesting that this be put more on the developer to handle? So in their own renders, e.g.

function Accusation( props ) {
	const { who, where, weapon } = props;

	return <div>{ eprintf( __( 'I accuse %1$s in the %2$s with the %3$s!' ), <em>{ who }</em>, <em>{ where }</em>, <em>{ weapon }</em> ) }</div>;
}

The benefit here being that no changes are required to string extraction.

For both this and Interpolate above, I'm not sure we're doing much to help solve the problem of allowing for markup in an extracted string. As in the original comment:

Check out this link to my <a href="%s">website</a>.

As considered, it only works if we substitute the entirety of the link:

Check out this link to my %1$s.

...but then we lose translation (or at least context of the translation of) "website".

While I'm having difficulty coming up with a solution, I could imagine one or a sum of parts of:

  • Allowing as part of considering placeholders or token splitting: Different "types" which cover the opening and closing of an element, where the React implementation would manage recombination into an element.
  • Construct the markup for the generated string at build time (e.g. with a Babel transform) to "trick" (read: be compatible with) extractors
    • e.g. __( 'Check out this link to my <a href="%s">website</a>.' ) would appear to exist in the built code (not the source code), and somehow at runtime it's instead considered more like as if it were 'Check out this link to my %1$s.' or, as in the example with Interpolate, more like a proper element <Fragment>Check out this link to my <a href={ url }>website</a></Fragment>.

I think there may be iterative improvements with element substitutions we could find with something like the utilities mentioned in #9846 (comment) without needing to solve all the problems, but I do think we should have some general idea of what a broader solution might look like to have a better sense of how it all fits together.

@nerrad nerrad self-assigned this Jun 12, 2019
@nerrad
Copy link
Contributor

nerrad commented Jun 16, 2019

I've been thinking/experimenting on this a bit over the weekend and I have some more thoughts to throw up here.

So one thing I'm thinking of playing with is introducing new tokens to represent open/close type of elements. So for example, open would be %1$o and close would be %1$c. In a string it might something like this (note 1 here refers to what argument provides the element to use surrounding the content):

eprintf( __( 'Check out this link to my %1$owebsite%1$c' ), 'a'  );

Behind the scenes, the code would take care of creating the element attached to the token (in this case 'a') and adding the content between the o and c for that argument number as the child on the element. So the children array for the above might end up with something like this:

const children = [
    wp.createElement( '', { key: 0 }, 'Check out this link to my ' ),
    wp.createElement( 'a', { key: 1 }, 'website' ),
];

Some benefits:

  • String extraction will still work
  • It allows for keeping translation of content inside the element in context.
  • It's a somewhat familiar api for developers to use (albeit there's still some syntax idiosyncrasies that would need communicated).
  • We don't need to whitelist element types (you could feasibly have translated content as the child of any component).

As an option, we could detect if the argument is an object and use that to provide not only the element but also the props to pass to the element on creation. So it might look something like this:

eprintf( 
    __( 'Check out this link to my %1$owebsite%1$c' ),
    { el: 'a', props: { href: 'https://website.com' } }
);

The main question I have here though is how would translators handle the introduction of what on the surface seems unfamiliar tokens. @swissspidy any thoughts on that?

@swissspidy
Copy link
Member

Ideally we‘d avoid using such placeholders for HTML tags. It‘s hard to grasp for translators for that use case and very easy to make errors.

Currently on mobile so I don‘t have resources at hand to point to, but I‘d say from a localization perspective it‘s the last resort IMO.

Perhaps @ocean90 can give more feedback from the meta side as well.

@nerrad
Copy link
Contributor

nerrad commented Jun 17, 2019

I can definitely see that as a potential downside of it being easier for translators to make errors. But imo that's not a strong argument because any usage of tokens introduces the chance of human errors made by translators. I do agree though that there should be a greater awareness of what an html tag is.

However, out of the options presented, this is the one that gets us closest to a solution for the immediate problem at hand with minimal extra modifications made elsewhere. As far as I can tell, other solutions (such as what was proposed by @aduth):

  • require whitelisting a set of html tags that can be embedded in translated strings (either as special tokens or parsed content)
  • still need worked out how props (or attributes) on the embedded tags would be parsed from the content.
  • likely need additional work done for either extracting the translation string (in wp-cli or the wp.i18n pot-to-php script) or work done in wp.i18n to expose the string for translation in the expected format, but then convert the translated value to safe values in react (which doesn't eliminate the chance for human error if for example a translator doesn't preserve the html tag exactly as it was in the original string).

@aduth
Copy link
Member Author

aduth commented Jun 25, 2019

@swissspidy and I chatted briefly about this at WordCamp Europe this past weekend. @nerrad , to your third point, I think the original proposal here had some nice advantages from the perspective of usability, though certainly is much more technically challenging to implement.

Your proposed solution reminds me quite a bit of the Interpolate-Components prior art I shared in the original comment, albeit using the traditional placeholder syntax instead of a custom {{ / }} placeholder. Technically it's still custom since you're introducing new o and c types for printf (though, in checking, they both already have defined meanings as octal and character types respectively).

@nerrad
Copy link
Contributor

nerrad commented Jun 29, 2019

Your proposed solution reminds me quite a bit of the Interpolate-Components prior art I shared in the original comment, albeit using the traditional placeholder syntax instead of a custom {{ / }} placeholder.

Yes definitely inspired by that, but with the tweak of keeping syntax that would be at least somewhat familiar to translators. However, it'd be trickier doing nested replacement than if Interpolate-components was used. Also, as you pointed out:

(though, in checking, they both already have defined meanings as octal and character types respectively).

Which I think nixes the idea (I though I had checked if they already had defined meanings, but I obviously didn't :) ).

@nerrad
Copy link
Contributor

nerrad commented Jun 29, 2019

I like what the react.i18next.com does with the <Trans /> component (see here. We could implement something similar as what was suggested in the original request here, however even there, notice what the translation strings end up as:

"userMessagesUnread": "Hello <1>{{name}}</1>, you have {{count}} unread message. <5>Go to message</5>.",
"userMessagesUnread_plural": "Hello <1>{{name}}</1>, you have {{count}} unread messages.  <5>Go to messages</5>.",

(although they do allow for subbing in text representations of common white-listed elements provided they have no additional attributes).

I'm beginning to think that the goal of ending up with strings presented to translators like this:

Check out this link to my <a href="%s">website</a>.

Is going to require using dangerouslySetInnerHtml at some point of the process. Whereas if we had something like this for translators:

Check out this link to my {{link1}}website{{/link1}} and this {{link2}}other link{{/link2}}.

It preserves some context for translators to grasp but still should allow developers to use an interface that is relatively simple.

So basically, the issue here is will introducing a new placeholder format in translation strings be a no-go for translations in the wp ecosystem, or is it an acceptable change to limit the complexity of implementation?

@nerrad
Copy link
Contributor

nerrad commented Jun 29, 2019

Should note however, I'm still working on an experiment implementing the <Interpolate/> idea.

@nerrad
Copy link
Contributor

nerrad commented Jul 1, 2019

Just an update, I should have something to share this coming week on this. Made some great progress over the weekend :).

@nerrad
Copy link
Contributor

nerrad commented Jul 1, 2019

The first experimental proposal for how we could move forward with this is now up: #16374

@akirk
Copy link
Member

akirk commented Sep 24, 2019

If I may give some context of what we've been doing with i18n-calypso and why I think it's a good approach:

By using {{htmlPlaceholder}}-style placeholders, we've encouraged developers to name the tags more verbosely which is helpful to translators. See this example (from wp-calypso):

By clicking {{strong}}%(saveButtonLabel)s{{/strong}}, you agree to the applicable {{draLink}}Domain Registration Agreement{{/draLink}} and confirm that the Transferee has agreed in writing to be bound by the same agreement. You authorize the respective registrar to act as your {{supportLink}}Designated Agent{{/supportLink}}.

This contains two <a href="url"> which are completely invisible to the translator. They know that there are two distinct links. It makes it easy to shuffle the HTML tags around. In my opinion they are easier to grasp than numbered placeholders.

It would be possible to achieve the same with <htmlPlaceholder>-style placeholders but I think this could be confusing in a case like this:

The number of posts to include in your site's feed. <link>Learn more about feeds</link>

This is understandable for someone versed in HTML but also confusing because it uses the <link> tag wrong, while the developer tried to convey that this is a link.

One additional fact that I like about {{ placeholders is that they are deliberately not HTML. Within the JSX context one is used to writing HTML not as a string, while when using < placeholders we would ask developers for mixing of contexts.

Last but not least, I think it's helpful to keep defining translatable text as a string. It makes whitespace handling very clear (translatable strings are hash lookups). What about leading whitespace, trailing whitespace, line breaks, indentation?

@aduth
Copy link
Member Author

aduth commented Nov 19, 2019

With the introduction of __experimentalCreateInterpolateElement in #17376, it is now possible to include React elements in the output result of a translated string.

As discussed in today's JavaScript chat (link requires registration), we can consider some usability improvements in future tasks. I have created one such task at #18614, proposing the introduction of a Translate component.

Separately, if there are specific changes we should make in Gutenberg to make use of interpolated elements in existing components, those should be opened as separate issues or pull requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internationalization (i18n) Issues or PRs related to internationalization efforts [Type] Task Issues or PRs that have been broken down into an individual action to take
Projects
None yet
Development

No branches or pull requests

9 participants