Safely Using Strings Containing Markup in React with DOMParser

For the Web Stories WordPress plugin I came up with a solution to parse strings containing markup in a React application by leveraging the DOMParser interface. This is especially useful when dealing with translations where you would want to avoid any string concatenation.

I’ve previously written quite a bit on JavaScript internationalization in WordPress 5.0+. However, one aspect I did not address at the time was how to use these new features with translations containing markup. That’s because it was simply not possible until recently, unless you resorted to React’s dangerouslySetInnerHTML prop. Since that injects raw HTML and poses security risks, it is not advisable in this case.

Thankfully, a new createInterpolateElement function was introduced to Gutenberg late last year that solves the problem in a safe way. It does so by sanitizing the input string using a simple parser that removes any unwanted markup. Here’s an example:

import { __ } from '@wordpress/i18n';
import { createInterpolateElement } from '@wordpress/element';
import { CustomComponent } from '../custom-component.js';

const translatedString = createInterpolateElement(
  __( 'This is a <span>string</span> with a <a>link</a> and a self-closing <custom_component />.' ),
  {
    span: <span />,
    a: <a href="https://make.wordpress.org/"/>,
    custom_component: <CustomComponent />,
  }
);

Any tag in the translated string that is part of the map in the second argument will be replaced by that component. So span will be replaced by an actual span element, for example. If the translation contains any other markup not in the map, say an <img onClick={doBadStuff()} />, it would simply be discarded. Awesome!

Now, as you can see from the example above, this utility function is part of the @wordpress/element package and not @wordpress/i18n, as one might have expected. But what if you don’t want to use the former, or perhaps can’t?

An Alternative to createInterpolateElement

To answer this question, I looked at the implementation of createInterpolateElement under the hood. It’s actually quite neat, but also a bit complex, as it relies on a regex-based tokenizer. I wanted something simpler.

The requirements were straightforward:

  • It needs to be fast at parsing strings with some simple markup
  • It needs to be secure
  • It needs to work in modern browsers (no IE support)

My research quickly led me to the DOMParser interface, which parses XML or HTML source code from a string into a DOM Document. It is supported by all major browsers. But does it also work for this use case? I was keen to find out!

From DOMParser to React

Specifically, I looked into using DOMParser.parseFromString() to parse a given string into an HTMLDocument, traverse that document, and create actual React elements (using React.createElement and React.cloneElement) for every HTML element found, based on the provided map. Text nodes could just be used as-is. This worked incredibly well from the get-go. Here’s an excerpt of the final code:

const node = new DOMParser().parseFromString(children, 'text/html').body
  .firstChild;

// Loops through the document and calls transformNode on each node.
transform(node, mapping).map((element, index) => (
  <Fragment key={index}>{element}</Fragment>
));

function transformNode(node, mapping = {}) {
  const { childNodes, localName, nodeType, textContent } = node;
  if (Node.TEXT_NODE === nodeType) {
    return textContent;
  }

  const children = node.hasChildNodes()
    ? [...childNodes].map((child) => transformNode(child, mapping))
    : null;

  if (localName in mapping) {
    return React.cloneElement(mapping[localName], null, children);
  }

  return React.createElement(localName, null, children);
}

You can find the full code including documentation and tests on GitHub.
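
The excerpt calls a transform helper that it does not show. The following is a minimal sketch of what such a helper might look like, assuming it simply visits the starting node and each of its following siblings via nextSibling and hands every one to transformNode. So that the sketch runs on its own, outside a browser and without React, transformNode is passed in as a third argument and the usage example feeds it plain objects shaped like DOM nodes. Both of those choices are assumptions made for illustration, not the plugin's actual code.

```javascript
// Hypothetical sketch of the `transform` helper used in the excerpt above.
// Assumption: it visits the given node and each following sibling in order.
// `transformNode` is injected as a parameter only to keep this sketch
// self-contained; the real helper would call the function shown above.
function transform(node, mapping = {}, transformNode) {
  const result = [];
  for (let current = node; current; current = current.nextSibling) {
    result.push(transformNode(current, mapping));
  }
  return result;
}

// Stand-in data: plain objects mimicking the sibling nodes a browser
// would produce for the markup 'Hello <b>world</b>!'.
const TEXT_NODE = 3; // Same value as Node.TEXT_NODE in browsers.
const ELEMENT_NODE = 1; // Same value as Node.ELEMENT_NODE in browsers.
const tail = { nodeType: TEXT_NODE, textContent: '!', nextSibling: null };
const bold = {
  nodeType: ELEMENT_NODE,
  localName: 'b',
  textContent: 'world',
  nextSibling: tail,
};
const head = { nodeType: TEXT_NODE, textContent: 'Hello ', nextSibling: bold };

// Stand-in for transformNode: text stays text, elements become a label.
const describeNode = (node) =>
  node.nodeType === TEXT_NODE ? node.textContent : `<${node.localName}>`;

const transformed = transform(head, {}, describeNode);
console.log(transformed); // [ 'Hello ', '<b>', '!' ]
```

Returning one array entry per sibling is what lets the calling code wrap each result in a keyed Fragment, as the excerpt does.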

A key difference from createInterpolateElement is that elements missing from the map are not simply discarded. Instead, they are inserted without any props or attributes being set, which mitigates the security risks. It also means that void elements such as <br> can be used in the translatable strings, which can come in handy at times.

