Translating text that contains HTML

Thursday, September 22, 2011

If you're not passing all of your page titles, meta descriptions, img alt attributes, form labels, error messages, etc. through a translate function right now, you may be in for a very long project the day you need to provide translation for the hundreds of templates you've built. I speak from experience: I recently spent about three full days finding every raw text element in my framework's templates that needed to be translated for a multilingual site, and wrapping each text chunk in a translate function. The earlier you start planning for this, the better off you are going to be.

"But I don't want to spend time building some giant translation system," you're probably thinking. That's fine; you don't have to. You just have to plan for it, with an implementation as simple as this:

    <?php
    class Translate
    {
        // Placeholder: return the term unchanged until real translations exist.
        public function t($term)
        {
            return $term;
        }
    }

Next, find all of the text chunks in your templates and wrap them in calls to the t() method, e.g.:

    <h2><?=$translate->t('Login Here!'); ?></h2>
    <form name="awesomeLoginForm" action="">
        <label for="username"><?=$translate->t('Username'); ?></label>
        <input type="text" id="username" name="username" />
        <label for="password"><?=$translate->t('Password'); ?></label>
        <input type="password" id="password" name="password" />
        <input type="submit" value="<?=$translate->t('Go'); ?>" />
    </form>

Right now this may seem like a pointless exercise. But imagine that you've done this throughout your site's templates, and now you need to translate the site into Spanish. All you have to do is expand your simple Translate class to swap the English phrases for Spanish ones, which can be as simple as calling str_replace() with a pair of translation arrays. Real projects are usually more complicated than this, but you'll have a large head start if you begin implementing this as soon as possible.
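To make the Spanish scenario concrete, here is a minimal sketch of how the Translate class might grow. It assumes the translations live in a plain PHP array keyed by the English phrase; the constructor argument and the sample Spanish strings are illustrative assumptions, not part of any real project. Note that it uses an exact-match array lookup instead of str_replace(), so a short phrase like 'Go' can never be replaced inside a longer string. Phrases with no entry fall back to the English original.

```php
<?php
// Sketch only: a dictionary-backed version of the Translate class.
// The constructor signature and the sample phrases are illustrative assumptions.
class Translate
{
    /** @var array<string, string> English phrase => translated phrase */
    private $dictionary;

    public function __construct(array $dictionary = [])
    {
        $this->dictionary = $dictionary;
    }

    public function t($term)
    {
        // Exact-match lookup; fall back to the original term when no
        // translation exists, so untranslated text still renders.
        return $this->dictionary[$term] ?? $term;
    }
}

$translate = new Translate([
    'Login Here!' => '¡Inicia sesión aquí!',
    'Username'    => 'Nombre de usuario',
    'Password'    => 'Contraseña',
    'Go'          => 'Ir',
]);

echo $translate->t('Username'), "\n";    // Nombre de usuario
echo $translate->t('Remember me'), "\n"; // Remember me (no entry, so unchanged)
```

Because the templates only ever call $translate->t(), swapping the placeholder class for this one (or one that loads its array from a file or database) requires no template changes.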

The example above only translates plain-text phrases. What happens if you have a chunk of HTML, for example markup generated by a WYSIWYG editor like TinyMCE? I faced this exact problem a few months ago and came up with a solution that uses PHP's built-in DOMDocument class to recursively iterate through the DOM nodes of the HTML and apply the $translate->t() method to all text nodes, submit and button values, and img alt attributes.

The translateHTML() Function

    <?php
    /**
     * Pass the translatable parts of an HTML fragment through $translate->t():
     * text nodes, submit/button values and img alt attributes.
     * Call it with $html and $translate only; $node is used for recursion.
     */
    function translateHTML($html, Translate $translate, $node = null)
    {
        if (is_null($node)) {
            // Top-level call: parse the fragment (libxml wraps it in <html><body>).
            $node = new DOMDocument;
            $node->loadHTML($html);
        }
        if ($node->hasChildNodes() && $node->childNodes->length > 1) {
            // Several children: recurse into each of them.
            foreach ($node->childNodes as $childNode) {
                translateHTML('', $translate, $childNode);
            }
        } else {
            // Zero or one child: work on that single child, or on the node itself.
            $thisNode = ($node->hasChildNodes()) ? $node->firstChild : $node;
            if ($thisNode->hasChildNodes()) {
                // A wrapper such as <body> or an only-child <strong>: keep descending.
                translateHTML('', $translate, $thisNode);
            } elseif ($thisNode->nodeName == '#text') {
                // Skip whitespace and punctuation; require at least two letters.
                if (preg_match('#[a-z]{2,}#i', $thisNode->nodeValue)) {
                    // Translate the trimmed text but keep the surrounding whitespace.
                    $trimmed = trim($thisNode->nodeValue);
                    $thisNode->nodeValue = str_replace(
                        $trimmed,
                        $translate->t($trimmed),
                        $thisNode->nodeValue
                    );
                }
            } elseif ($thisNode->nodeName == 'input'
                && in_array($thisNode->getAttribute('type'), array('submit', 'button'))
            ) {
                $translated = $translate->t($thisNode->getAttribute('value'));
                $thisNode->setAttribute('value', $translated);
            } elseif ($thisNode->nodeName == 'img'
                && $thisNode->getAttribute('alt') != ''
            ) {
                $translated = $translate->t($thisNode->getAttribute('alt'));
                $thisNode->setAttribute('alt', $translated);
            }
        }
        if ($node instanceof DOMDocument) {
            // Serialize the tree and strip the <html>/<body> wrappers libxml added.
            $html = $node->saveXML($node->documentElement);
            return preg_replace('#</?(html|body)>#', '', $html);
        }
    }

You could modify this to do all sorts of cool things with other HTML elements, but focusing on the current task, here is an example implementation and its output.

Example

    <?php
    // Assumes translateHTML() from above has already been defined.
    // This stub Translate wraps every term in {{{ }}} so the translated strings stand out.
    class Translate
    {
        public function t($term)
        {
            return '{{{' . $term . '}}}';
        }
    }

    $html = '
    <h1>Example Page</h1>
    <p>
    Lorem ipsum dolor sit amet, consectetur adipisicing elit,
    sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
    </p>
    <img src="logo.jpeg" alt="Amazing Logo" />
    <form name="test" action="">
    <input type="text" name="foo" />
    <input type="submit" value="Do It!" />
    </form>
    ';
    echo translateHTML($html, new Translate);

This will output the following HTML:

    <h1>{{{Example Page}}}</h1>
    <p>
    {{{Lorem ipsum dolor sit amet, consectetur adipisicing elit,
    sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.}}}
    </p>
    <img src="logo.jpeg" alt="{{{Amazing Logo}}}" />
    <form name="test" action="">
    <input type="text" name="foo" />
    <input type="submit" value="{{{Do It!}}}" />
    </form>
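If you do want to extend translateHTML() to more elements, one alternative worth considering is to let DOMXPath select the targets directly instead of walking the tree by hand, so every matching text node is visited regardless of how deeply it is nested. The sketch below is not the implementation above: the function name translateHTMLWithXPath() and the particular XPath expressions are assumptions, and it redefines the same {{{ }}} stub Translate class from the example so it runs on its own. Supporting another attribute is then a matter of adding one more expression to $attributeQueries.

```php
<?php
// Sketch: an XPath-based alternative to translateHTML().
// translateHTMLWithXPath() is a hypothetical name; the Translate stub below
// mirrors the example's marker class so translated strings are easy to spot.
class Translate
{
    public function t($term)
    {
        return '{{{' . $term . '}}}';
    }
}

function translateHTMLWithXPath(string $html, Translate $translate): string
{
    $doc = new DOMDocument();
    // Suppress libxml errors/warnings about fragments or unknown (e.g. HTML5) tags.
    $doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);
    $xpath = new DOMXPath($doc);

    // Every text node outside <script>/<style>.
    $textNodes = $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]');
    foreach ($textNodes as $textNode) {
        // Same filter as the original: require at least two letters.
        if (!preg_match('#[a-z]{2,}#i', $textNode->nodeValue)) {
            continue;
        }
        $trimmed = trim($textNode->nodeValue);
        // Translate the trimmed text but keep the surrounding whitespace.
        $textNode->nodeValue = str_replace($trimmed, $translate->t($trimmed), $textNode->nodeValue);
    }

    // Attributes to translate; add more XPath expressions here as needed.
    $attributeQueries = [
        '//img/@alt',
        '//input[@type="submit" or @type="button"]/@value',
    ];
    foreach ($attributeQueries as $query) {
        foreach ($xpath->query($query) as $attr) {
            if ($attr->value !== '') {
                $attr->value = $translate->t($attr->value);
            }
        }
    }

    // Serialize like the original function and strip the html/body wrappers.
    $output = $doc->saveXML($doc->documentElement);
    return preg_replace('#</?(html|body)>#', '', $output);
}

echo translateHTMLWithXPath(
    '<p>Hello <strong>world</strong></p><img src="logo.jpeg" alt="Amazing Logo" />',
    new Translate()
);
```

Running the sketch prints the fragment with each string wrapped in braces, including `{{{Hello}}} <strong>{{{world}}}</strong>` and `alt="{{{Amazing Logo}}}"`.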

Posted by Aaron Fisher at 4:22pm
