Transliteration in PHP 5.4

February 1, 2012

In the third edition of my “PHP and MySQL for Dynamic Web Sites: Visual QuickPro Guide” book, titled “PHP 6 and MySQL 5 for Dynamic Web Sites: Visual QuickPro Guide”, I went out on a limb and used a beta version of PHP 6 when writing the book. PHP 6 was about half-way done at the time, and I didn’t want to complete the book, only to have it be outdated immediately thereafter (using PHP 6 wasn’t, by the way, an attempt to trick the reader into buying the book, as some cynical people have suggested). Well… PHP 6 ended up dying due to many complications and I had the proverbial egg on my face (what one reader rightfully called my “Dewey Defeats Truman” moment). In truth, only about 5% of the book or so required PHP 6, so it wasn’t a devastating mistake, but I certainly felt foolish.

I had specifically wanted to discuss PHP 6 because of its intended support for Unicode, which is what the code in the book requires for a couple of examples. Even though PHP 6 was shelved, the key components have since been integrated into PHP 5.2, 5.3, and the forthcoming 5.4. Transliteration, the ability to convert text from one alphabet to another, was demonstrated in the book using the PHP 6 str_transliterate() function. That function went belly-up, and PHP 5.4 now has the Transliterator class instead. The documentation for the class is non-existent, but here’s what I figured out…

As with many things in PHP, you can use the Transliterator class as an object or procedurally. Let’s look at the procedural approach first, which is what the book also does. The function that does all the work is transliterator_transliterate(). Its first argument is either a string identifying the transliteration to conduct, or a Transliterator object. Its second argument is the text to be transliterated.

In the Transliterator class, transliterators are identified using the syntax from-to. So Bengali-Tamil will transliterate from the Bengali alphabet to the Tamil alphabet. Keep in mind this is just the replacing of characters from one alphabet with the corresponding characters in another. This is not translation!

To get the list of possible transliterators, invoke the transliterator_list_ids() function (Figure 1):

echo '<pre>' . print_r(transliterator_list_ids(), 1) . '</pre>';

Figure 1
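Since the list depends on how PHP was built, it can be worth checking it before relying on a particular ID. Here’s a minimal sketch of that idea, assuming the intl extension (which provides these functions) is loaded. The available_targets() helper is a hypothetical name of mine, not part of PHP, and an ID missing from the list isn’t necessarily invalid, so treat this as a rough check only:

```php
<?php
// Hypothetical helper (not part of PHP): returns the entries of $scripts
// for which a "Latin-<script>" ID appears in the registered list.
function available_targets(array $scripts)
{
    if (!function_exists('transliterator_list_ids')) {
        return array(); // The intl extension is not loaded.
    }
    $ids = transliterator_list_ids();
    if ($ids === false) {
        return array(); // The list could not be retrieved.
    }
    $found = array();
    foreach ($scripts as $script) {
        if (in_array("Latin-$script", $ids, true)) {
            $found[] = $script;
        }
    }
    return $found;
}

print_r(available_targets(array('Greek', 'Cyrillic', 'Hebrew', 'Arabic', 'Hangul')));
```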

Returning to the code in the book, in Script 14.4, trans.php, my name using the Latin alphabet was stored in a variable called $me. Then, an array of destination alphabets was created:

$me = 'Larry Ullman';
$scripts = array('Greek', 'Cyrillic', 'Hebrew', 'Arabic', 'Hangul');

Next, a foreach loop iterated through the array. Within the loop, originally, the str_transliterate() function was called:

foreach ($scripts as $script) {
    echo "$me is " . str_transliterate($me, 'Latin', $script) . " in $script.\n";
}

With the updated Transliterator class, the proper syntax is now (Figure 2):

    echo "$me is " . transliterator_transliterate("Latin-$script", $me) . " in $script.\n";

And that’s all there is to it! (To reiterate, this does require PHP 5.4 or greater, with the intl extension enabled.)

Figure 2
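Put together, the updated procedural loop might look something like the following. This is a sketch rather than the book’s actual script: the transliterate_name() wrapper is a hypothetical helper I’ve added so the failure case can be handled, relying on the fact that transliterator_transliterate() returns false when it can’t do the conversion.

```php
<?php
// Hypothetical wrapper: maps each script name to the transliterated text,
// or to false if that transliteration could not be performed.
function transliterate_name($text, array $scripts)
{
    $results = array();
    foreach ($scripts as $script) {
        if (!function_exists('transliterator_transliterate')) {
            $results[$script] = false; // The intl extension is not loaded.
            continue;
        }
        // Returns the converted string, or false on failure.
        $results[$script] = transliterator_transliterate("Latin-$script", $text);
    }
    return $results;
}

$me = 'Larry Ullman';
$scripts = array('Greek', 'Cyrillic', 'Hebrew', 'Arabic', 'Hangul');

foreach (transliterate_name($me, $scripts) as $script => $result) {
    if ($result === false) {
        echo "Could not transliterate to $script.\n";
    } else {
        echo "$me is $result in $script.\n";
    }
}
```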

To do the same thing using object-oriented programming, you’d first create a new Transliterator object:

$t = Transliterator::create("Latin-$script");

Then you call the transliterate() method of the object, providing the text to transliterate as the first argument:

$t->transliterate($me);

And there you have it!
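One reason to prefer the object is that it can be created once and reused for several strings. Here’s a rough sketch of that, again assuming PHP 5.4 or greater with intl. The to_script() helper and the second sample string are my own inventions for illustration, and the null check relies on Transliterator::create() returning null when it can’t build a transliterator for the given ID:

```php
<?php
// Hypothetical helper: transliterates each string in $texts from Latin
// to $script with a single Transliterator object. Returns null if the
// object cannot be created.
function to_script($script, array $texts)
{
    if (!class_exists('Transliterator')) {
        return null; // The intl extension is not loaded.
    }
    // create() returns null when the ID is not recognized.
    $t = Transliterator::create("Latin-$script");
    if ($t === null) {
        return null;
    }
    $converted = array();
    foreach ($texts as $text) {
        // transliterate() returns the converted string, or false on failure.
        $converted[$text] = $t->transliterate($text);
    }
    return $converted;
}

print_r(to_script('Greek', array('Larry Ullman', 'PHP and MySQL')));
```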

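The Transliterator class can also be told to transliterate forwards or in reverse. Transliterator::create() takes an optional second argument, either Transliterator::FORWARD (the default) or Transliterator::REVERSE, so a single ID such as Latin-Greek can be used to go from the Latin alphabet to the Greek one or from the Greek alphabet back to the Latin one.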
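As a quick sketch of the two directions (assuming, once more, PHP 5.4 or greater with intl, and using Latin-Greek purely as an example ID):

```php
<?php
// Sketch: the same ID used in both directions.
if (class_exists('Transliterator')) {
    $me = 'Larry Ullman';

    // Forward (the default): Latin to Greek.
    $toGreek = Transliterator::create('Latin-Greek', Transliterator::FORWARD);

    // Reverse: the same ID, applied from Greek back to Latin.
    $toLatin = Transliterator::create('Latin-Greek', Transliterator::REVERSE);

    if ($toGreek !== null && $toLatin !== null) {
        $greek = $toGreek->transliterate($me);
        echo "Forward: $greek\n";
        // The round trip may not reproduce the original spelling exactly.
        echo 'Reverse: ' . $toLatin->transliterate($greek) . "\n";
    }
}
```

If you already have the forward object, calling its createInverse() method should be another way to get the reverse one.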