Category: "Internationalization"

Charset conversions (i18n)

Yesterday, I came across this interesting table, which shows what conversions I need to do when I paste text from Word into a textarea and then want to use that text on the web...

To be accurate, this table is useful for converting from the default Windows charset (windows-1252, aka CP1252) to the default web charset (ISO-8859-1, aka Latin-1). Nevertheless, it allowed me to check the conversion in my b2evolution software, and I noticed that it was missing one of the 27 conversions.
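
The characters involved are the ones CP1252 places in the 0x80 to 0x9F range, where Latin-1 only has control codes: curly quotes, the euro sign, the ellipsis and so on. Below is a minimal Python sketch of one way to convert pasted text, assuming that HTML numeric character references such as &#8220; are an acceptable replacement on the target page (the table itself may suggest other substitutes). The function name is hypothetical.

```python
def cp1252_to_latin1(raw: bytes) -> bytes:
    """Convert CP1252 bytes (e.g. text pasted from Word) to ISO-8859-1.

    Characters that exist in CP1252 but not in Latin-1 (the 0x80-0x9F
    range) become HTML numeric character references such as &#8220;.
    Bytes that are undefined in CP1252 (0x81, 0x8D, 0x8F, 0x90, 0x9D)
    make decode() raise UnicodeDecodeError.
    """
    text = raw.decode("cp1252")  # bytes -> Unicode string
    return text.encode("latin-1", errors="xmlcharrefreplace")


pasted = b"\x93Smart quotes\x94 and an ellipsis\x85"
print(cp1252_to_latin1(pasted))
# b'&#8220;Smart quotes&#8221; and an ellipsis&#8230;'
```

Going through a Unicode string means accented letters such as é, which sit at the same byte in both charsets, pass through unchanged.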

Anyway, the world extends way beyond CP1252 and Latin-1, so how would one deal with other languages?

For example, how do I convert Latvian from Windows-1257 to ISO-8859-13 (close match)? Or Russian from KOI8-R to ISO-8859-5 (funky match)? Check out this awesome character set database provided by the Institute of the Estonian Language. (Wouldn't it make sense if unicode.org provided this?)
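
If a language runtime ships codecs for both charsets, one approach that avoids a direct table between every pair is to decode to Unicode and re-encode. A minimal Python sketch, assuming the standard codec names cp1257, iso8859_13, koi8_r and iso8859_5; the helper name transcode is hypothetical.

```python
def transcode(raw: bytes, source: str, target: str, errors: str = "strict") -> bytes:
    """Re-encode bytes from one legacy charset to another via Unicode.

    Characters missing from the target charset raise UnicodeEncodeError
    unless `errors` says otherwise (e.g. "replace").
    """
    return raw.decode(source).encode(target, errors)


# Latvian: Windows-1257 and ISO-8859-13 are a close match.
latvian = "Rīga, Latvija".encode("cp1257")
print(transcode(latvian, "cp1257", "iso8859_13"))

# Russian: KOI8-R and ISO-8859-5 order the Cyrillic letters differently,
# so the bytes change even though the text is the same.
russian = "Привет".encode("koi8_r")
print(transcode(russian, "koi8_r", "iso8859_5"))

# The euro sign exists in Windows-1257 but not in ISO-8859-13.
try:
    transcode("5 €".encode("cp1257"), "cp1257", "iso8859_13")
except UnicodeEncodeError as exc:
    print("cannot convert:", exc.object[exc.start:exc.end])
```

The "close match" case can still fail on a few characters, which is why the strict default is kept: it reports the problem instead of silently dropping text.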

By the way, how do I know which charsets to use for a particular language? Here's a page by the W3C, but it's a little sparse... And here's another one.

Survival guide to i18n

This guide has an interesting conversion table from Windows-1252 to Unicode.
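
The Windows-1252 to Unicode mapping for the bytes that differ from Latin-1 can also be generated from Python's built-in cp1252 codec instead of being copied by hand. A minimal sketch (the function name is hypothetical); it yields 27 rows, matching the 27 conversions mentioned above.

```python
import unicodedata


def cp1252_high_table():
    """List (byte, code point, Unicode name) for CP1252 bytes 0x80-0x9F.

    Latin-1 maps these bytes to C1 control codes; CP1252 maps 27 of them
    to printable characters and leaves 5 undefined.
    """
    rows = []
    for byte in range(0x80, 0xA0):
        try:
            char = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # undefined in CP1252: 0x81, 0x8D, 0x8F, 0x90, 0x9D
        rows.append((byte, ord(char), unicodedata.name(char)))
    return rows


for byte, code_point, name in cp1252_high_table():
    print(f"0x{byte:02X} -> U+{code_point:04X} {name}")
```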

Internationalizing web applications using gettext in PHP

As I have said before, gettext is a very interesting framework for i18n and l10n.

Now the question is: how do I apply this to web applications? I'm going to restrict the discussion here to PHP, since that is what I'm working with right now... but you should expect similar behaviour from other web development tools that integrate gettext.

First of all, the good news: PHP has fully supported gettext since version 3.0.7. So it has been in use for a long time, and you can even find tutorials on the net.
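
PHP exposes gettext through functions such as bindtextdomain(), textdomain() and gettext() (aliased as _()). Python's standard gettext module follows the same model, so here is a minimal sketch in Python rather than PHP; the text domain "blog" and the "locale" directory are hypothetical. Choosing the catalog per request, instead of relying on a process-wide locale, is one reasonable design for a web application, and fallback=True keeps pages working for languages that have no catalog yet.

```python
import gettext

# Hypothetical layout: locale/<lang>/LC_MESSAGES/blog.mo
LOCALE_DIR = "locale"
DOMAIN = "blog"


def translator_for(lang: str):
    """Return a gettext-style lookup function for one request's language.

    With fallback=True, a missing catalog yields a NullTranslations
    object, so untranslated strings come back unchanged instead of
    raising FileNotFoundError.
    """
    catalog = gettext.translation(
        DOMAIN, localedir=LOCALE_DIR, languages=[lang], fallback=True
    )
    return catalog.gettext


# In a request handler, `lang` might come from the user's preferences.
_ = translator_for("fr")
print(_("Post a comment"))  # French text only if locale/fr/... exists
```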

PHP/Gettext in action

Introducing gettext and .PO files

As I said recently, i18n and l10n are best carried out using the right tools...

I've looked around a bit, and there seems to be one reference in the area: the GNU gettext framework.

This framework actually comprises several things:

  • A set of conventions about how programs should be written to support i18n;
  • A directory and file naming organization for the translated strings;
  • A runtime library to display localized text;
  • A set of utilities to handle the l10n process;
  • A special mode for Emacs which helps prepare the sources for i18n.
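
The directory and file naming convention is what lets the runtime library find translations: translators edit .po files (for example in poEdit), msgfmt compiles them to binary .mo files, and those are looked up as <localedir>/<lang>/LC_MESSAGES/<domain>.mo. A minimal Python sketch of that lookup using the standard gettext.find(); the "blog" domain and the helper catalog_path are hypothetical, and the .mo file is left empty because find() only checks that it exists.

```python
import gettext
import os
import tempfile


def catalog_path(localedir: str, lang: str, domain: str) -> str:
    """Where gettext expects a compiled catalog:
    <localedir>/<lang>/LC_MESSAGES/<domain>.mo"""
    return os.path.join(localedir, lang, "LC_MESSAGES", domain + ".mo")


with tempfile.TemporaryDirectory() as localedir:
    # Create an (empty) French catalog for the hypothetical "blog" domain.
    path = catalog_path(localedir, "fr", "blog")
    os.makedirs(os.path.dirname(path))
    open(path, "wb").close()

    # A request for Canadian French (fr_CA) falls back to plain "fr".
    print(gettext.find("blog", localedir, languages=["fr_CA"]) == path)
```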
