What is intlize?
Intlize is a tool to add internationalization to your application.
There are such tools available; why another one?
Most notably, the GNU gettext suite and the catgets suite are used for internationalization. Both have advantages and disadvantages of
their own. Intlize intends to combine the advantages of gettext and catgets. It can optionally produce its own straightforward format,
optimized for both speed and size.
The gettext team assumes that a programmer wants to write code and not care much about internationalization. As a result, the impact on
the source code is minimized. All the programmer has to do is mark which strings are translatable. Marking a string is easy, e.g. replace
puts("Write something") with
puts( _("Write something") )
Please refer to the gettext manual for details.
The string encapsulated in _() in the above example serves three purposes:
- it is the string that the translator translates
- it is used as the default string returned if no translation for this string is found
- it provides a handle for finding the translated string
This implies that the string appears in the executable and in every translation file, which wastes space. The messages are
looked up by string comparison, which is not the fastest method. On the other hand, out-of-date translations keep working without
modification; only the messages added since then remain untranslated.
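The three roles of the marked string can be illustrated with a minimal sketch. This is not gettext's actual implementation (the real library works on compiled catalog files); the `entry` structure, the `lookup_by_string` function and the German sample text are assumptions made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory catalog entry (not a gettext data structure). */
struct entry {
    const char *msgid;   /* untranslated string; also the lookup key */
    const char *msgstr;  /* translated string */
};

/* Search by string comparison. When nothing matches, the key itself
   is returned, so it doubles as the default message. */
static const char *lookup_by_string(const struct entry *cat, size_t n,
                                    const char *msgid)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(cat[i].msgid, msgid) == 0)
            return cat[i].msgstr;
    }
    return msgid;
}

/* Sample catalog: note that the original string is stored again here,
   in addition to its copy in the program text. */
static const struct entry german[] = {
    { "Write something", "Schreib etwas" },
};
```

Calling `lookup_by_string(german, 1, "Write something")` yields the translation, while a string missing from the catalog, such as "Quit", comes back unchanged. That is why an outdated catalog still works.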
Gettext's libintl is licensed under the LGPL, which may limit its use in non-free programs, or add a dependency in that case.
Gettext is very convenient for both programmer and translator. Speed and size could be improved. The licence may limit its usability in
non-free programs.
Marking a string with catgets looks something like
catgets(catd, 12, 34, "Write something")
This is already clumsy compared to gettext, but things get worse. The indices (12 and 34 in the above example) have to be unique
throughout the package being internationalized. It is the programmer's task to ensure that.
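A minimal sketch of the catgets calls in context may help. It uses the standard `catopen`, `catgets` and `catclose` functions from `<nl_types.h>`; the wrapper functions `messages_open`, `msg_write` and `messages_close`, the macro names and the catalog name passed in are hypothetical.

```c
#include <nl_types.h>

/* Set and message numbers are picked by hand and must stay unique
   across the whole package; 12 and 34 mirror the example above. */
#define SET_MAIN  12
#define MSG_WRITE 34

static nl_catd catd = (nl_catd)-1;

/* Open the message catalog. On failure catd stays (nl_catd)-1. */
static void messages_open(const char *name)
{
    catd = catopen(name, NL_CAT_LOCALE);
}

/* Fetch one message; the last argument is the default text that is
   returned when the catalog or the message is not available. */
static const char *msg_write(void)
{
    if (catd == (nl_catd)-1)
        return "Write something";
    return catgets(catd, SET_MAIN, MSG_WRITE, "Write something");
}

/* Close the catalog. Strings obtained from it must not be used afterwards. */
static void messages_close(void)
{
    if (catd != (nl_catd)-1) {
        catclose(catd);
        catd = (nl_catd)-1;
    }
}
```

A program would call `messages_open` once at startup, `msg_write` wherever the text is needed, and `messages_close` before exiting. With no catalog installed, `msg_write` simply returns the default text.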
However, it is entirely possible to use the catgets interface, and many applications do. Catgets made its way into libc, so it does not
add dependencies to C/C++ programs. Looking up two indices may be faster than looking up a string, but this is hard
to tell. The translatable string does not serve as a handle to the translated string, so it does not appear in the translation files,
reducing their size.
The binary format of the translations that the lookup is performed on is not defined. It may vary from compiler to compiler, even
from version to version of the same compiler. Fortunately the generator of those binary files is delivered with the compiler.
Even though the catgets interface is less convenient than gettext, it is part of the standard libraries of the compiler. The translation
files tend to be smaller than the ones used by gettext. The format of the binary translations is not defined.
When it comes to marking strings, intlize stays as close as possible to gettext. The syntax is actually
_("Write something", 0)
The numerical value is needed because intlize uses an index for translation table lookup. A string marked this way is recognized
by the gettext tool xgettext. Human readable translation files in intlize are therefore the same as in gettext. This may ease the work for
The index value is not always 0, but has to be unique for every different string. It is intlize's task to ensure that. When adding a
marked string, the programmer always writes 0, which is an invalid index. Intlize alters this index as needed.
Intlize and catgets
This is an obvious combination. Strings are marked the intlize way, which is convenient for the programmer. These strings are extracted
with xgettext, producing the well-known "po-files". A large variety of tools for writing translations in this file format is widely
available. Intlize then produces a catgets catalog from the po-file. This catalog is still in human-readable form, since the catgets
binary format is not defined. The binary format is produced at compile time with the catgets tool gencat.
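How an intlize-style marker could sit on top of catgets can be sketched as follows. This is only an illustration under stated assumptions: the macro definition, the helper `intl_get`, the global `intl_catd`, the set number 1 and the index value 7 are all hypothetical and are not taken from intlize itself.

```c
#include <nl_types.h>

/* Hypothetical global catalog handle; (nl_catd)-1 means "not open". */
static nl_catd intl_catd = (nl_catd)-1;

/* Assumption: all messages live in a single catgets set. */
#define INTL_SET 1

/* Look the message up by index. Fall back to the source string when no
   catalog is open or the index is still the invalid value 0. */
static const char *intl_get(int idx, const char *fallback)
{
    if (intl_catd == (nl_catd)-1 || idx <= 0)
        return fallback;
    return catgets(intl_catd, INTL_SET, idx, fallback);
}

/* Hypothetical definition of the marker: string first, index second. */
#define _(str, idx) intl_get((idx), (str))

/* As typed by the programmer: the index is the placeholder 0. */
static const char *prompt_as_written(void)
{
    return _("Write something", 0);
}

/* After the index has been assigned (7 is an arbitrary example value). */
static const char *prompt_indexed(void)
{
    return _("Write something", 7);
}
```

Because the string remains the first argument, the source stays readable and the text is still available as the catgets default.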
Intlize can optionally produce output in its own binary format. In this mode, the translatable string does not appear in the application
or in any binary translation file. Intlize produces an additional translation file named "C", which contains not a translation but the
translatable strings themselves. This file is used whenever no suitable translation file is found.
Intlize's binary translation file format has minimal overhead. A translation file is basically an array of C strings. The indices given to
the marked strings in the sources are contiguous and map directly into that array. It is hard to imagine a faster translation string lookup.
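The index-based lookup can be sketched in a few lines. The tables, the function names and the sample messages are assumptions for illustration; returning message 0 for an out-of-range index is also an assumption of this sketch, not documented intlize behaviour.

```c
#include <stddef.h>

/* Hypothetical tables as they might look once loaded into memory.
   Entry 0 is the placeholder for the invalid index 0. */
static const char *const c_msgs[]  = { "?", "Write something", "Quit" };
static const char *const de_msgs[] = { "?", "Schreib etwas", "Beenden" };

#define C_COUNT (sizeof c_msgs / sizeof c_msgs[0])

/* The active table defaults to the untranslated "C" strings. */
static const char *const *active = c_msgs;
static size_t active_count = C_COUNT;

/* Select a translation table, or fall back to "C" when none was found. */
static void use_table(const char *const *table, size_t count)
{
    if (table != NULL) {
        active = table;
        active_count = count;
    } else {
        active = c_msgs;
        active_count = C_COUNT;
    }
}

/* The lookup is a bounds check plus one array access. */
static const char *msg(size_t idx)
{
    return idx < active_count ? active[idx] : active[0];
}
```

After `use_table(de_msgs, 3)`, `msg(1)` returns the German text; after `use_table(NULL, 0)` the same call returns the original string from the "C" table.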
Intlize binary format
| Offset | Size | Type | Value | Content |
|---|---|---|---|---|
| 0 | 8 bytes | string | "intlize\0" | file magic |
| 8 | 1 byte | char | | character encoding |
| 9 | variable | string | | version, zero-terminated |
| | 2 bytes | string | "?\0" | message # 0 |
| | variable | string | | message # 1 |
| | variable | string | | message # 2 |
| | variable | string | | message # n |
The character encoding must not be 0.
The version is an arbitrary zero-terminated string used to detect whether the versions of the package and the translation file match.
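Reading this layout from a memory buffer might look like the following sketch. It follows only the fields listed above; the structure, the function names, the error handling and the sample data are assumptions, and this is not intlize's own loader.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a translation file held in memory. */
struct intlize_file {
    unsigned char encoding;  /* character encoding, never 0 */
    const char *version;     /* zero-terminated version string */
    const char *messages;    /* start of message # 0 */
    const char *end;         /* one past the last byte of the buffer */
};

/* Validate the header. Returns 0 on success, -1 on a malformed buffer. */
static int intlize_parse(const char *buf, size_t len, struct intlize_file *out)
{
    static const char magic[8] = "intlize";  /* 7 characters plus '\0' */

    if (len < 10 || memcmp(buf, magic, sizeof magic) != 0)
        return -1;
    if (buf[8] == 0)                          /* encoding must not be 0 */
        return -1;

    const char *version = buf + 9;
    const char *nul = memchr(version, '\0', len - 9);
    if (nul == NULL)                          /* unterminated version */
        return -1;

    out->encoding = (unsigned char)buf[8];
    out->version = version;
    out->messages = nul + 1;
    out->end = buf + len;
    return 0;
}

/* Return message # n, or NULL when the file holds fewer messages. */
static const char *intlize_message(const struct intlize_file *f, size_t n)
{
    const char *p = f->messages;
    while (p < f->end) {
        const char *nul = memchr(p, '\0', (size_t)(f->end - p));
        if (nul == NULL)
            return NULL;                      /* unterminated message */
        if (n == 0)
            return p;
        n--;
        p = nul + 1;
    }
    return NULL;
}

/* Sample data: magic, encoding 1, version "1.0", then three messages. */
static const char sample[] =
    "intlize\0" "\x01" "1.0\0" "?\0" "Schreib etwas\0" "Beenden";
```

For brevity this sketch walks the strings on every call. A loader aiming for speed would presumably walk them once and store the pointers in an array, so that each later lookup is a single index operation.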