Localisation tools

nwhitfield · Jun 25, 2021

I thought I'd just write a bit about how I handle localisation across our site and apps, in case other people find it interesting; if there is sufficient interest, I might be able to put some of our tools on GitHub.

The main app I write is a client for a social network site that I run. That site is available in four languages at present: English, German, French and Spanish. To manage that we use the Smarty templating engine, which allows us to take most of the text out of the template files and put it in language config files. So, a template might say something like

HTML:

{config_load file='account.txt'}
<head>
    <title>{#pageTitle#}</title>
</head>

and an accompanying language file will contain a line that says

B4X:

pageTitle = Account settings

That works for one language, but Smarty also allows sections, denoted using square brackets, like in a PHP .ini file, so the template can say

HTML:

{config_load file='account.txt' section=$language}

and then the language file can say

B4X:

[en]
pageTitle    = Account settings

[de]
pageTitle    = Kontoeinstellungen
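
To make the file layout concrete, here is a minimal Python sketch of reading a sectioned language file like the one above into one dictionary per language. It is not Smarty's own parser; the function name `parse_language_file` and the choice to skip lines starting with `#` are assumptions made for illustration.

```python
def parse_language_file(text):
    """Parse 'key = value' lines grouped under [section] headers.

    Returns {section: {key: value}}. Lines that appear before any
    header are stored under the section name None.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, {})
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'key = value' line, ignore it
        sections.setdefault(current, {})[key.strip()] = value.strip()
    return sections


sample = """
[en]
pageTitle    = Account settings

[de]
pageTitle    = Kontoeinstellungen
"""

strings = parse_language_file(sample)
print(strings["de"]["pageTitle"])
```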

This works well for the web, and when we add a new page, we can send the original version off to a translator and they can send me back a new version. However, especially using volunteers, that slows things down a bit, as I don't like to launch new features unless they're available in all our supported languages. We've been experimenting with machine translation, using DeepL.com, which most people feel gives better results than Google.

So, I've now built a PHP script called deepler that will take the names of text files and run them through DeepL to create a multilingual version. So if I want quick translations, I just type

B4X:

php deepler.php account.txt

to generate the language file. It can then be revised by volunteers if there are any serious problems, but in the meantime we have something that should at least be passable.
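
The deepler script itself isn't shown here, so the following is only a Python sketch of the general idea: take single-language `key = value` lines and emit a sectioned file with one block per language. The `translate(text, target_lang)` callable is hypothetical and stands in for whatever actually calls the translation service; the stub below just returns canned text so the example runs offline.

```python
def build_multilingual(base_lines, languages, translate, base_lang="en"):
    """Build sectioned language-file text from 'key = value' lines.

    `translate(text, target_lang)` is a hypothetical callable supplied
    by the caller; it is not a real DeepL client.
    """
    entries = []
    for line in base_lines:
        key, sep, value = line.partition("=")
        if sep and key.strip():
            entries.append((key.strip(), value.strip()))

    out = []
    for lang in languages:
        out.append(f"[{lang}]")
        for key, value in entries:
            # The base language is copied through untouched.
            text = value if lang == base_lang else translate(value, lang)
            out.append(f"{key} = {text}")
        out.append("")  # blank line between sections
    return "\n".join(out)


def fake_translate(text, target_lang):
    """Offline stand-in for a real translation call (illustration only)."""
    canned = {("Account settings", "de"): "Kontoeinstellungen"}
    return canned.get((text, target_lang), f"<{target_lang}> {text}")


result = build_multilingual(
    ["pageTitle = Account settings"], ["en", "de"], fake_translate
)
print(result)
```

Keeping the translator as a parameter means the same loop works whether the text comes from a paid API, a free tier, or a volunteer's corrections.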

And so to B4X. I looked at the Localizator tools, and they do the job, but personally I found the idea of using a spreadsheet unwieldy, and it would mean that our volunteers would have to do things differently depending on whether they were working on translations for the website or for the apps. So, instead, I've built a script that will take the language files created by the deepler tool and compile them into an SQLite database; the format is not quite the same as Erel's, but it would be a simple fix to make it so.

So, I can create the text files for the app in exactly the same way, run them through the deepler script to get machine translations, and then compile them to a translations.db file with the command

B4X:

php compile-translations.php NAV_account.txt NAV_android.txt NAV_app_tour.txt
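
For anyone curious what such a compile step might look like, here is a minimal Python sketch using the standard `sqlite3` module. The table layout (`translations` with `lang`, `name`, `value` columns) is an assumption for illustration; it is neither the schema used by the script above nor the Localizator one.

```python
import sqlite3


def compile_translations(strings, db_path=":memory:"):
    """Write {lang: {name: value}} into a translations table.

    The schema here is assumed for the example, not taken from any
    existing tool.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS translations ("
        "lang TEXT NOT NULL, name TEXT NOT NULL, value TEXT NOT NULL, "
        "PRIMARY KEY (lang, name))"
    )
    rows = [
        (lang, name, value)
        for lang, entries in strings.items()
        for name, value in entries.items()
    ]
    # INSERT OR REPLACE lets the same file be recompiled after edits.
    conn.executemany(
        "INSERT OR REPLACE INTO translations VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


strings = {
    "en": {"pageTitle": "Account settings"},
    "de": {"pageTitle": "Kontoeinstellungen"},
}
conn = compile_translations(strings)
row = conn.execute(
    "SELECT value FROM translations WHERE lang = ? AND name = ?",
    ("de", "pageTitle"),
).fetchone()
print(row[0])
```

Passing a file name such as `translations.db` instead of `:memory:` would write the database to disk.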

Since we're dealing with German, there are an awful lot of long words (it's good to have a 'less formal' option on DeepL, but I'd really love one for "use short words"), so I've also created a tool to help flag those up; it internally renders an image of the text using a TTF file and compares the length to the base language, which is set to English. So I can type

B4X:

php measure-texts.php NAV_tagopts.txt

and get back a result that flags up anything more than 10% longer than the English:

B4X:

Language set = en de fr es
Processing NAV_tagopts.txt
Checking line lengths...
to_phone: de exceeds base by 17.6%
    en: Phone
    de: Telefon
to_email: de exceeds base by 15.9%
    en: Email
    de: E-Mail
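
The comparison can be sketched in a few lines of Python. Note the big simplification: this version measures strings with a pluggable `measure` function that defaults to character count, whereas the tool above renders the text with a TTF font, so the percentages will not match the output shown. A pixel-based measure (for example one built on Pillow's `ImageFont.truetype(...).getlength`) could be passed in instead; the function name and the 10% default are assumptions.

```python
def flag_long_translations(strings, base_lang="en", threshold=0.10, measure=len):
    """Return (name, lang, percent_over) for texts wider than the base.

    `measure` maps a string to a width; len() is only a crude stand-in
    for real font metrics.
    """
    flagged = []
    base = strings[base_lang]
    for lang, entries in strings.items():
        if lang == base_lang:
            continue
        for name, text in entries.items():
            if name not in base:
                continue  # nothing to compare against
            base_width = measure(base[name])
            if base_width == 0:
                continue  # avoid dividing by zero on empty base text
            excess = (measure(text) - base_width) / base_width
            if excess > threshold:
                flagged.append((name, lang, round(excess * 100, 1)))
    return flagged


strings = {
    "en": {"to_phone": "Phone", "ok": "OK"},
    "de": {"to_phone": "Telefon", "ok": "OK"},
}
flagged = flag_long_translations(strings)
for name, lang, pct in flagged:
    print(f"{name}: {lang} exceeds base by {pct}%")
```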

So, I won't claim this is the most streamlined process ever, but it does work for me; it means my volunteer translators only have to work in one way, regardless of whether it's app or website, and we can get things done fast (it's also possible to use the free version of the DeepL API, too, with a small tweak), plus flag up things that might be too big for on-screen labels or buttons.

If there's any interest, I can tidy the code up a bit and pop it on GitHub somewhere, though I expect it's probably only really useful if you want to have a common translation process across your web site and your apps.

Nigel

rabbitBUSH · Jun 25, 2021

Interesting posting.

Just a comment :

nwhitfield said:
dealing with German, there are an awful lot of long words

It's the same in the Afrikaans language. These languages treat one "thing" as one word. So "River without end" in Afrikaans becomes: Riviersonderend. If you break that up it's Rivier/sonder/end, exactly as the three English words. But when the principle applies across longer things you just get long "single" words with these languages. There would / could not be

nwhitfield said:
one for "use short words"

Much as we all would like that.......

We have a classic: tweebuffelsmeteenskootmorsdoodgeskietfontein. The important part is "fontein", which means fountain or water-spring (contextually the latter here). The whole name means something like: the spring where two buffalo were shot dead with one shot. It's a place name and so ONE thing and therefore one word. [[The part "morsdood" doesn't really have an English translation, but it nonetheless means dead ("dood"); "mors" means something like "really", so: really dead.]]

Messes with one's head at times.

* pronunciation is at 47 seconds in . . . *

emexes · Aug 19, 2021

nwhitfield said:
found the idea of using a spreadsheet unwieldy

We used Google Sheets to do translations and it was great.

1/ multiuser, shareable, accessible from anywhere
2/ translation function built-in
3/ easy export to a pipe-separated text file for direct use by program

What made it work great is that we'd translate from the base language (English) to the other language (say, German) AND THEN BACK AGAIN. If it came back the same as the original base text, then the translation was probably good. If it came back different, it'd be flagged for humans to review and approve or amend. Sometimes the cure was to rephrase the base text. Two-thirds of the time, the translation was good. The rest of the time, it was close enough that we could ship the software to friendly sites as-is, and leave the final translations for a more convenient (or urgent) time.
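
That round-trip check is easy to sketch in Python. The `translate(text, source_lang, target_lang)` callable is hypothetical (in the spreadsheet it was the built-in translation function), and comparing after lower-casing and collapsing whitespace is an assumption; a stricter exact match would work the same way.

```python
def round_trip_check(base_text, target_lang, translate, base_lang="en"):
    """Translate out and back; report whether the text survived.

    Returns (forward_translation, ok). `translate` is a hypothetical
    callable taking (text, source_lang, target_lang).
    """
    forward = translate(base_text, base_lang, target_lang)
    back = translate(forward, target_lang, base_lang)

    def normalise(s):
        # Ignore case and spacing differences (an assumption here).
        return " ".join(s.lower().split())

    return forward, normalise(back) == normalise(base_text)


# Offline stand-in translator with canned answers, for illustration only.
CANNED = {
    ("Exit", "en", "de"): "Ausfahrt",
    ("Ausfahrt", "de", "en"): "Exit",
    ("Volume", "en", "de"): "Lautstärke",
    ("Lautstärke", "de", "en"): "Loudness",
}


def fake_translate(text, source_lang, target_lang):
    return CANNED[(text, source_lang, target_lang)]


for text in ["Exit", "Volume"]:
    forward, ok = round_trip_check(text, "de", fake_translate)
    status = "ok" if ok else "needs review"
    print(f"{text} -> {forward}: {status}")
```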

The spreadsheet included English-to-English translation as well as English-to-Not-English. Sounds stupid, but sometimes we had words that would be used in multiple places but mean different things and thus translate to different words, eg volume as in 3d size and volume as in sound level. So in the program we'd have base text "Volume(size)" and "Volume(audio)" and the English translations were the same ie "Volume" and "Volume" but the German translations were different ie "Volumen" and "Lautstärke".

Regarding the different lengths, I'd already gotten around that (in the DOS 80x25 text-mode version) by having scrolling fields in the screen manager module, which would truncate and add ".." to the end of any string that was too long to fit in its allocated screen space, and then if the user pressed .. on the keyboard, it would scroll all the oversize fields on the screen so that the operator could read the entire string. In the Windows version, the operator can scroll the edit fields manually, and for oversize labels we first tried using Arial Narrow, and if that wasn't enough then we'd reduce the font size until it fitted. Sometimes we had base text that was used in multiple places but with different screen space allocations, and so we'd use that English-to-English translation trick again and have two base texts "Exit(long)" and "Exit(short)" translate to "Exit" and "Exit" in English, and "Ausfahrt" and "Ausf." in German. I'd been meaning to extend the translation system to have multiple translations in preference order that the screen manager could go through until it found one that fitted within the screen space but... there was always more important stuff to do first.
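
The "multiple translations in preference order" idea, combined with the truncate-and-mark fallback, might look something like this minimal Python sketch. It measures width in characters, as in a text-mode screen; the function name `fit_label` and the choice to truncate the first (preferred) candidate when nothing fits are assumptions.

```python
def fit_label(candidates, width, marker=".."):
    """Pick the first candidate that fits in `width` characters.

    If none fits, truncate the preferred (first) candidate and append
    `marker` to show that text was cut off.
    """
    for text in candidates:
        if len(text) <= width:
            return text
    if width <= len(marker):
        return candidates[0][:width]  # no room for the marker at all
    return candidates[0][: width - len(marker)] + marker


options = ["Ausfahrt", "Ausf."]
print(fit_label(options, 8))
print(fit_label(options, 6))
print(fit_label(options, 4))
```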

Google Sheets could compare the lengths of base and translated texts, like you're already doing albeit not quite to your pixel perfection, to flag anything that grows more than x%.

That scrolling field thing used to get a workout during sales presentations. Another nice touch was that the date entry fields were just free-form text entry and they would take things like "yesterday", "today" and "tomorrow" (or, borrowing a good idea from VMS, enough of the word to distinguish it uniquely, eg "y" was enough for "yesterday") and "next monday" or "last friday" or "this thursday" or just "thursday" or "thu" or "th" (but not "t"), or "3 weeks" or "6 months" or "6m" (rounded off to the nearest weekday), as well as the usual numeric dates, rounded off to the nearest month and century (because Y2K was a thing back then).
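
The "enough of the word to distinguish it uniquely" rule can be sketched as a prefix match over a keyword list. This Python version is an illustration only: the keyword list is a small assumed subset, and the real system clearly handled far more (phrases like "next monday", offsets like "3 weeks").

```python
KEYWORDS = [
    "yesterday", "today", "tomorrow",
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def match_keyword(typed, keywords=KEYWORDS):
    """Return the single keyword that `typed` abbreviates, else None."""
    typed = typed.strip().lower()
    if not typed:
        return None
    if typed in keywords:
        return typed  # an exact match always wins
    matches = [k for k in keywords if k.startswith(typed)]
    # Accept only if the prefix identifies exactly one keyword.
    return matches[0] if len(matches) == 1 else None


for typed in ["y", "th", "thu", "t", "tom"]:
    print(repr(typed), "->", match_keyword(typed))
```

With this list, "y" and "th" resolve, while "t" is rejected because it could be today, tomorrow, tuesday or thursday.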

And it was smart enough to accept 19/8 or 19-8 or 19.8 or 19 8 or 198 (which must be 19/8 because there's no 98th month) or even just 19 and it would use the current month (and year). If it was in arseabout mode, it was instead equally happy with 8/19 or 8-19 or 8.19 or 8 19 or 819 or again even just 19. One and two digit numbers were always just the day-of-month, and >= three digit numbers were accepted if they could be interpreted unambiguously.
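
Here is a minimal Python sketch of those numeric rules: separated pairs, bare one- or two-digit days, and undelimited digit runs that are accepted only when exactly one split gives a real date. The function name, the `month_first` flag (standing in for the month/day mode), and the use of the current year for every form are assumptions; the original also handled years, which this sketch leaves out.

```python
import re
from datetime import date


def parse_day_month(text, today, month_first=False):
    """Parse inputs like '19/8', '19 8', '198' or '19' into a date.

    Returns None when the input is invalid or ambiguous.
    """
    text = text.strip()
    parts = re.split(r"[/\-. ]+", text)

    def as_day_month(first, second):
        return (second, first) if month_first else (first, second)

    if len(parts) == 2 and all(re.fullmatch(r"[0-9]+", p) for p in parts):
        candidates = [as_day_month(int(parts[0]), int(parts[1]))]
    elif len(parts) == 1 and re.fullmatch(r"[0-9]+", text):
        if len(text) <= 2:
            # One or two digits: always the day of the current month.
            candidates = [(int(text), today.month)]
        else:
            # Try every split point; keep splits with 1-2 digits a side.
            candidates = []
            for i in range(1, len(text)):
                a, b = text[:i], text[i:]
                if len(a) <= 2 and len(b) <= 2:
                    candidates.append(as_day_month(int(a), int(b)))
    else:
        return None

    valid = set()
    for day, month in candidates:
        try:
            valid.add(date(today.year, month, day))
        except ValueError:
            pass  # e.g. the 98th month, or 31 February
    return valid.pop() if len(valid) == 1 else None


today = date(2021, 8, 19)
for typed in ["19/8", "19 8", "198", "19", "111"]:
    print(repr(typed), "->", parse_day_month(typed, today))
print(repr("819"), "->", parse_day_month("819", today, month_first=True))
```

An input such as "111" is rejected here because it could be 1 November or 11 January.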

edit: it's all coming back to me now. I also stuck in every fixed-date holiday or celebration I could think of, eg "christmas", "xmas", "new year's day", "nyd", "australia day", "queen's birthday" (unless workshop address was WA or Queensland), "anzac day", "boxing day", "new year's eve", "nye", "valentine's day" (but not "vd" ✌), "mothers' day" and "fathers' day" (in Australia), and of course "grand final day" (last Saturday of September) and "cup day" (first Tuesday in November). Most of those days are holidays or weekends and thus (I thought) a bit pointless for use in a business, but they were great for breaking the ice during sales presentations, and it turned out they actually got quite a workout in real life thanks to another date-entry nicety that briefly displayed the day-of-week and number-of-weekdays-between-now-and-then for whatever date they'd typed in.

ps: can you tell we're back in righto-youse-guys-no-more-mucking-around lockdown here?
