Tag Archives: arabic

Weblog language translator – blog translation on the fly with Roller specific functionality

Finally got round to upgrading my ‘Weblog language translator’ from beta.

Key to improving it was removing the rollover-based banner I had implemented. The Google translation service, which I piggy-back off, only translates circa 3k characters, so the banner header, full of links, was using up the majority of the translation.
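To put rough numbers on that, here is a minimal sketch of the character budget. It assumes the service simply translates the first 3,000 or so characters in document order; the exact limit and the helper name `translatedCharsPerSection` are illustrative assumptions, not anything documented by Google.

```javascript
// Hypothetical helper: given page sections in document order, work out how
// many characters of each would fit inside the service's translation limit.
// 3000 is the approximate figure mentioned above, not a documented limit.
const ASSUMED_LIMIT = 3000;

function translatedCharsPerSection(sections, limit = ASSUMED_LIMIT) {
  let remaining = limit;
  return sections.map(({ name, text }) => {
    // Each section gets whatever is left of the budget, up to its own length.
    const translated = Math.min(text.length, remaining);
    remaining -= translated;
    return { name, total: text.length, translated };
  });
}

// Example: a link-heavy banner placed ahead of the post body.
const report = translatedCharsPerSection([
  { name: "banner", text: "x".repeat(2000) },
  { name: "entry", text: "y".repeat(2500) },
]);
console.log(report);
```

With these made-up sizes the banner takes 2,000 characters of the budget, leaving only 1,000 of the entry's 2,500 characters to be translated.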

Obviously this points out a few of the flaws of the implementation, namely reliance on Google to provide the service (and of course a dependency on the call syntax not changing), and all of the weaknesses that follow on from relying on the Google service, not least the translatable character limit.

This time round I’m much happier with the implementation – and I’ve done a fair bit of testing to ensure it’s fit for purpose.

Unlike the other implementations out on the web I’ve added Roller specific functionality, implemented in JavaScript, creating a ‘main’ (or rather ‘weblog’) page for each language.

I did this because I wanted to tailor the service to be language-specific, and because the major search engines outside of the English-speaking, Google-dominated Internet often verify that there is actual language-specific content (and I want these search engines to be able to index my site, even if that’s only a couple of pages).

The code uses Roller Weblogger specific URL notations to match the target language to its ‘weblog_xx’ page (where xx stands for the two-character language code, or five characters for Traditional and Simplified Chinese).
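As a rough illustration of that matching, the sketch below maps a language code to a ‘weblog_xx’ page name. The code list, the `zh-CN`/`zh-TW` spellings, the `/page/` path segment and the `example.com` address are all assumptions made for the example; the real script may well differ.

```javascript
// Assumed language codes; zh-CN and zh-TW are the five-character cases.
const LANGUAGE_CODES = [
  "zh-CN", "zh-TW", "en", "es", "ar", "pt", "ru",
  "ja", "de", "ko", "fr", "it", "nl", "el",
];

// Returns the language-specific page name, or null for an unknown code.
function weblogPageFor(langCode) {
  if (!LANGUAGE_CODES.includes(langCode)) return null;
  return "weblog_" + langCode;
}

// Builds the address of the language page under a weblog's base URL.
// Assumes custom pages live under "<weblog>/page/<name>".
function languagePageUrl(weblogBase, langCode) {
  const page = weblogPageFor(langCode);
  if (page === null) return weblogBase; // fall back to the generic weblog
  return weblogBase.replace(/\/+$/, "") + "/page/" + page;
}

console.log(languagePageUrl("http://example.com/roller/myblog/", "fr"));
```

Unknown codes fall back to the generic weblog address rather than producing a link to a page that doesn't exist.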

Currently it works for the generic weblog URL, all ‘entry’ variants, all ‘date’ variants, and all ‘page’ variants. It doesn’t work for ‘tags’ or ‘category’ variants (mainly because I haven’t had time to research the URL notation), but I hope to research and code up these alternative Roller URL formations when I next revisit the code. I find this acceptable for now, as those pages still get a translation, just without going through the language specific ‘weblog’ page.
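The supported and unsupported variants could be told apart along these lines. This is only a sketch: it assumes the variant name is the path segment straight after the weblog handle, and `classifyWeblogPath` and the `myblog` handle are hypothetical names.

```javascript
// Variants that get the language-specific 'weblog' page.
const SUPPORTED_VARIANTS = ["entry", "date", "page"];

// Classifies a URL path, assuming the variant name follows the weblog handle.
function classifyWeblogPath(path, handle) {
  const clean = path.split(/[?#]/)[0]; // drop any query string or fragment
  const segments = clean.split("/").filter(Boolean);
  const at = segments.indexOf(handle);
  if (at === -1) return { variant: "unknown", supported: false };
  // Nothing after the handle means the generic weblog URL.
  const variant = segments[at + 1] || "weblog";
  const supported = variant === "weblog" || SUPPORTED_VARIANTS.includes(variant);
  return { variant, supported };
}

console.log(classifyWeblogPath("/roller/myblog/entry/some_post", "myblog"));
console.log(classifyWeblogPath("/roller/myblog/tags/arabic", "myblog"));
```

A caller would send supported variants via the language page and translate anything else directly, which matches the fallback described above.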

The JavaScript is available via the page source – and you’re welcome to have a look and re-use if you wish (it’s nowhere near the nicest bit of JavaScript available – if you’d like to tidy it up at all you’re more than welcome).

I’ve also added Dutch and Greek to the list of languages that can be translated to, as these have been recently added to Google’s translation service (still no Hindi or Bengali though). That makes a total of fourteen languages, including the already implemented Simplified Chinese, Traditional Chinese (Taiwanese), English, Spanish, Arabic, Portuguese, Russian, Japanese, German, Korean, French and Italian. Plus I’ve replaced the language text with flag icons – which improves the look and feel too.

The icons are “available for free use for any purpose with no requirement for attribution” (although I thought it would be nice to credit the originating site) from FamFamFam, by fellow ‘Brummie’ Mark James, available at http://www.famfamfam.com/lab/icons/flags/
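For anyone re-using the idea, the language list and flag links might be held in a small table like the one below. It is a sketch under assumptions: the language codes, the choice of flag per language (Arabic in particular has no single obvious flag), the `data-lang` attribute and the `/images/flags` directory are all illustrative. The FamFamFam files themselves are named by two-letter country code.

```javascript
// One row per language: assumed translation code, display name, flag file stem.
const LANGUAGES = [
  { code: "zh-CN", name: "Simplified Chinese", flag: "cn" },
  { code: "zh-TW", name: "Traditional Chinese", flag: "tw" },
  { code: "en", name: "English", flag: "gb" },
  { code: "es", name: "Spanish", flag: "es" },
  { code: "ar", name: "Arabic", flag: "sa" }, // arbitrary choice of flag
  { code: "pt", name: "Portuguese", flag: "pt" },
  { code: "ru", name: "Russian", flag: "ru" },
  { code: "ja", name: "Japanese", flag: "jp" },
  { code: "de", name: "German", flag: "de" },
  { code: "ko", name: "Korean", flag: "kr" },
  { code: "fr", name: "French", flag: "fr" },
  { code: "it", name: "Italian", flag: "it" },
  { code: "nl", name: "Dutch", flag: "nl" },
  { code: "el", name: "Greek", flag: "gr" },
];

// Builds the markup for one flag link; iconDir is wherever the PNGs are hosted.
function flagLinkHtml(language, iconDir) {
  const src = iconDir + "/" + language.flag + ".png";
  return '<a href="#" data-lang="' + language.code + '">' +
    '<img src="' + src + '" alt="' + language.name +
    '" title="' + language.name + '"></a>';
}

const bar = LANGUAGES.map((l) => flagLinkHtml(l, "/images/flags")).join(" ");
console.log(bar);
```

Keeping the language code and the flag code in separate fields matters because they often differ (Greek is `el` as a language but `gr` as a flag).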

Previously, after the initial implementation in beta, I found a variety of resources in a similar vein, none of which are Roller specific though. Here are a few examples for you to have a look at if you’re interested:

Have to admit I’m really glad I’ve tidied this up, as I was starting to feel as though it was in danger of genuinely being in ‘permanent beta’, and however fashionable that is, in the apocryphal words of Steve Jobs: “real artists ship”.

Weblog language translator – beta

I’ve just implemented a weblog language translator, based on Google Translator.

It’s very rough and ready, deserving of the title “beta”, and very simple, but it appears to do quite a nice job of translating into the majority of the World’s most used languages.

I had just been reading The Aquarium (over here), and I was very impressed with its multi-lingual support. Even after searching on the topic of blog translation I don’t know how the guys are doing this, but I’m presuming that they are actually translating the text manually (i.e. with human editors).

The languages that I’ve included are: Mandarin (Simplified Chinese), Chinese (Traditional Chinese), English, Spanish, Arabic, Portuguese, Russian, Japanese, German, Korean, French and Italian. 

I wanted to do the fifteen or so most used languages – however the sources I found disagreed slightly on actual numbers and rankings.  The sources I used to understand the breakdown of percentage of languages spoken by the World population were:

1) Dr. Dennis O’Neil’s website (here) at the Behavioral Sciences Department, Palomar College, San Marcos, California.

2) The “Languages of the World” article (here) at The National Virtual Translation Center.

3) The “List of languages by number of native speakers” article (here) at Wikipedia. 

Unfortunately the translator suffers from two major issues. Firstly, it’s limited to the languages supported by the Google Translator service, which unfortunately does not cover a number of the World’s most used languages (notably Hindi and Bengali). Secondly, the Google Translation service modifies the page links, so the “Language” links I’ve implemented end up being translated twice, which makes the service fail at runtime.
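One possible way around the double translation, sketched here rather than taken from the live page, is to unwrap any address that has already been rewritten before building a new language link. The endpoint and the `u` and `langpair` parameters are my assumption of the call syntax, and both function names are hypothetical.

```javascript
// Assumed translation endpoint and parameter names; these may not match
// the real service and could change without notice.
const TRANSLATE_ENDPOINT = "http://translate.google.com/translate";

// If href has already been wrapped by the translator, recover the page
// address from its "u" parameter (repeating in case it was wrapped twice).
function originalUrl(href) {
  let current = href;
  while (current.indexOf(TRANSLATE_ENDPOINT) === 0) {
    const match = /[?&]u=([^&#]+)/.exec(current);
    if (!match) break;
    current = decodeURIComponent(match[1]);
  }
  return current;
}

// Builds a language link that always points at the untranslated page.
function translationUrl(href, targetLang, sourceLang = "en") {
  return TRANSLATE_ENDPOINT +
    "?u=" + encodeURIComponent(originalUrl(href)) +
    "&langpair=" + encodeURIComponent(sourceLang + "|" + targetLang);
}

const once = translationUrl("http://example.com/roller/myblog/", "es");
const again = translationUrl(once, "fr"); // built from an already-wrapped link
console.log(once);
console.log(again);
```

Because the wrapped link is unwrapped first, `again` comes out the same as a French link built directly from the original page address.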

Other issues include: the amount of text that can be translated is limited (or appears to be, so part of the page doesn’t get translated); the banner I’ve implemented goes awry in some of the translations; the sidebar isn’t getting translated (this might be due to the text length limit, as the sidebar is written after the content); and, as I don’t speak the majority of these languages, I’m presuming the translation is by no means as good as that of an actual, professional, human translator.

I’m going to tweak the code and look at how (and if) I can use the service to perhaps translate individual components, plus I’m going to see if the Google API can provide a more succinct and elegant dynamic solution. I had tried to implement in both Google Translator and Yahoo Babelfish, but the Babelfish service was erroring out, thus the use of Google – I might try it again later though.

I have other requirements for this functionality too: ideally it should produce pages which can be indexed by the major search engines, and it should translate feeds – both RSS and Atom.

Have a look and see what you think – any opinion would be good, especially from those who aren’t native English speakers.