Weblog language translator – beta

I’ve just implemented a weblog language translator, based on Google Translator.

It’s very rough and ready, deserving of the title “beta”, and very simple, but it appears to do quite a nice job of translating into the majority of the World’s most used languages.

I had just been reading The Aquarium (over here), and I was very impressed with its multi-lingual support.  I don’t know how the guys are doing this but, after searching on the topic of blog translation, I’m presuming that they are actually translating the text manually (i.e. with human editors).

The languages that I’ve included are: Mandarin (Simplified Chinese), Chinese (Traditional Chinese), English, Spanish, Arabic, Portuguese, Russian, Japanese, German, Korean, French and Italian. 

I wanted to cover the fifteen or so most used languages; however, the sources I found disagreed slightly on actual numbers and rankings.  The sources I used to understand what percentage of the World population speaks each language were:

1) Dr. Dennis O’Neil’s website (here) at the Behavioral Sciences Department, Palomar College, San Marcos, California.

2) The “Languages of the World” article (here) at The National Virtual Translation Center.

3) The “List of languages by number of native speakers” article (here) at Wikipedia. 

Unfortunately the translator suffers from two major issues. Firstly, it’s limited to the languages supported by the Google Translator service, which unfortunately does not cover a number of the World’s most used languages (notably Hindi and Bengali).  Secondly, the Google Translation service rewrites the links on every page it translates, so the “Language” links I’ve implemented end up being passed through the translator twice, and the service then fails at runtime.
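
A minimal sketch of one way the double-translation problem might be avoided: unwrap any translator address back to the original page before building a new “Language” link. The endpoint address and the `u`, `sl` and `tl` parameter names are assumptions for illustration, and `original_url` and `language_link` are hypothetical helpers, not part of any Google API.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: the translation service takes the page address in a "u" query
# parameter and the source/target languages in "sl"/"tl". Adjust as needed.
TRANSLATE_BASE = "http://translate.google.com/translate"


def original_url(url: str) -> str:
    """Return the untranslated page address, peeling off any translator layers."""
    base = urlparse(TRANSLATE_BASE)
    while True:
        parts = urlparse(url)
        # Stop as soon as the address no longer points at the translator.
        if parts.netloc != base.netloc or parts.path != base.path:
            return url
        wrapped = parse_qs(parts.query).get("u")
        if not wrapped:
            return url
        url = wrapped[0]


def language_link(current_url: str, target_lang: str, source_lang: str = "en") -> str:
    """Build a "Language" link that always translates the original page once."""
    page = original_url(current_url)
    if target_lang == source_lang:
        # No translation needed: link straight back to the original page.
        return page
    query = urlencode({"u": page, "sl": source_lang, "tl": target_lang})
    return f"{TRANSLATE_BASE}?{query}"
```

With this approach, calling `language_link` on an already translated address gives the same result as calling it on the plain page address, so the links never nest.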

Other issues include: the amount of text that can be translated appears to be limited (so part of the page doesn’t get translated), the banner I’ve implemented goes awry in some of the translations, the sidebar isn’t getting translated (which might be due to the text length limit, as the sidebar is written after the content), and, as I don’t speak the majority of these languages, I’m presuming the translation is by no means as good as that of an actual, professional, human translator.
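
If the length limit does turn out to be the cause, one possible workaround is to send the text in smaller pieces. The sketch below splits text at paragraph breaks so that no piece exceeds a given size; `split_for_translation` is a hypothetical helper, and `max_chars` is a placeholder because I don’t know the service’s real limit.

```python
from typing import List


def split_for_translation(text: str, max_chars: int) -> List[str]:
    """Split text at blank lines into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # A single paragraph longer than the limit has to be cut mid-text.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars:
            # The paragraph still fits in the chunk being built.
            current = candidate
        else:
            # Close the current chunk and start a new one with this paragraph.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be translated separately and the results stitched back together in order, which might also help get the sidebar translated.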

I’m going to tweak the code and look at how (and if) I can use the service to translate individual components, plus I’m going to see if the Google API can provide a more succinct and elegant dynamic solution.  I had tried to implement this with both Google Translator and Yahoo Babelfish, but the Babelfish service was erroring out, hence the use of Google – I might try it again later though.

I have other requirements for this functionality too: ideally it should produce pages which can be indexed by the major search engines, and it should translate feeds – both RSS and Atom.

Have a look and see what you think – any opinion would be good, especially from those who aren’t native English speakers.