Playing with the Google Language API

Friday
Sep 26,2008

 
The Google AJAX Language API lets you translate blocks of text and detect their language. For non-JavaScript environments, it also exposes a simple RESTful interface that returns JSON for you to decode. Let’s try a simple Flex example of language detection:

Demo (right click for the source code)

It’s pretty easy to achieve. Here is how to call the service:

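import mx.rpc.events.FaultEvent;
import mx.rpc.events.ResultEvent;
import mx.rpc.http.HTTPService;

var srv:HTTPService = new HTTPService();
srv.url = 'http://ajax.googleapis.com/ajax/services/language/detect';
srv.resultFormat = 'text'; // assumption: keep the raw JSON string so it can be decoded
srv.request.v = '1.0'; // version - might change in the future
srv.request.q = textToDetect; // the text to send (placeholder variable)
srv.addEventListener(ResultEvent.RESULT, resultHandler); // your result handler function
srv.addEventListener(FaultEvent.FAULT, faultHandler); // your fault handler function
srv.send();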
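
For non-Flex readers, here is a minimal sketch in Python of how the same request could be put together. It only builds the GET URL from the endpoint and the `v` and `q` parameters shown above and makes no network call; the function name `build_detect_url` is my own, not part of the API.

```python
from urllib.parse import urlencode

# Endpoint taken from the Flex snippet above.
DETECT_URL = "http://ajax.googleapis.com/ajax/services/language/detect"


def build_detect_url(text, version="1.0"):
    """Return the GET URL for a detection request (no request is sent)."""
    # urlencode percent-encodes the text and turns spaces into '+'.
    return DETECT_URL + "?" + urlencode({"v": version, "q": text})


print(build_detect_url("Bonjour tout le monde"))
```

Encoding the text matters here: non-ASCII characters are percent-encoded as UTF-8 bytes, so the URL grows faster than the character count of the text.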

The JSON response looks like this:

{
  "responseData" : {
    "language" : the-detected-language,
    "isReliable" : the-reliability-of-the-detect,
    "confidence" : the-confidence-level-of-the-detect
  },
  "responseDetails" : null | string-on-error,
  "responseStatus" : 200 | error-code
}
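
To show how such a response could be decoded, here is a small Python sketch. It assumes the response arrives as a JSON string with exactly the fields listed above; the function name `parse_detect_response` and the sample values are made up for illustration, and I am assuming `isReliable` is a boolean and `confidence` a number.

```python
import json


def parse_detect_response(raw):
    """Decode a detect response; raise ValueError if the service reported an error."""
    payload = json.loads(raw)
    if payload.get("responseStatus") != 200:
        # responseDetails carries the error message when the call failed.
        raise ValueError(payload.get("responseDetails") or "detection failed")
    data = payload["responseData"]
    return data["language"], data["isReliable"], data["confidence"]


# Hand-written sample in the shape above; the values are invented.
sample = ('{"responseData": {"language": "fr", "isReliable": true, '
          '"confidence": 0.42}, "responseDetails": null, "responseStatus": 200}')
print(parse_detect_response(sample))
```

Checking `responseStatus` before touching `responseData` avoids a confusing error when the service answers with a failure and no data.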

Okay, nice, but why am I so enthusiastic about this API?

Over the last few years, many of my projects have involved dealing with translations in the 22 (formally 23, with Gaelic) official EU languages. Whenever I thought about adding more automation, I ran into the same problem: being sure which language a document is actually written in. It’s something that I already discussed with some people at 360|Flex Europe in April. Now I can achieve this quite easily :-)

 Demo (right click for the source code)

Maybe it deserves a better explanation :-) To detect the language of a document (MS Word documents only at the moment, but I will try to support more document types later), we need to send plain text to the Google API. To do this, I’m using the Apache POI project and its HWPF extractor wrapper to extract the text. I then split the text into chunks of 300 characters (500 is the maximum supported, but it makes my code freeze) and send each chunk to the Google detection API. Once all the responses are back, I compile them and display the language(s) found.
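
The split-and-compile steps could look roughly like the following Python sketch. This is not the code of the demo: the function names are my own, the chunks are cut at fixed positions (so words may be split), and counting chunks per language is just one possible way to compile the results.

```python
from collections import Counter


def split_text(text, size=300):
    """Cut text into consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def compile_languages(detections):
    """Count how many chunks were detected as each language, most frequent first."""
    return Counter(detections).most_common()


chunks = split_text("a" * 650)
print([len(c) for c in chunks])  # [300, 300, 50]

# Language codes as they might come back, one per chunk (invented values).
print(compile_languages(["fr", "fr", "en"]))  # [('fr', 2), ('en', 1)]
```

Keeping a count per language rather than a single winner is what lets a document with mixed languages show up as such.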

My demo is not 100% perfect, but it works pretty well and it’s really fast. I’m using ColdFusion to upload the texts and call the POI library, and I let Flex deal with the multiple calls to the Google API. For an easier implementation, I created (or at least tried to create) a component, detectLanguage, that manages all the steps from extraction to language detection.

If you need some MS Word documents for testing, you can have a look here; most of the texts have been available in 22 languages since 2007. I’m waiting for your feedback and suggestions. Enjoy!



15 Responses to “Playing with the Google Language API”

  1. Lagaffe said:

    Hey Cyril,

    Nice post! I’m gonna look a little deeper into the API and see where I can use it.

    But I’ve tested your flex app and it gave an error on Bulgarian and Greek…

  2. Cyril H. said:

    I suppose that it’s with the 2nd example?
    Can you send me those docs or give me a reference where I could find them?
    I know that the extraction sometimes fails; it’s something that I should look at.

  3. Lagaffe said:

    It was with the first example. The text I was using was the first paragraph of the following press release: Broadband Internet for all Europeans: Commission launches debate on future of universal service

  4. Cyril H. said:

    Problem solved. It’s the same issue that I had with documents. I need to lower the number of characters for certain languages. Example updated… thanks for your feedback :-)

  5. Lagaffe said:

    No Problem!

    2 more questions :p
    - Do you know if they only look at the words while detecting the language, or also at the grammar?
    - What does Google do with the text you send them?

  6. Cyril H. said:

    - for the language detection, I think they only look at words. They take grammar into account only in the translation API.
    - good question :-) It’s probably deleted immediately after processing. I cannot see any point in storing/keeping it. You could ask this question on the Google-AJAX API forums

  7. Christine said:

    Nice one! Worked perfectly for me – even on a document that I had written myself in Italian – and very fast

  8. PaulH said:

    i’m normally *very* skeptical of anything machine-based when it comes to languages (especially translations) but this did a pretty good job of detecting language. it only screwed up on some irish “Tá mé in ann gloine a ithe; Ní chuireann sé isteach nó amach orm” which it thought was pt. i’m normally limited to detecting unicode blocks:

    http://www.sustainablegis.com/unicode/testUBlocks.cfm

    so this is very cool. can it handle text w/mixed languages?

    thanks.

  9. Cyril H. said:

    @Christine yes, it’s really fast. I’ve done another “internal benchmark” with a script that gets the blob from the database, extracts the text and checks/validates the language: impressive result!

    @PaulH me too, I didn’t expect such good results (you should try the translation API itself). It can handle texts with mixed languages. Take some texts from the link I provide at the end of the post, and create a new document with a mix of different languages. Pretty accurate… :-)

  10. Savvas Malamas said:

    Holy moly!
    So cool!
    I had to wait till Friday reaaaaly night to read it but I did and it sounds awesome.
    Flexy Coldfusion!

    Nice Cyril!

  11. Online Translation Service said:

    I have developed http://translator.vndv.com/ page which uses Google AJAX Language API. I also used Google AJAX Language API to translate the user interface of Online Translation Service to the following languages: English, Arabic, Bulgarian, Chinese, Croatian, Czech, Danish, Dutch, Finnish, French, German, Greek, Hindi, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish etc

  12. Cyril Hanquez said:

    @cfjedimaster Nice post. I’m using POI to extract text to detect the language in a document http://bit.ly/uP7qO

  13. Cyril H. said:

    Just saw that the second example was broken with FP10; it’s fixed now. I’m preparing something new on the same subject… as soon as I have the time to finish it :-)

  14. Hans said:

    Absolutely great! The demo just works :-) My problem though is that I don’t want to send data to Google. Are there any “standalone” solutions out there?

  15. Cyril H. said:

    Hi Hans, I don’t think there’s anything wrong with sending data to Google. I’m fairly sure that they won’t keep any of it, especially with the Translation API. I read it somewhere but cannot find the link anymore. What do you mean by a “standalone” solution? If you mean server-side components, the answer is yes, but not for free :-)
