Thursday, May 30, 2013

Re: Wrong UTF-8 string parsing in GWT JSON

On 30/05/2013 08:49, Tibor Szolnoki wrote:
> Hi, my stack is overflowing :):):):) I can't find the solution...
> I'm developing a client-server application.
> My GWT client runs in the browser and communicates with my C++ server through:
> GWT-JSON -> lighttpd -> libfcgi -> cgicc -> libjson -> C++ application
> My problem:
> The server responds to the client's request with a JSON string. This response contains UTF-8
> strings. Accented characters are encoded correctly with "\uXXXX" in the response. For example, "Á"
> is encoded as "\u00C3\u0081".
> The client extracts the string from the JSON string, but the extracted string contains badly
> encoded characters. :(:(:(:(
> Luckily, I could narrow the problem down to JSON-GWT. Here is some code that demonstrates the
> problem, running on the client side only, in GWT:
> String response="{ \"test\" : \"\\u00C3\\u0081\\u00C3\\u0089\\u00C5\\u00B0\" }";
> //"ÁÉÜ" in UTF-8

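No. That's not UTF-8, those are Unicode escapes. Each \uXXXX names one UTF-16 code unit, not one
byte, so "\u00C3\u0081" becomes two chars in Java's UTF-16 string (U+00C3 and U+0081) instead of
the single "Á" (U+00C1).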

> JSONObject json=JSONParser.parseStrict(response).isObject();
> String s1=json.get("test").isString().stringValue();
> Window.alert(s1);
> byte[] b1=s1.getBytes();
> The results:
> Alert is: "Ã\u0081Ã\u0089Å°" instead of "ÁÉÜ"
> s1="Ã\u0081Ã\u0089Å°" instead of "ÁÉÜ"

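That's the correct result for the escapes you sent: you took the UTF-8 bytes of each character
and escaped every byte as if it were a code point. The server should escape the code points
themselves ("\u00C1\u00C9" for "ÁÉ"), or send the characters unescaped as raw UTF-8.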

It is working as intended. Perhaps you should read a bit more about Unicode, UTF-8, UTF-16
and all this confusing stuff... :-)
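
To make the difference concrete, here is a minimal plain-Java sketch (not GWT-specific; the class
and method names are made up for illustration). It shows that the six escapes in the test string
produce six chars, and that reinterpreting those chars as bytes and decoding them as UTF-8
recovers the intended text. Note that the last pair, C5 B0, decodes to "Ű" (U+0170). The repair
is only a workaround and assumes every char is below U+0100; the real fix is on the server.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of GWT or of the original application.
class EscapeDemo {

    // Workaround sketch: treat each char (assumed to be below U+0100) as one
    // byte, then decode the resulting byte sequence as UTF-8.
    static String repairDoubleEncoded(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What the server currently sends: every UTF-8 byte escaped on its own.
        String wrong = "\u00C3\u0081\u00C3\u0089\u00C5\u00B0";
        // What it should send: one escape per code point.
        String right = "\u00C1\u00C9\u0170";

        System.out.println(wrong.length());                            // 6
        System.out.println(right.length());                            // 3
        System.out.println(repairDoubleEncoded(wrong).equals(right));  // true
    }
}
```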

Philippe Lhoste
-- (near) Paris -- France
-- -- -- -- -- -- -- -- -- -- -- -- -- --
