Wednesday, May 29, 2013

Wrong UTF-8 string parsing in GWT JSON

Hi, my stack has overflowed :) I can't find the solution...

I'm developing a client-server application.
My GWT client runs in the browser and communicates with my C++ server through this chain:
GWT-JSON -> lighttpd -> libfcgi -> cgicc -> libjson -> C++ application

My problem:
The server responds to the client's request with a JSON string. This response contains UTF-8 strings. Accented characters are encoded with "\uXXXX" escapes in the response, which looks correct to me. For example, "Á" is encoded as "\u00C3\u0081".
The client extracts the string from the JSON, but the extracted string contains wrongly encoded characters. :(
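
For reference, a JSON "\uXXXX" escape names one UTF-16 code unit rather than one byte, so the two ways of escaping the same text can be compared side by side. The sketch below is plain Java without GWT or libjson; the class `EscapeDemo` and its two methods are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; plain Java, no GWT or libjson involved.
class EscapeDemo {

    // Writes one backslash-u escape per UTF-16 code unit of the text.
    static String escapeCodeUnits(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    // Writes one backslash-u escape per byte of the text's UTF-8 encoding.
    static String escapeUtf8Bytes(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\u%04X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A-acute, E-acute, U-double-acute, written as escapes to stay ASCII-only.
        String text = "\u00C1\u00C9\u0170";
        System.out.println(escapeCodeUnits(text));
        System.out.println(escapeUtf8Bytes(text));
    }
}
```

For "ÁÉŰ" the first method gives \u00C1\u00C9\u0170 and the second gives \u00C3\u0081\u00C3\u0089\u00C5\u00B0. The second form is the one that appears in the server's response.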

Luckily, I could narrow the problem down to GWT's JSON library. Here is code that demonstrates the problem; it runs on the client side only, in GWT:

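    // "ÁÉŰ", with each byte of its UTF-8 encoding written as a separate escape
    String response = "{ \"test\" : \"\\u00C3\\u0081\\u00C3\\u0089\\u00C5\\u00B0\" }";
    // Parse the JSON text and extract the string value of "test"
    JSONObject json = JSONParser.parseStrict(response).isObject();
    String s1 = json.get("test").isString().stringValue();
    // Bytes of the extracted string, for comparison with the expected UTF-8 bytes
    byte[] b1 = s1.getBytes();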

The results:
    Alert: "à ÉŰ" instead of "ÁÉŰ"
    s1="à ÉŰ" instead of "ÁÉŰ"
    b1=[ 0xc3, 0x83, 0xc2, 0x81, 0xc3, 0x83, 0xc2, 0x89, 0xc3, 0x85, 0xc4, 0xb0 ] (incorrect)

Here is another test:
    String s2="ÁÉŰ";
    byte[] b2=s2.getBytes();

The results:

    Alert: "ÁÉŰ" (correct)
    s2="ÁÉŰ" (correct)
    b2=[ 0xc3, 0x81, 0xc3, 0x89, 0xc5, 0xb0 ] (correct, same as in "response" string above)

I think that JSONParser.parseStrict or JSONObject.get().isString().stringValue() can't handle UTF-8 characters correctly...
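
One way to check this hypothesis is to look at the chars that the escapes stand for. Each "\uXXXX" escape is one code unit, so a parser that returns six chars (U+00C3, U+0081, and so on) is doing what the JSON text asks, and the original text can only be recovered by treating those chars as bytes again. A minimal sketch in plain JVM Java, assuming every char in the string is at most U+00FF; `Utf8Repair` and `repair` are hypothetical names, and whether these charset calls behave the same under GWT's JRE emulation is not verified here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; plain JVM Java, not verified under GWT's JRE emulation.
class Utf8Repair {

    // Maps each char (assumed <= 0xFF) to one raw byte, then decodes the bytes as UTF-8.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The six chars a JSON parser produces for the escapes in "response".
        String s1 = "\u00C3\u0081\u00C3\u0089\u00C5\u00B0";
        String fixed = repair(s1);
        // Compare against A-acute, E-acute, U-double-acute.
        System.out.println(fixed.equals("\u00C1\u00C9\u0170"));
    }
}
```

This is only a client-side workaround. Having the server emit code-point escapes such as \u00C1, or the raw UTF-8 characters themselves, would avoid the need for it.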

Any ideas? :(

Additional information:
The Content-Type of both the request and the response is "application/json; charset=UTF-8". The source code files and the development environment use UTF-8. The HTML page in the browser is also encoded as UTF-8.

I have a problem with the response only. The character encoding of the request is correct.

Thank you for any help,

