Re: Wrong UTF-8 string parsing in GWT JSON

In RFC4627, a JSON "string" and a JSON "text" are two different things.

Text: a sequence of tokens (brackets, strings, quotes, etc.) that serializes a JSON object or array.
(RFC4627 section 2.)

String: a basic JSON data type (a single JSON value).
(RFC4627 section 2.5.)

According to RFC4627, the text encoding and the string encoding are different things.
As you write, the text is UTF-8 by default, and its encoding is determined from the first 4 bytes (section 3).
But not the String!
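
For reference, the detection rule from section 3 can be sketched in plain Java. This is only an illustration of the null-byte patterns listed in the RFC; the class and method names are made up for this post and are not part of GWT or any JSON library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses the encoding of a JSON text from the
// pattern of zero bytes in its first four octets (RFC4627, section 3).
final class JsonEncodingSniffer {

    static String detect(byte[] b) {
        if (b.length >= 4) {
            boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
            if (z0 && z1 && z2 && !z3) return "UTF-32BE";  // 00 00 00 xx
            if (z0 && !z1 && z2 && !z3) return "UTF-16BE"; // 00 xx 00 xx
            if (!z0 && z1 && z2 && z3) return "UTF-32LE";  // xx 00 00 00
            if (!z0 && z1 && !z2 && z3) return "UTF-16LE"; // xx 00 xx 00
        }
        return "UTF-8"; // xx xx xx xx, and the fallback for short input
    }

    public static void main(String[] args) {
        String json = "{\"a\":1}";
        System.out.println(detect(json.getBytes(StandardCharsets.UTF_8)));    // UTF-8
        System.out.println(detect(json.getBytes(StandardCharsets.UTF_16LE))); // UTF-16LE
    }
}
```

The rule works because the first two characters of a JSON text are always ASCII, so the position of the zero bytes gives away the width and byte order.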

A string is always Unicode, with Unicode characters escaped as "\uXXXX" (section 2.5).

My problem was only with the JSON string, not with the whole JSON text.
(My text encoding is UTF-8, the default.)

I found my solution: the "\uXXXX" escapes in a JSON string have to carry Unicode characters, not UTF-8 bytes. That's all...

As Philippe writes, GWT works as intended.
The GWT JSON parser interprets each "\uXXXX" as a UTF-16 code unit.
This is independent of the JSON text encoding, which is UTF-8.
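
To make this concrete, here is a minimal sketch in plain Java (not GWT) of what I mean. The class and both helpers are hypothetical and only handle the backslash-u form; a real JSON writer must also escape quotes, backslashes and control characters.

```java
// Hypothetical helpers: each backslash-u escape carries one UTF-16 code
// unit, regardless of how the surrounding JSON text is encoded.
final class JsonUnicodeEscapes {

    // Writer side: replace every non-ASCII UTF-16 code unit by its escape.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Parser side: turn each escape back into one UTF-16 code unit.
    // Sketch only: assumes well-formed hex digits and ignores escaped backslashes.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\u", i) && i + 6 <= s.length()) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "\u00c1\u00c9\u0170";        // three characters
        String escaped = escapeNonAscii(original);      // 18 ASCII characters
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(original)); // true
    }
}
```

The escaped form is pure ASCII, so its bytes are the same whether the JSON text is sent as UTF-8 or not; only the code unit values inside the escapes matter.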

On Friday, May 31, 2013 10:32:40 AM UTC+2, Thomas Broyer wrote:

On Friday, May 31, 2013 10:07:23 AM UTC+2, Tibor Szolnoki wrote:
Dear Philippe,

You are right...
If I change the escaped ("\uXXXX") codes to UTF-16, as in my example:
String response = "{ \"test\" : \"\\u00c1\\u00c9\\u0170\" }"; // "ÁÉŰ" in UTF-16
All works correctly.

But I found a strange thing too:
If I disable the "\uXXXX" escaping in the JSON writer on the server side, all works as expected. But this is not a good idea according to RFC4627 :((((.

I can't find where it says it's "not a good idea". It says all over the place that JSON "SHALL be encoded in Unicode", with a default to UTF-8, so why not just use UTF-8?
In this mode, the JSON string transports the non-printable characters (0xc3, 0x81, 0xc3, 0x89, 0xc5, 0xb0) ("ÁÉŰ" in UTF-8) without any encoding...

These are bytes, not characters.
The encoding is determined by the first 4 bytes of the response (see RFC4627)
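
A small plain-Java sketch of the bytes-versus-characters distinction, using the same three letters as the example above. The class name and the hex helper are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same three characters are six bytes in UTF-8,
// and decoding those bytes with the wrong charset yields six characters.
final class BytesVersusChars {

    // Formats a byte array as space-separated lowercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00c1\u00c9\u0170"; // three UTF-16 code units
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(hex(utf8)); // c3 81 c3 89 c5 b0
        System.out.println(new String(utf8, StandardCharsets.UTF_8).length());      // 3
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1).length()); // 6
    }
}
```

Treating each of the six bytes as its own character is what produces the garbled result; decoding them as UTF-8 gives back the original three characters.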

