Thursday, May 15, 2014

Re: The crawler escapes “mydomain#!article” into “mydomain?_escaped_fragment_=article”. How do I retrieve the original URL?

Is it OK to do it like this?
originalUrl = java.net.URLDecoder.decode(originalUrl, "UTF-8");

Which one should we use, "UTF-8" or "ASCII"?

So when the crawler escapes the URL, does it use URL.encode()?
If it does, which charset does it use, "UTF-8" or "ASCII"?
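The thread does not say which charset the crawler uses. As a hedged illustration (not a statement about Google's implementation), this sketch decodes the same input with both charsets through java.net.URLDecoder. Escapes of ASCII characters such as %25 decode identically either way; the choice only matters for non-ASCII bytes, and the URLDecoder documentation itself recommends UTF-8, following the W3C recommendation. Java's canonical name for ASCII is "US-ASCII". The class CharsetDemo and the method decodeWith are hypothetical names used only for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class CharsetDemo {
    // Hypothetical helper: decodes a percent-escaped string using the named charset.
    static String decodeWith(String escaped, String charsetName) {
        try {
            return URLDecoder.decode(escaped, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Escapes of ASCII characters such as %25 ('%') decode the same under both charsets.
        System.out.println(decodeWith("car=%25", "UTF-8"));
        System.out.println(decodeWith("car=%25", "US-ASCII"));

        // %C3%A9 is the two-byte UTF-8 encoding of 'é'. US-ASCII cannot map those bytes,
        // so the second call yields replacement characters instead of 'é'.
        System.out.println(decodeWith("%C3%A9", "UTF-8"));
        System.out.println(decodeWith("%C3%A9", "US-ASCII"));
    }
}
```

In short, "UTF-8" is the safer choice: it gives the same result as ASCII for plain ASCII escapes and still handles non-ASCII fragments, assuming the crawler emits UTF-8 bytes for them.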

On Friday, May 16, 2014 2:49:45 PM UTC+10, Tom wrote:

Ok, here is what Google said (https://developers.google.com/webmasters/ajax-crawling/docs/getting-started).

When a crawler sees a URL like www.example.com/ajax.html#!key=value, it temporarily converts that URL into www.example.com/ajax.html?_escaped_fragment_=key=value.

During this transformation, it also escapes certain characters in the fragment. For example, www.example.com/ajax.html#!key=value;car=% becomes www.example.com/ajax.html?_escaped_fragment_=key=value;car=%25
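To make the transformation concrete, here is a minimal Java sketch of the crawler-side mapping from "#!" to "?_escaped_fragment_=". It is an assumption-laden illustration, not Google's code: the escape set covers the characters named in Google's note (space, '#', '%', '&') plus '+', control characters, DEL and every non-ASCII byte of the fragment's UTF-8 encoding. The class FragmentEscaper and its methods are hypothetical, and the URL is assumed to have no query string of its own.

```java
import java.nio.charset.StandardCharsets;

class FragmentEscaper {
    // ASSUMPTION: illustrative escape set, not the crawler's published rule.
    static boolean needsEscape(int b) {
        return b <= 0x20 || b >= 0x7F || b == '#' || b == '%' || b == '&' || b == '+';
    }

    // Percent-escapes the fragment byte by byte, working on its UTF-8 encoding.
    static String escapeFragment(String fragment) {
        StringBuilder sb = new StringBuilder();
        for (byte raw : fragment.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // treat the byte as unsigned 0..255
            if (needsEscape(b)) {
                sb.append('%').append(String.format("%02X", b));
            } else {
                sb.append((char) b);
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: maps "...#!frag" to "...?_escaped_fragment_=frag".
    static String toEscapedUrl(String prettyUrl) {
        int i = prettyUrl.indexOf("#!");
        if (i < 0) {
            return prettyUrl; // no hash-bang, nothing to transform
        }
        return prettyUrl.substring(0, i) + "?_escaped_fragment_="
                + escapeFragment(prettyUrl.substring(i + 2));
    }

    public static void main(String[] args) {
        // Prints www.example.com/ajax.html?_escaped_fragment_=key=value;car=%25
        System.out.println(toEscapedUrl("www.example.com/ajax.html#!key=value;car=%"));
    }
}
```

With this sketch, '=' and ';' pass through untouched while '%' becomes %25, matching the example above.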

So, to convert www.example.com/ajax.html?_escaped_fragment_=key=value;car=%25 back to the original URL, we need to unescape all %XX sequences in the fragment.

Google said:

Note: The crawler escapes certain characters in the fragment during the transformation. To retrieve the original fragment, make sure to unescape all %XX characters in the fragment. More specifically, %26 should become &, %20 should become a space, %23 should become #, and %25 should become %, and so on.

But Google doesn't say how to do that in Java.
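One way to follow Google's note literally is a small decoder that turns each %XX into the byte 0xXX and decodes the resulting bytes as UTF-8. This is a sketch under assumptions, not an official API: it assumes the escaped bytes are UTF-8, leaves a literal '+' alone, and copies a '%' that is not followed by two hex digits unchanged. By contrast, java.net.URLDecoder converts '+' to a space and may throw IllegalArgumentException on a stray '%'. FragmentUnescaper and unescape are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class FragmentUnescaper {
    // Turns every %XX into the byte 0xXX, keeps all other characters as their
    // UTF-8 bytes, then decodes the whole byte sequence as UTF-8.
    static String unescape(String escaped) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < escaped.length()) {
            char c = escaped.charAt(i);
            int hi = -1;
            int lo = -1;
            if (c == '%' && i + 2 < escaped.length()) {
                hi = Character.digit(escaped.charAt(i + 1), 16);
                lo = Character.digit(escaped.charAt(i + 2), 16);
            }
            if (hi >= 0 && lo >= 0) {
                bytes.write((hi << 4) | lo); // a valid %XX escape
                i += 3;
            } else {
                // Not an escape: copy one code point (surrogate pairs included) as UTF-8.
                int cp = escaped.codePointAt(i);
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                bytes.write(utf8, 0, utf8.length);
                i += Character.charCount(cp);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(unescape("key=value;car=%25")); // key=value;car=%
        System.out.println(unescape("a%26b%20c%23d"));     // a&b c#d
    }
}
```

Decoding bytes first and converting to a String only at the end is what lets multi-byte escapes such as %C3%A9 come back as a single character.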

String originalUrl = changedStr.replace("?_escaped_fragment_=", "#!");  // then what to do next so that all the escaped characters go back to normal?
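Putting the two steps together, a hedged sketch of the full reconstruction splits at the first "?_escaped_fragment_=", unescapes only the fragment part, and rejoins with "#!" (note "#!", not "!#"). It reuses URLDecoder with "UTF-8", first rewriting any literal '+' as %2B so that only %XX sequences are decoded. Assumptions: the URL carries no other query parameters (a URL with "&_escaped_fragment_=" is returned unchanged), and the fragment contains no stray '%', on which URLDecoder may throw. UglyToPrettyUrl and toPrettyUrl are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UglyToPrettyUrl {
    private static final String MARKER = "?_escaped_fragment_=";

    // Hypothetical helper: ".../page?_escaped_fragment_=frag" -> ".../page#!frag".
    // ASSUMPTION: no other query parameters; such URLs are returned unchanged.
    static String toPrettyUrl(String uglyUrl) {
        int i = uglyUrl.indexOf(MARKER);
        if (i < 0) {
            return uglyUrl; // not a crawler-style URL
        }
        String base = uglyUrl.substring(0, i);
        String escapedFragment = uglyUrl.substring(i + MARKER.length());
        try {
            // URLDecoder maps '+' to a space; protect literal '+' so only %XX is decoded.
            String fragment = URLDecoder.decode(escapedFragment.replace("+", "%2B"), "UTF-8");
            return base + "#!" + fragment;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Prints www.example.com/ajax.html#!key=value;car=%
        System.out.println(toPrettyUrl("www.example.com/ajax.html?_escaped_fragment_=key=value;car=%25"));
    }
}
```

One caveat: if the value is read on the server with HttpServletRequest.getParameter("_escaped_fragment_"), the servlet container has typically already decoded it, so decoding it again could corrupt fragments that legitimately contain '%'.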

http://stackoverflow.com/questions/23692748/the-crawler-escapes-mydomainarticle-into-mydomain-escaped-fragment-articl

--
You received this message because you are subscribed to the Google Groups "Google Web Toolkit" group.
