Google Web Toolkit: Re: Can you tell me Step-By-Step Guidelines of How to use HtmlUnit to make GWT app Crawlable?

Thursday, May 15, 2014

Re: Can you tell me Step-By-Step Guidelines of How to use HtmlUnit to make GWT app Crawlable?

I am about 70% done, but I still get an error.

OK, here is what I did: I downloaded htmlunit-2.14, unzipped it, and copied these jar files into my lib folder:

htmlunit-2.14

commons-codec

commons-collections

commons-io

commons-logging

cssparser

htmlunit-core-js

nekohtml

commons-lang3

httpclient

httpmime

jetty-websocket

xalan

xercesImpl

After that, I created "public class CrawlServlet implements Filter" as mentioned above:

@Override
public void doFilter(ServletRequest request, ServletResponse response,
        FilterChain chain) throws IOException, ServletException {
    HttpServletRequest httpRequest = (HttpServletRequest) request;
    String requestQueryString = httpRequest.getQueryString();

    if ((requestQueryString != null) && (requestQueryString.contains("_escaped_fragment_"))) {
        // rewrite the URL back to the original #! version
        String url_with_hash_fragment = requestQueryString.replace("?_escaped_fragment_=", "!#");

        // remember to unescape any %XX characters
        //url_with_hash_fragment = rewriteQueryString(url_with_escaped_fragment);

        // use the headless browser to obtain an HTML snapshot
        final WebClient webClient = new WebClient();
        HtmlPage page = webClient.getPage(url_with_hash_fragment);

        // important!  Give the headless browser enough time to execute JavaScript
        // The exact time to wait may depend on your application.
        webClient.waitForBackgroundJavaScript(2000);

        // return the snapshot
        PrintWriter out = response.getWriter();
        out.println(page.asXml());
    } else {
        try {
            // not an _escaped_fragment_ URL, so move up the chain of servlet (filters)
            chain.doFilter(request, response);
        } catch (ServletException e) {
            System.err.println("Servlet exception caught: " + e);
            e.printStackTrace();
        }
    }
}

Then I ran my GWT app in Eclipse and opened the URL "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article"

Here is the error in Eclipse:

[ERROR] 500 - GET /Myproject.html?gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article (127.0.0.1) 4840 bytes
   Request headers
      Accept: text/html, application/xhtml+xml, */*
      Accept-Language: en-AU
      User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
      Accept-Encoding: gzip, deflate
      Host: 127.0.0.1:8888
      Connection: keep-alive
      Cookie: JSESSIONID=5eehbjnnhsz6m6hlk7el8tu; SILFORACOOKIE=5eehbjnnhsz6m6hlk7el8tu
   Response headers
      Set-Cookie: JSESSIONID=ro2z2xqrbi0j93zfrb1uihl8;Path=/
      Set-Cookie: SILFORACOOKIE=ro2z2xqrbi0j93zfrb1uihl8;Path=/
      Content-Type: text/html;charset=ISO-8859-1
      Cache-Control: must-revalidate,no-cache,no-store
      Content-Length: 4840

In the Chrome browser it showed:

HTTP ERROR 500

Problem accessing /Myproject.html. Reason: Server Error

Caused by:java.net.MalformedURLException: no protocol: gwt.codesvr=127.0.0.1:9997!#article
 at java.net.URL.<init>(Unknown Source)

Do you know how to fix it?

On Friday, May 16, 2014 12:06:24 AM UTC+10, Jens wrote:

HtmlUnit is bundled as a jar file, so you can put it (and all its dependencies) into WEB-INF/lib of your war.

Then you need to write a servlet that takes the server request of the Google bot, rewrites the _escaped_fragment_ parameter back to the original #!<token> url and starts HtmlUnit with that url. The resulting/rendered page will then be returned by the servlet.

At the bottom is an example:

https://developers.google.com/webmasters/ajax-crawling/docs/html-snapshot
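
The rewrite step described above (turning the _escaped_fragment_ request back into the #!<token> url) could look roughly like the sketch below. The class and method names (EscapedFragmentRewriter, toHashBangUrl) are made up for illustration and are not part of HtmlUnit or the servlet API. The sketch assumes that _escaped_fragment_ is the last query parameter and that other parameters are joined to it with "&". The important point is that the headless browser needs an absolute URL (scheme, host and path), not just the query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class EscapedFragmentRewriter {

    private static final String PARAM = "_escaped_fragment_=";

    /**
     * Rebuilds the absolute #! URL from the request URL (scheme, host, path)
     * and the raw query string of an _escaped_fragment_ request.
     * Assumption: _escaped_fragment_ is the last parameter in the query string.
     */
    public static String toHashBangUrl(String requestUrl, String queryString)
            throws UnsupportedEncodingException {
        int start = queryString.indexOf(PARAM);
        if (start < 0) {
            throw new IllegalArgumentException("no _escaped_fragment_ parameter");
        }
        // Everything before the parameter is the ordinary query; drop the '&' separator.
        String otherParams = queryString.substring(0, start);
        if (otherParams.endsWith("&")) {
            otherParams = otherParams.substring(0, otherParams.length() - 1);
        }
        // The crawler percent-encodes the token, so decode any %XX sequences.
        String token = URLDecoder.decode(
                queryString.substring(start + PARAM.length()), "UTF-8");

        StringBuilder url = new StringBuilder(requestUrl);
        if (!otherParams.isEmpty()) {
            url.append('?').append(otherParams);
        }
        return url.append("#!").append(token).toString();
    }

    public static void main(String[] args) throws Exception {
        // prints http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997#!article
        System.out.println(toHashBangUrl(
                "http://127.0.0.1:8888/Myproject.html",
                "gwt.codesvr=127.0.0.1:9997&_escaped_fragment_=article"));
    }
}
```

Inside a filter, the first argument would typically come from httpRequest.getRequestURL().toString() and the second from httpRequest.getQueryString(); the resulting string is what gets passed to webClient.getPage(...).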

The rendered page that you serve the Google Bot does not have to be a 1:1 copy of your original page. It is enough if the same content is available; styling is irrelevant. For example compare:

https://groups.google.com/forum/#!topic/google-web-toolkit/Syi04ArKl4k
https://groups.google.com/forum/?_escaped_fragment_=topic/google-web-toolkit/Syi04ArKl4k

-- J.

