Thursday, May 15, 2014

Re: Can you tell me Step-By-Step Guidelines of How to use HtmlUnit to make GWT app Crawlable?

OK, here is what I am trying.

I created a class called CrawlServlet:



import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

public class CrawlServlet implements Filter {

    @Override
    public void destroy() {
        // TODO Auto-generated method stub
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        // TODO Auto-generated method stub
        HttpServletRequest httpRequest = (HttpServletRequest) request;
        String requestURI = httpRequest.getRequestURI();
        if ((requestURI != null) && (requestURI.contains("_escaped_fragment_"))) {
            System.out.println(requestURI);
        } else {
            try {
                // not an _escaped_fragment_ URL, so move up the chain of servlet (filters)
                chain.doFilter(request, response);
            } catch (ServletException e) {
                System.err.println("Servlet exception caught: " + e);
                e.printStackTrace();
            }
        }
    }

    @Override
    public void init(FilterConfig arg0) throws ServletException {
        // TODO Auto-generated method stub
    }
}
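
One thing I am not sure about in the filter above: as far as I understand, getRequestURI() returns only the path and leaves out the query string, while the crawler sends _escaped_fragment_ as a query parameter, so that check may never match. Here is a minimal sketch of how the check and the URL rewrite could look if they worked on the value of getQueryString() instead. The class name EscapedFragmentUrls and both method names are made up for illustration, and this is only the URL handling, not the HtmlUnit part.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper (not part of GWT or HtmlUnit): plain string handling only.
class EscapedFragmentUrls {

    static final String PARAM = "_escaped_fragment_=";

    // True when the query string carries the crawler's escaped-fragment parameter.
    static boolean isCrawlerRequest(String queryString) {
        return queryString != null && queryString.contains(PARAM);
    }

    // Rebuilds the "#!" URL that a browser would have used.
    // Assumes _escaped_fragment_ is the last parameter in the query string.
    static String toHashBangUrl(String requestUrl, String queryString) {
        int idx = queryString.indexOf(PARAM);
        String before = queryString.substring(0, idx);
        if (before.endsWith("&")) {
            before = before.substring(0, before.length() - 1);
        }
        String fragment;
        try {
            fragment = URLDecoder.decode(queryString.substring(idx + PARAM.length()), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
        StringBuilder url = new StringBuilder(requestUrl);
        if (!before.isEmpty()) {
            url.append('?').append(before);
        }
        return url.append("#!").append(fragment).toString();
    }
}
```

Inside doFilter this would presumably be called with httpRequest.getRequestURL().toString() and httpRequest.getQueryString(), and the resulting URL is what would be handed to HtmlUnit to render the snapshot.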


 
In lib/web.xml, I have:

<filter>
    <filter-name>CrawlServlet</filter-name>
    <filter-class>CrawlServlet</filter-class>
</filter>

<filter-mapping>
    <filter-name>CrawlServlet</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>


After running my GWT app, I got this 503 error:

[ERROR] 503 - GET /Myproject.html?gwt.codesvr=127.0.0.1:9997 (127.0.0.1) 1299 bytes
   Request headers
      Accept: text/html, application/xhtml+xml, */*
      Accept-Language: en-AU
      User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
      Accept-Encoding: gzip, deflate
      Host: 127.0.0.1:8888
      If-Modified-Since: Wed, 16 Apr 2014 00:35:41 GMT
      Connection: keep-alive
   Response headers
      Cache-Control: must-revalidate,no-cache,no-store
      Content-Type: text/html;charset=ISO-8859-1
      Content-Length: 1299


Can anyone tell me the right way to create an HTML snapshot using HtmlUnit?


On Friday, May 16, 2014 9:49:33 AM UTC+10, Jens wrote:
So does Google use server-side technology to create the HTML snapshot, or do they use HtmlUnit?

I don't know what they use, but I would say they don't use HtmlUnit for groups.google.com.

-- J.
