I created a class called CrawlServlet:
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

public class CrawlServlet implements Filter {

    @Override
    public void destroy() {
        // TODO Auto-generated method stub
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        // TODO Auto-generated method stub
        HttpServletRequest httpRequest = (HttpServletRequest) request;
        String requestURI = httpRequest.getRequestURI();
        if ((requestURI != null) && (requestURI.contains("_escaped_fragment_"))) {
            System.out.println(requestURI);
        } else {
            try {
                // not an _escaped_fragment_ URL, so move up the chain of servlet (filters)
                chain.doFilter(request, response);
            } catch (ServletException e) {
                System.err.println("Servlet exception caught: " + e);
                e.printStackTrace();
            }
        }
    }

    @Override
    public void init(FilterConfig arg0) throws ServletException {
        // TODO Auto-generated method stub
    }
}
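One thing worth checking in doFilter above: as far as I know, HttpServletRequest.getRequestURI() returns only the path and not the query string, while a crawler following the AJAX crawling scheme sends _escaped_fragment_ as a query parameter, so the contains() test may never match. Below is a minimal sketch, using plain strings so it runs without a servlet container, of detecting the parameter in the query string (the value getQueryString() would return) and rebuilding the #! URL that a snapshot tool such as HtmlUnit would then load. The class and method names (EscapedFragment, isCrawlerRequest, toPrettyUrl) are made up for illustration and are not part of GWT or the Servlet API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not part of GWT or the Servlet API.
final class EscapedFragment {
    private static final String PARAM = "_escaped_fragment_=";

    // True when the query string carries the crawler parameter.
    static boolean isCrawlerRequest(String queryString) {
        if (queryString == null) {
            return false;
        }
        return queryString.startsWith(PARAM) || queryString.contains("&" + PARAM);
    }

    // Rebuilds the "#!" URL from the request URL (without query string) and the
    // query string. Assumes _escaped_fragment_ is the last parameter, which is
    // how the crawling scheme appends it.
    static String toPrettyUrl(String requestUrl, String queryString) {
        if (!isCrawlerRequest(queryString)) {
            // Not a crawler request: return the URL unchanged.
            boolean hasQuery = queryString != null && !queryString.isEmpty();
            return hasQuery ? requestUrl + "?" + queryString : requestUrl;
        }
        String otherParams;
        int valueStart;
        if (queryString.startsWith(PARAM)) {
            otherParams = "";
            valueStart = PARAM.length();
        } else {
            int amp = queryString.indexOf("&" + PARAM);
            otherParams = queryString.substring(0, amp);
            valueStart = amp + 1 + PARAM.length();
        }
        String fragment;
        try {
            // The fragment value arrives percent-encoded, so decode it.
            fragment = URLDecoder.decode(queryString.substring(valueStart), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        String query = otherParams.isEmpty() ? "" : "?" + otherParams;
        return requestUrl + query + "#!" + fragment;
    }
}
```

In the filter this would presumably be called with httpRequest.getRequestURL().toString() and httpRequest.getQueryString(), and the resulting URL handed to whatever renders the snapshot.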
In lib/web.xml, I have:
<filter>
    <filter-name>CrawlServlet</filter-name>
    <filter-class>CrawlServlet</filter-class>
</filter>
<filter-mapping>
    <filter-name>CrawlServlet</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

[ERROR] 503 - GET /Myproject.html?gwt.codesvr=127.0.0.1:9997 (127.0.0.1) 1299 bytes
Request headers
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-AU
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Accept-Encoding: gzip, deflate
Host: 127.0.0.1:8888
If-Modified-Since: Wed, 16 Apr 2014 00:35:41 GMT
Connection: keep-alive
Response headers
Cache-Control: must-revalidate,no-cache,no-store
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 1299

Can anyone tell me the right way to create an HTML snapshot using HtmlUnit?
On Friday, May 16, 2014 9:49:33 AM UTC+10, Jens wrote:
-- So does Google use server-side technology to create the HTML snapshot, or do they use HtmlUnit?
I don't know what they use, but I would say they don't use HtmlUnit for groups.google.com.
-- J.