I have done about 70% of it, but I still get an error.
On Friday, May 16, 2014 12:06:24 AM UTC+10, Jens wrote:
OK, here is what I did: I downloaded htmlunit-2.14, unzipped it and copied these jar files into my lib folder:
- htmlunit-2.14
- commons-codec
- commons-collections
- commons-io
- commons-logging
- cssparser
- htmlunit-core-js
- nekohtml
- commons-lang3
- httpclient
- httpmime
- jetty-websocket
- xalan
- xercesImpl
After that, I created "public class CrawlServlet implements Filter" as mentioned above:
@Override
public void doFilter(ServletRequest request, ServletResponse response,
        FilterChain chain) throws IOException, ServletException {
    HttpServletRequest httpRequest = (HttpServletRequest) request;
    String requestQueryString = httpRequest.getQueryString();
    if ((requestQueryString != null) && (requestQueryString.contains("_escaped_fragment_"))) {
        // rewrite the URL back to the original #! version
        String url_with_hash_fragment = requestQueryString.replace("?_escaped_fragment_=", "!#");
        // remember to unescape any %XX characters
        //url_with_hash_fragment = rewriteQueryString(url_with_escaped_fragment);
        // use the headless browser to obtain an HTML snapshot
        final WebClient webClient = new WebClient();
        HtmlPage page = webClient.getPage(url_with_hash_fragment);
        // important! Give the headless browser enough time to execute JavaScript
        // The exact time to wait may depend on your application.
        webClient.waitForBackgroundJavaScript(2000);
        // return the snapshot
        PrintWriter out = response.getWriter();
        out.println(page.asXml());
    } else {
        try {
            // not an _escaped_fragment_ URL, so move up the chain of servlet (filters)
            chain.doFilter(request, response);
        } catch (ServletException e) {
            System.err.println("Servlet exception caught: " + e);
            e.printStackTrace();
        }
    }
}

OK, then I ran my GWT app in Eclipse, opened the URL "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article" and got this error in Eclipse:
[ERROR] 500 - GET /Myproject.html?gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article (127.0.0.1) 4840 bytes
Request headers
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-AU
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Accept-Encoding: gzip, deflate
Host: 127.0.0.1:8888
Connection: keep-alive
Cookie: JSESSIONID=5eehbjnnhsz6m6hlk7el8tu; SILFORACOOKIE=5eehbjnnhsz6m6hlk7el8tu
Response headers
Set-Cookie: JSESSIONID=ro2z2xqrbi0j93zfrb1uihl8;Path=/
Set-Cookie: SILFORACOOKIE=ro2z2xqrbi0j93zfrb1uihl8;Path=/
Content-Type: text/html;charset=ISO-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 4840
In Chrome it showed:
HTTP ERROR 500
Problem accessing /Myproject.html. Reason: Server Error
Caused by:java.net.MalformedURLException: no protocol: gwt.codesvr=127.0.0.1:9997!#article
at java.net.URL.<init>(Unknown Source)
Do you know how to fix it?
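A likely cause: HttpServletRequest.getQueryString() returns only the part after the first '?' (without the '?' itself), so the string handed to webClient.getPage() never contains a scheme, host or path. In the URL above there is also a second '?' where '&' belongs, and the code writes "!#" where the hash-bang is "#!". The standalone sketch below (class and method names are made up for illustration) reproduces the exact string from the error page and shows that java.net.URL rejects it, which matches the MalformedURLException in the stack trace.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Standalone reproduction; EscapedFragmentDiagnosis and its methods are hypothetical names.
public class EscapedFragmentDiagnosis {

    // Same rewrite the filter performs on httpRequest.getQueryString().
    static String rewriteLikeTheFilter(String queryString) {
        return queryString.replace("?_escaped_fragment_=", "!#");
    }

    // True if java.net.URL accepts the string, false if it throws MalformedURLException.
    static boolean isValidUrl(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Query string of the request in the log (everything after the first '?').
        String queryString = "gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article";
        String rewritten = rewriteLikeTheFilter(queryString);
        System.out.println(rewritten);             // gwt.codesvr=127.0.0.1:9997!#article
        System.out.println(isValidUrl(rewritten)); // false: no scheme, host or path

        // With a single '?' the query string contains no '?' at all,
        // so replace() changes nothing and the result still has no protocol.
        System.out.println(isValidUrl(rewriteLikeTheFilter("_escaped_fragment_=article"))); // false
    }
}
```

Because of the second '?', the container also sees one parameter, gwt.codesvr, whose value is "127.0.0.1:9997?_escaped_fragment_=article", so request.getParameter("_escaped_fragment_") would return null for that URL.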
HtmlUnit is bundled as a jar file, so you can put it (and all its dependencies) into WEB-INF/lib of your war. Then you need to write a servlet that takes the request from the Google bot, rewrites the _escaped_fragment_ parameter back to the original #!<token> URL and starts HtmlUnit with that URL. The resulting rendered page is then returned by the servlet. At the bottom is an example:

The rendered page that you serve the Google bot does not have to be a 1:1 copy of your original page. It is enough if the same content is available; styling is irrelevant. For example, compare:

-- J.
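The rewrite step Jens describes could look roughly like the sketch below. It is an illustration under assumptions, not code from the thread: HashBangUrls and toHashBangUrl are hypothetical names, the helper takes the pieces a filter actually has (httpRequest.getRequestURL() for scheme, host, port and path, plus the raw query string), keeps every other query parameter such as gwt.codesvr in its original order, and decodes %XX sequences with URLDecoder, which also maps '+' to a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HtmlUnit or the servlet API.
public class HashBangUrls {

    private static final String PARAM = "_escaped_fragment_";

    /**
     * Rebuilds the original "#!" URL from what a servlet filter can see.
     *
     * @param requestUrl  scheme, host, port and path, for example
     *                    httpRequest.getRequestURL().toString()
     * @param queryString raw query string without the leading '?',
     *                    for example httpRequest.getQueryString() (must not be null)
     */
    public static String toHashBangUrl(String requestUrl, String queryString) {
        List<String> kept = new ArrayList<String>();
        String fragment = null;
        for (String pair : queryString.split("&")) {
            if (pair.equals(PARAM) || pair.startsWith(PARAM + "=")) {
                // Value after "=", with %XX sequences decoded.
                String raw = pair.length() > PARAM.length()
                        ? pair.substring(PARAM.length() + 1) : "";
                fragment = decodeUtf8(raw);
            } else if (!pair.isEmpty()) {
                // Keep other parameters, e.g. gwt.codesvr in GWT dev mode.
                kept.add(pair);
            }
        }
        StringBuilder url = new StringBuilder(requestUrl);
        for (int i = 0; i < kept.size(); i++) {
            url.append(i == 0 ? '?' : '&').append(kept.get(i));
        }
        if (fragment != null) {
            url.append("#!").append(fragment);
        }
        return url.toString();
    }

    private static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997#!article
        System.out.println(toHashBangUrl(
                "http://127.0.0.1:8888/Myproject.html",
                "gwt.codesvr=127.0.0.1:9997&_escaped_fragment_=article"));
    }
}
```

In the filter, url_with_hash_fragment would then come from something like toHashBangUrl(httpRequest.getRequestURL().toString(), requestQueryString), and the test URL would use '&' instead of a second '?': .../Myproject.html?gwt.codesvr=127.0.0.1:9997&_escaped_fragment_=article. Keep in mind that HtmlUnit has no GWT Developer Plugin, so a snapshot of a dev-mode URL (one containing gwt.codesvr) will probably not contain the rendered application; testing against the compiled app is likely more representative.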
You received this message because you are subscribed to the Google Groups "Google Web Toolkit" group.