java - Why can't Apache HttpClient 4.2 retrieve this page?


I'm trying to retrieve a page using Apache HttpClient: http://quick-dish.tablespoon.com/

Unfortunately, when I try this, it returns the following (as returned by Jsoup; it's returning the "HTTP..." string itself):

    <html>
     <head></head>
     <body>
      HTTP/1.1 200 OK [Server: nginx/1.0.11, Content-Type: text/html;charset=UTF-8, Last-Modified: Mon, 02 Jul 2012 15:30:40 GMT, Vary: Accept-Encoding, Cookie,Accept-Encoding, X-Powered-By: PHP/5.3.6, X-Pingback: http://quick-dish.tablespoon.com/xmlrpc.php, X-Powered-By: ASP.NET, Content-Encoding: gzip, x-blz: lb1.blaze.io, Date: Mon, 02 Jul 2012 16:06:21 GMT, Content-Length: 11723, Connection: keep-alive]
     </body>
    </html>

Here is the code (note that I'm emulating Googlebot, as I've found web servers tend to be better behaved that way):

    URL sourceUrl = new URL("http://quick-dish.tablespoon.com/");
    HttpClient httpClient = new ContentEncodingHttpClient();
    httpClient.getParams().setBooleanParameter("http.protocol.handle-redirects", true);

    final HttpGet httpGet = new HttpGet(sourceUrl.toURI());
    httpGet.setHeader("User-Agent", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)");
    httpGet.setHeader("Accept", "text/html");
    httpGet.setHeader("Accept-Charset", "utf-8");

    final HttpResponse response = httpClient.execute(httpGet);
    return Jsoup.parse(response.toString());

Needless to say, the page loads fine in a web browser. Any ideas?

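Calling toString() on the response only gives you the status line and headers, which is exactly what Jsoup ended up parsing. Instead of toString(), you need to get the response entity: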

    // Get hold of the response entity
    HttpEntity entity = response.getEntity();

Then you can read the contents of that entity and pass the resulting HTML to Jsoup.
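To make the "read the contents" step concrete, here is a minimal sketch of what it involves. It uses only the JDK so it runs without the HttpClient jar; the class name `EntityBodyReader` and the method `readBody` are hypothetical names chosen for illustration. In real code the stream would come from `entity.getContent()`, and in HttpClient 4.x `EntityUtils.toString(entity, "UTF-8")` should do the same job in a single call, after which the string can be handed to `Jsoup.parse(...)`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EntityBodyReader {

    /**
     * Reads every byte from the stream and decodes it with the given charset.
     * Assumption: with HttpClient, `content` would be response.getEntity().getContent().
     */
    public static String readBody(InputStream content, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        try {
            // Copy the stream into memory until end-of-stream is reached.
            while ((read = content.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } finally {
            // Closing the content stream releases the underlying connection.
            content.close();
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a real response body, since this sketch makes no network call.
        byte[] fakeBody = "<html><body>Quick Dish</body></html>".getBytes(StandardCharsets.UTF_8);
        String html = readBody(new ByteArrayInputStream(fakeBody), StandardCharsets.UTF_8);
        System.out.println(html);
    }
}
```

The string returned here is the page markup itself, so it is what should be passed to Jsoup in place of `response.toString()`.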

