How to get a web page that contains errors with HtmlUnit?
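I am trying to access this Ajax page from a Java program using the HtmlUnit 2.15 API, but the attempt to fetch the page fails. I think the cause is that the site requests this broken/missing file located here.
My code: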
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitExample {
    public static void main(String[] args) throws Exception {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
        webClient.getOptions().setTimeout(120000);
        // Note: called before any page is loaded, so it likely has nothing to wait for.
        webClient.waitForBackgroundJavaScript(60000);
        webClient.getOptions().setRedirectEnabled(true);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setCssEnabled(true);
        webClient.getOptions().setUseInsecureSSL(true);
        webClient.getOptions().setDoNotTrackEnabled(true);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        String url = "http://www.santanderuniversidades.com.br/JuriPopular/index.aspx?idprojeto=16";
        final HtmlPage page = (HtmlPage) webClient.getPage(url); // fails here
        System.out.println(page.asXml());
    }
}
Error message:
Exception in thread "main" java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:68)
at com.gargoylesoftware.htmlunit.HttpWebConnection.downloadContent(HttpWebConnection.java:693)
at com.gargoylesoftware.htmlunit.HttpWebConnection.downloadResponseBody(HttpWebConnection.java:675)
at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:201)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1313)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1230)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:338)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:407)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:392)
at HtmlUnitExample.main(HtmlUnitExample.java:42)//getPage line
The page's link to the CSS:
<link href='/JuriPopular/App_Themes/estilo/css.axd?files=jPages.css,estilo.css,jquery.fancybox.css' type='text/css' rel='stylesheet' />
The CSS that references the missing font file:
@font-face {
    font-family: 'DigitalDotRoadsign';
    src: url('fonts/DigitalDotRoadsign.eot');
    src: url('fonts/DigitalDotRoadsign.eot?#iefix') format('embedded-opentype'),
         url('fonts/DigitalDotRoadsign.woff') format('woff'), /* references the missing file */
         url('fonts/DigitalDotRoadsign.ttf') format('truetype'),
         url('fonts/DigitalDotRoadsign.svg#svgDigitalDotRoadsign') format('svg');
    font-weight: normal;
}
Is this the root of my problem? If so, is there a way to avoid it, perhaps by ignoring or eliminating whatever causes it?
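A note on reading the stack trace: the exception is raised inside GZIPInputStream while HtmlUnit downloads a response body, which suggests a gzip-compressed response ended before the decompressor expected it to. The trace also runs from getPage straight into downloadResponseBody, so the failing download appears to be the response for the requested URL itself rather than a font referenced from the CSS. The following is a minimal JDK-only sketch that reproduces the same exception message by cutting a gzip stream short. The class name TruncatedGzipDemo and its helper methods are hypothetical and are not part of HtmlUnit, and the sketch does not claim to show why this particular server sends an incomplete body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class (an assumption for illustration, not HtmlUnit code).
class TruncatedGzipDemo {

    // Gzip-compress the given bytes in memory.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        }
        return buffer.toByteArray();
    }

    // Decompress to the end of the stream. Returns the EOFException message
    // if the compressed data ends early, or null if it decompresses cleanly.
    static String readAll(byte[] compressed) throws IOException {
        byte[] chunk = new byte[1024];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            while (in.read(chunk) != -1) {
                // Discard the decompressed bytes; only the outcome matters here.
            }
            return null;
        } catch (EOFException e) {
            return e.getMessage();
        }
    }

    // Compress a 64 KiB payload, keep only the first half of the compressed
    // bytes (simulating a response that was cut off), and try to read it.
    static String readTruncated() throws IOException {
        byte[] plain = new byte[64 * 1024];
        new Random(42).nextBytes(plain); // fixed seed keeps the demo repeatable
        byte[] full = gzip(plain);
        byte[] half = Arrays.copyOf(full, full.length / 2);
        return readAll(half);
    }

    public static void main(String[] args) throws IOException {
        // Prints the same message that appears at the top of the stack trace.
        System.out.println(readTruncated());
    }
}
```

If the sketch matches what happens here, the thing to investigate is why the server returns a truncated or otherwise unusable compressed body for this request, not the missing font file.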
Actually, I solved this simply by enabling cookies. I assume they are required for the page to load.
Code:
webClient.getCookieManager().setCookiesEnabled(true);