java HTMLUnit:大量过时的内容并且无法在 getPage() 上创建对象警告然后失败并在 getByXPath() 上调用 setOuterHTML 异常

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/17436855/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-11-01 01:58:37  来源:igfitidea点击:

HTMLUnit: Tons of obsolete content and can't create objects warnings on getPage() then fails with Exception invoking setOuterHTML on getByXPath()

javahtmlunit

提问by Jeff

I'm trying out HTMLUnit to automate downloading data off a webapp. However, I am getting a whole mess of warnings on getPage() (most of which seem to deal with linked scripts that I don't think i even need) and then a fatal com.gargoylesoftware.htmlunit.ScriptException: Exception invoking setOuterHTML when I try and run getByXPath to pull the data I'm looking for. And from the errors I get, I can't for the life of me figure out what's going on. Y'all got any ideas?

我正在尝试使用 HTMLUnit 来自动从 web 应用程序下载数据。但是,我在 getPage() 上收到了一大堆警告(其中大部分似乎处理我认为我什至不需要的链接脚本),然后是致命的 com.gargoylesoftware.htmlunit.ScriptException: Exception invoking setOuterHTML when我尝试运行 getByXPath 来提取我正在寻找的数据。从我得到的错误中,我终生无法弄清楚发生了什么。大家有什么想法吗?

Here's my code:

这是我的代码:

import java.util.List;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class ScrapperApp {

    private static void go() throws Exception {
        HtmlPage nextPage;
        String url = "http://media.ethics.ga.gov/search/Campaign/Campaign_Name.aspx?NameID=5751&FilerID=C2009000085&Type=candidate";

        final WebClient webclient = new WebClient();
        final HtmlPage page = webclient.getPage(url);

        System.out.println("PULLING LINKS:");

        List<HtmlAnchor> articles = (List<HtmlAnchor>) page.getByXPath("//div[@class='hform1']/a[@class='lblentrylink']");

        /*for(int x=0; x<articles.size(); x++) {
            nextPage = articles.get(x).click();
            System.out.println(nextPage.getBody());
        }*/
    }

    public static void main(String[] args) throws Exception {
        go();
        System.out.println("COMPLETE");
    }

}

and here is my console output:

这是我的控制台输出:

Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.] sourceName=[http://www.google-analytics.com/urchin.js] line=[443] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.] sourceName=[http://www.google-analytics.com/urchin.js] line=[448] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.google-analytics.com/urchin.js] line=[456] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jul 2, 2013 6:19:52 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jul 2, 2013 6:19:53 PM com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument execCommand
WARNING: Nothing done for execCommand(BackgroundImageCache, ...) (feature not implemented)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/search/theethics.css' [1621:72] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/search/theethics.css' [1621:72] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/search/theethics.css' [1722:1] Error in style sheet. (Invalid token ".123". Was expecting one of: <EOF>, <S>, <IDENT>, "<!--", "-->", <HASH>, <IMPORT_SYM>, <PAGE_SYM>, <MEDIA_SYM>, ".", ":", "*", "[", <ATKEYWORD>.)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=12a7FOCbnwgUAwtiPjKWh6wDEhgkTfdV9_FCfkqzSp1sZ_YdcvnAj941ZFWBBPCjl5RQqmB3TVerNjIRqn-QyCUV4dFAyyOktFPBtLE-ETB9nE-rPiQp_RNPyuD-NYO58_ngCw2&t=634516122000000000' [4:1] Error in style rule. (Invalid token ".". Was expecting one of: <S>, <LBRACE>, <COMMA>.)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=12a7FOCbnwgUAwtiPjKWh6wDEhgkTfdV9_FCfkqzSp1sZ_YdcvnAj941ZFWBBPCjl5RQqmB3TVerNjIRqn-QyCUV4dFAyyOktFPBtLE-ETB9nE-rPiQp_RNPyuD-NYO58_ngCw2&t=634516122000000000' [4:1] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=12a7FOCbnwgUAwtiPjKWh6wDEhgkTfdV9_FCfkqzSp1sZ_YdcvnAj941ZFWBBPCjl5RQqmB3TVerNjIRqn-QyCUV4dFAyyOktFPBtLE-ETB9nE-rPiQp_RNPyuD-NYO58_ngCw2&t=634516122000000000' [538:16] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=12a7FOCbnwgUAwtiPjKWh6wDEhgkTfdV9_FCfkqzSp1sZ_YdcvnAj941ZFWBBPCjl5RQqmB3TVerNjIRqn-QyCUV4dFAyyOktFPBtLE-ETB9nE-rPiQp_RNPyuD-NYO58_ngCw2&t=634516122000000000' [538:16] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdtheitroadpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [6:1] Error in style rule. (Invalid token ".". Was expecting one of: <S>, <LBRACE>, <COMMA>.)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdtheitroadpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [6:1] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdtheitroadpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [105:17] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdtheitroadpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [105:17] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdtheitroadpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [160:16] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdtheitroadpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [160:16] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://media.ethics.ga.gov/Search/Telerik.Web.UI.WebResource.axd?_TSM_HiddenField_=ctl00_ContentPlaceHolder1_RadScriptManager1_TSM&compress=1&_TSM_CombinedScripts_=%3b%3bSystem.Web.Extensions%2c+Version%3d3.5.0.0%2c+Culture%3dneutral%2c+PublicKeyToken%3d31bf3856ad364e35%3aen-US%3a7263e9c6-5962-41bc-b839-88b704bfcf0d%3aea597d4b%3ab25378d2%3bTelerik.Web.UI%2c+Version%3d2011.2.915.35%2c+Culture%3dneutral%2c+PublicKeyToken%3d121fae78165ba3d4%3aen-US%3a168ec6eb-791b-4159-8a0f-6c601196f873%3a16e4e7cd%3af7645509%3a24ee1bba%3af46195d3%3a19620875%3a874f8ea2%3a490a9d4e%3abd8f85e4%3bAjaxControlToolkit%2c+Version%3d3.0.20820.16598%2c+Culture%3dneutral%2c+PublicKeyToken%3d28f01b0e84b6d53e%3aen-US%3a707835dd-fa4b-41d1-89e7-6df5d518ffb5%3ab14bb7d5%3a13f47f54%3a369ef9d0%3a1d056c78%3adc2d6e36%3a5acd2e8e%3af8a45328] line=[997] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jul 2, 2013 6:19:56 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
PULLING LINKS:
Jul 2, 2013 6:19:56 PM com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl runSingleJob
SEVERE: Job run failed with unexpected RuntimeException: Exception invoking setOuterHTML
======= EXCEPTION START ========
Exception class=[java.lang.RuntimeException]
com.gargoylesoftware.htmlunit.ScriptException: Exception invoking setOuterHTML
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:663)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:559)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:525)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:594)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:569)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:996)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:53)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:101)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:328)
    at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:161)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.RuntimeException: Exception invoking setOuterHTML
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:163)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$GetterSlot.setValue(ScriptableObject.java:287)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$RelinkedSlot.setValue(ScriptableObject.java:359)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putImpl(ScriptableObject.java:2659)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.put(ScriptableObject.java:509)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putProperty(ScriptableObject.java:2364)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1601)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1595)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1248)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:815)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:109)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:415)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:274)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3132)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:107)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doRun(JavaScriptEngine.java:587)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:651)
    ... 10 more
Caused by: java.lang.IllegalStateException: Previous sibling for HtmlDivision[<div style="height: 0px; overflow: hidden; border-top: solid black; border-top-width: thick;">] is null.
    at com.gargoylesoftware.htmlunit.html.DomNode.insertBefore(DomNode.java:1023)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement$ProxyDomNode.appendChild(HTMLElement.java:1091)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.handleCharacters(HTMLParser.java:710)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endDocument(HTMLParser.java:718)
    at org.apache.xerces.parsers.AbstractSAXParser.endDocument(Unknown Source)
    at org.cyberneko.html.HTMLTagBalancer.endDocument(HTMLTagBalancer.java:510)
    at org.cyberneko.html.filters.DefaultFilter.endDocument(DefaultFilter.java:213)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2116)
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:918)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:818)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseFragment(HTMLParser.java:162)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseFragment(HTMLParser.java:121)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.parseHtmlSnippet(HTMLElement.java:1048)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.setOuterHTML(HTMLElement.java:1035)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:137)
    ... 26 more
Enclosed exception: 
java.lang.RuntimeException: Exception invoking setOuterHTML
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:163)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$GetterSlot.setValue(ScriptableObject.java:287)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$RelinkedSlot.setValue(ScriptableObject.java:359)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putImpl(ScriptableObject.java:2659)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.put(ScriptableObject.java:509)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putProperty(ScriptableObject.java:2364)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1601)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1595)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1248)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:815)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:109)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:415)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:274)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3132)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:107)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doRun(JavaScriptEngine.java:587)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:651)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:559)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:525)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:594)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:569)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:996)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:53)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:101)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:328)
    at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:161)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.IllegalStateException: Previous sibling for HtmlDivision[<div style="height: 0px; overflow: hidden; border-top: solid black; border-top-width: thick;">] is null.
    at com.gargoylesoftware.htmlunit.html.DomNode.insertBefore(DomNode.java:1023)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement$ProxyDomNode.appendChild(HTMLElement.java:1091)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.handleCharacters(HTMLParser.java:710)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endDocument(HTMLParser.java:718)
    at org.apache.xerces.parsers.AbstractSAXParser.endDocument(Unknown Source)
    at org.cyberneko.html.HTMLTagBalancer.endDocument(HTMLTagBalancer.java:510)
    at org.cyberneko.html.filters.DefaultFilter.endDocument(DefaultFilter.java:213)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2116)
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:918)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:818)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseFragment(HTMLParser.java:162)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseFragment(HTMLParser.java:121)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.parseHtmlSnippet(HTMLElement.java:1048)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.setOuterHTML(HTMLElement.java:1035)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:137)
    ... 26 more
== CALLING JAVASCRIPT ==

  function () {
      return b.apply(a, arguments);
  }

======= EXCEPTION END ========
COMPLETE

回答by acdcjunior

The error comes from a MicrosoftAjax.jsfile. Try simulating chrome:

错误来自一个MicrosoftAjax.js文件。尝试模拟铬:

final WebClient webclient = new WebClient(BrowserVersion.CHROME);

Also added a link to suppress HtmlUnit warnings.

还添加了一个链接来抑制 HtmlUnit 警告。

Also, your XPath doesn't find anything (I tested in Chrome). I used another for example purposes:

此外,您的 XPath 没有找到任何东西(我在 Chrome 中测试过)。我使用另一个示例目的:

import java.util.List;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class ScrapperApp {

    private static void go() throws Exception {
        /* turn off annoying htmlunit warnings */
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

        HtmlPage nextPage;
        String url = "http://media.ethics.ga.gov/search/Campaign/Campaign_Name.aspx?NameID=5751&FilerID=C2009000085&Type=candidate";

        final WebClient webclient = new WebClient(BrowserVersion.CHROME);
        final HtmlPage page = webclient.getPage(url);

        System.out.println("PULLING LINKS:");

        List<HtmlAnchor> articles = (List<HtmlAnchor>) page.getByXPath("//a[@class='lblentrylink']");
        //List<HtmlAnchor> articles = (List<HtmlAnchor>) page.getByXPath("//div[@class='hform1']/a[@class='lblentrylink']");

        for(int x=0; x<articles.size(); x++) {
            System.out.println("Clicking "+articles.get(x).asText());
            //nextPage = articles.get(x).click();
            //System.out.println(nextPage.getBody());
        }
    }
    public static void main(String[] args) throws Exception {
        go();
        System.out.println("COMPLETE");
    }
}