Java 在我的 Storm 集群中读取 AWS SQS 队列时,是什么导致了这些 ParseError 异常

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/20482331/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-13 02:13:14  来源:igfitidea点击:

What's causing these ParseError exceptions when reading off an AWS SQS queue in my Storm cluster

javaamazon-web-servicesamazon-sqsapache-stormjaxp

提问by Joel Rosenberg

I'm using Storm 0.8.1 to read incoming messages off an Amazon SQS queue and am getting consistent exceptions when doing so:

我正在使用 Storm 0.8.1 从 Amazon SQS 队列读取传入消息,并且在这样做时得到一致的异常:

2013-12-02 02:21:38 executor [ERROR] 
java.lang.RuntimeException: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
        at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:219)
        at REDACTED.spouts.SqsQueueSpout.nextTuple(SqsQueueSpout.java:88)
        at backtype.storm.daemon.executor$fn__3976$fn__4017$fn__4018.invoke(executor.clj:447)
        at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Thread.java:701)
Caused by: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:524)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:298)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:167)
        at com.amazonaws.services.sqs.AmazonSQSClient.invoke(AmazonSQSClient.java:812)
        at com.amazonaws.services.sqs.AmazonSQSClient.receiveMessage(AmazonSQSClient.java:575)
        at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:191)
        ... 5 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.setInputSource(XMLStreamReaderImpl.java:219)
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.<init>(XMLStreamReaderImpl.java:189)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.getXMLStreamReaderImpl(XMLInputFactoryImpl.java:277)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLStreamReader(XMLInputFactoryImpl.java:129)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLEventReader(XMLInputFactoryImpl.java:78)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:85)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:41)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:503)
        ... 10 more

I've debugged the data on the queue and everything looks good. I can't figure out why the API's XML response would be causing these problems. Any ideas?

我已经调试了队列中的数据,一切看起来都很好。我不明白为什么 API 的 XML 响应会导致这些问题。有任何想法吗?

采纳答案by Joel Rosenberg

Answering my own question here for the ages.

多年来在这里回答我自己的问题。

There's currently an XML expansion limit processing bug in Oracle and OpenJDK's Java that results in a shared counter hitting the default upper bound when parsing multiple XML documents.

目前在 Oracle 和 OpenJDK 的 Java 中存在 XML 扩展限制处理错误,导致共享计数器在解析多个 XML 文档时达到默认上限。

  1. https://blogs.oracle.com/joew/entry/jdk_7u45_aws_issue_123
  2. https://bugs.openjdk.java.net/browse/JDK-8028111
  3. https://github.com/aws/aws-sdk-java/issues/123
  1. https://blogs.oracle.com/joew/entry/jdk_7u45_aws_issue_123
  2. https://bugs.openjdk.java.net/browse/JDK-8028111
  3. https://github.com/aws/aws-sdk-java/issues/123

Although I thought that our version (6b27-1.12.6-1ubuntu0.12.04.4) wasn't affected, running the sample code given in the OpenJDK bug report did indeed verify that we were susceptible to the bug.

虽然我认为我们的版本 (6b27-1.12.6-1ubuntu0.12.04.4) 没有受到影响,但运行 OpenJDK 错误报告中给出的示例代码确实验证了我们容易受到该错误的影响。

To work around the issue, I needed to pass jdk.xml.entityExpansionLimit=0to the Storm workers. By adding the following to storm.yamlacross my cluster, I was able to mitigate this problem.

为了解决这个问题,我需要传递jdk.xml.entityExpansionLimit=0给 Storm 工作人员。通过storm.yaml在我的集群中添加以下内容,我能够缓解这个问题。

supervisor.childopts: "-Djdk.xml.entityExpansionLimit=0"
worker.childopts: "-Djdk.xml.entityExpansionLimit=0"

I should note that this technically opens you up to a Denial of Service attack, but since our XML documents are only coming from SQS, I'm not worried about someone forging malevolent XML to kill our workers.

我应该注意到,从技术上讲,这会让您面临拒绝服务攻击,但由于我们的 XML 文档仅来自 SQS,我不担心有人伪造恶意的 XML 来杀死我们的工人。