"Content is not allowed in prolog" when parsing perfectly valid XML on GAE
Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/3030903/
Question by Adrian Petrescu
I've been beating my head against this absolutely infuriating bug for the last 48 hours, so I thought I'd finally throw in the towel and try asking here before I throw my laptop out the window.
I'm trying to parse the response XML from a call I made to AWS SimpleDB. The response is coming back on the wire just fine; for example, it may look like:
<?xml version="1.0" encoding="utf-8"?>
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
<ListDomainsResult>
<DomainName>Audio</DomainName>
<DomainName>Course</DomainName>
<DomainName>DocumentContents</DomainName>
<DomainName>LectureSet</DomainName>
<DomainName>MetaData</DomainName>
<DomainName>Professors</DomainName>
<DomainName>Tag</DomainName>
</ListDomainsResult>
<ResponseMetadata>
<RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId>
<BoxUsage>0.0000071759</BoxUsage>
</ResponseMetadata>
</ListDomainsResponse>
I pass in this XML to a parser with
XMLEventReader eventReader = xmlInputFactory.createXMLEventReader(response.getContent());
and call eventReader.nextEvent() a bunch of times to get the data I want.
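For readers unfamiliar with StAX, the event loop described above looks roughly like this. It is a minimal sketch, assuming the response body is available as an InputStream; DomainNameParser and parseDomainNames are hypothetical names, not part of the AWS SDK. It collects the text of every DomainName element:

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class DomainNameParser {
    // Hypothetical helper: returns the text of every <DomainName> element.
    static List<String> parseDomainNames(InputStream content) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Passing the raw byte stream lets the parser detect the encoding itself.
        XMLEventReader reader = factory.createXMLEventReader(content);
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals("DomainName")) {
                    // getElementText() reads the text up to the matching end tag
                    names.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }
}
```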
Here's the bizarre part -- it works great inside the local server. The response comes in, I parse it, everyone's happy. The problem is that when I deploy the code to Google App Engine, the outgoing request still works, and the response XML seems 100% identical and correct to me, but the response fails to parse with the following exception:
com.amazonaws.http.HttpClient handleResponse: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.): <?xml version="1.0" encoding="utf-8"?>
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><ListDomainsResult><DomainName>Audio</DomainName><DomainName>Course</DomainName><DomainName>DocumentContents</DomainName><DomainName>LectureSet</DomainName><DomainName>MetaData</DomainName><DomainName>Professors</DomainName><DomainName>Tag</DomainName></ListDomainsResult><ResponseMetadata><RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId><BoxUsage>0.0000071759</BoxUsage></ResponseMetadata></ListDomainsResponse>
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:153)
... (rest of lines omitted)
I have double, triple, quadruple checked this XML for 'invisible characters' or non-UTF8 encoded characters, etc. I looked at it byte-by-byte in an array for byte-order-marks or something of that nature. Nothing; it passes every validation test I could throw at it. Even stranger, it happens if I use a Saxon-based parser as well -- but ONLY on GAE, it always works fine in my local environment.
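A hedged sketch of that byte-level check: log the first bytes that actually reach the parser (on GAE as well as locally) in hex, and test for a UTF-8 byte order mark. LeadingBytes, hexPrefix and hasUtf8Bom are hypothetical names:

```java
final class LeadingBytes {
    // Hypothetical diagnostic: formats the first `limit` bytes as hex so that
    // invisible characters in front of "<?xml" (e.g. a BOM) become visible.
    static String hexPrefix(byte[] data, int limit) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(limit, data.length);
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF because Java bytes are signed
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    // True if the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }
}
```

For example, hexPrefix on a document that starts with a BOM and then "<" returns "EF BB BF 3C".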
It makes it very hard to trace the code for problems when I can only run the debugger on an environment that works perfectly (I haven't found any good way to remotely debug on GAE). Nevertheless, using the primitive means I have, I've tried a million approaches including:
- XML with and without the prolog
- With and without newlines
- With and without the "encoding=" attribute in the prolog
- Both newline styles
- With and without the chunking information present in the HTTP stream
And I've tried most of these in multiple combinations where it made sense they would interact -- nothing! I'm at my wit's end. Has anyone seen an issue like this before that can hopefully shed some light on it?
Thanks!
Accepted answer by Romain Hippeau
The encoding in your XML and XSD (or DTD) are different.
XML file header: <?xml version='1.0' encoding='utf-8'?>
XSD file header: <?xml version='1.0' encoding='utf-16'?>
Another possible scenario that causes this is when anything comes before the XML declaration. For example, you might have something like this in the buffer:
helloworld<?xml version="1.0" encoding="utf-8"?>
or even just a space or a special character.
There are some special characters called byte order marks (BOMs) that could be at the start of the buffer. Before passing the buffer to the parser, do this:
String xml = "<?xml ...";
// Remove a leading run of non-word characters (e.g. a BOM, U+FEFF) in front of the first '<'
xml = xml.trim().replaceFirst("^([\\W]+)<", "<");
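The regex above only helps once the XML is already a String. Since the question hands response.getContent() (a stream) to the parser, a byte-level variant of the same idea may fit better. A minimal sketch, assuming the only stray prefix is a UTF-8 BOM; BomSkipper and skipUtf8Bom are hypothetical names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

final class BomSkipper {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Hypothetical helper: returns a stream that skips a leading UTF-8 BOM, if present,
    // and otherwise yields exactly the original bytes.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // Fill `head`, tolerating short reads, until 3 bytes or end of stream
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean isBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!isBom && n > 0) {
            // Not a BOM: push the bytes back so the parser still sees them
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

Usage would then be xmlInputFactory.createXMLEventReader(BomSkipper.skipUtf8Bom(response.getContent())). Whether a BOM is actually what differs on GAE is not established in the question, so treat this as one candidate fix.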
Answer by Sunmit Girme
This error message is always caused by invalid content before the start of the XML, for example an extra small dot “.” at the beginning of the document.
Any characters before the “<?xml…” will cause the above “org.xml.sax.SAXParseException: Content is not allowed in prolog” error message.
Example: a small dot “.” before the “<?xml…”.
To fix it, just delete all those weird characters before the “<?xml”.
Ref: http://www.mkyong.com/java/sax-error-content-is-not-allowed-in-prolog/
Answer by SoloPilot
I had a tab character instead of spaces. Replacing the tab '\t' fixed the problem.
Cut and paste the whole doc into an editor like Notepad++ and display all characters.
Answer by Saturn CAU
I was facing the same issue. In my case, the XML files were generated by a C# program and fed into an AS400 for further processing. After some analysis, I found that I was using UTF-8 encoding with a BOM when generating the XML files, whereas the Java side (on the AS400) expected "UTF-8 without BOM". So I had to write extra code similar to the following:
//create encoding with no BOM
Encoding outputEnc = new UTF8Encoding(false);
//open file with encoding
TextWriter file = new StreamWriter(filePath, false, outputEnc);
file.Write(doc.InnerXml);
file.Flush();
file.Close(); // save and close it
Answer by Ravi Kiran
I was getting the same "Content is not allowed in prolog" error for my XML file.
Solution
Initially my root folder was '#Filename'.
When I removed the first character '#', the error was resolved.
There is no need to remove the '#' from the name. Try it this way instead:
Instead of passing a File or URL object to the unmarshaller method, use a FileInputStream.
File myFile = new File("........");
try (InputStream in = new FileInputStream(myFile)) { // stream is closed automatically
    Object obj = unmarshaller.unmarshal(in);
}
Answer by dfritch
In my XML file, the header looked like this:
<?xml version="1.0" encoding="utf-16"?>
In a test file, I was reading the file bytes and decoding the data as UTF-8 (not realizing the header in this file was utf-16) to create a string.
byte[] data = Files.readAllBytes(Paths.get(path));
String dataString = new String(data, "UTF-8");
When I tried to deserialize this string into an object, I was seeing the same error:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
When I updated the second line to
String dataString = new String(data, "UTF-16");
I was able to deserialize the object just fine. So as Romain had noted above, the encodings need to match.
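To avoid this kind of mismatch altogether, one option is to skip the manual byte-to-String step and hand the parser the raw bytes, so it detects the charset itself from the BOM and the encoding declaration. A minimal sketch using the JDK's StAX API; RawBytesParsing and rootText are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class RawBytesParsing {
    // Hypothetical helper: returns the text of the root element of a document
    // given as raw bytes. No String decoding happens here, so a UTF-16 file is
    // not misread as UTF-8.
    static String rootText(byte[] data) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(data));
        try {
            while (reader.hasNext()) {
                // Skip prolog events until the root start tag
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // Only valid for a text-only root element
                    return reader.getElementText();
                }
            }
            throw new XMLStreamException("no root element found");
        } finally {
            reader.close();
        }
    }
}
```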
Answer by MBaas
In my instance of the problem, the solution was to replace German umlauts (äöü) with their HTML equivalents...
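Note that XML predefines only five named entities (&lt;, &gt;, &amp;, &apos;, &quot;), so HTML names such as &uuml; are undefined in plain XML unless a DTD declares them. A numeric character reference is the XML-safe form of the same idea. A minimal sketch, assuming it is applied to text content only (never to markup); NumericRefs and escapeNonAscii are hypothetical names:

```java
final class NumericRefs {
    // Hypothetical helper: replaces every non-ASCII character with an XML numeric
    // character reference (e.g. U+00FC becomes "&#252;"). ASCII is left untouched,
    // so the input must already have '<' and '&' escaped.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        // Iterate by code point so characters outside the BMP become one reference
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }
}
```

For example, escapeNonAscii("Müller") yields "M&#252;ller".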
Answer by Avinash Dubey
Below are things to check when you get the above “org.xml.sax.SAXParseException: Content is not allowed in prolog” exception:
- First, check the file paths of schema.xsd and file.xml.
- The encoding in your XML and XSD (or DTD) should be the same.
XML file header: <?xml version='1.0' encoding='utf-8'?>
XSD file header: <?xml version='1.0' encoding='utf-8'?>
- Make sure nothing comes before the XML declaration, e.g.:
hello<?xml version='1.0' encoding='utf-16'?>
Answer by Tamias
In the spirit of "just delete all those weird characters before the <?xml", here's my Java code, which works well with input via a BufferedReader:
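BufferedReader test = new BufferedReader(new InputStreamReader(fisTest));
test.mark(4);
while (true) {
    int earlyChar = test.read();
    if (earlyChar == -1) {
        break; // end of stream: no '<' found, avoid looping forever
    }
    System.out.println(earlyChar); // log each character code seen before the markup
    if (earlyChar == '<') {
        test.reset(); // rewind so the parser still sees the '<'
        break;
    } else {
        test.mark(4); // remember the position after this skipped character
    }
}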
FWIW, the bytes I was seeing are (in decimal): 239, 187, 191, i.e. 0xEF 0xBB 0xBF, which is the UTF-8 byte order mark.
Answer by F.O.O
Removing the XML declaration solved it:
<?xml version='1.0' encoding='utf-8'?>