PHP - SimpleXML parsing error

Question

Asked by JohnAllen

SEE EDITS AT BOTTOM TO SHOW MORE ACCURATE ERROR OUTPUT

I'm parsing somewhat large (~15MB) XML files with PHP for the first time using SimpleXML. The files are flight search results, so they have long attributes (links back to Kayak), for example:
"/book/flightcode=1238917408.NxJI6G.0.F.ORBITZAIR,ORBITZAIR.0.f36f1ea92513977249aa695112410052&sid=26-Vu01v7ilzhSAjPVLZ3Ul"

SimpleXML throws this error when parsing:

"Entity: line 10: parser error : EntityRef: expecting ';' in" and then;

"38917408.NxJI6G.0.F.ORBITZAIR,ORBITZAIR.0.f36f1ea92513977249aa695112410052&sid in" and then;

"simplexml_load_string() [function.simplexml-load-string]: ^ in,"

and so forth for each line that contains one of these URLs.

I found a mention of SimpleXML not liking long attributes on php.net with no solution. I would rather just use and learn SimpleXML for now and work past this error if there is a non-janky, somewhat easy workaround.

Does anyone have a solution? Thanks in advance!

I tried entering the first 13 lines of the XML but it only outputs the info without the XML, so I can do that if it will help. I'm not sure if using another parser/extension would reduce the functionality or ease of use, but please feel free to suggest another if there's no workaround (DOM or XMLReader is what I'm thinking of, perhaps).

EDITS BELOW TO INCLUDE LESS ADULTERATED ERROR OUTPUT:

http://dl.dropbox.com/u/10206237/stack_overflow_xml.xml

ERROR 1:

simplexml_load_string() [<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: Entity: line 10: parser error : EntityRef: expecting ';' in

ERROR 2: (The XML, I think, is fine because it works with a Python script using DOM; I'm translating it to PHP because I don't know Python. I didn't know that the output in the browser would be different. Thanks for being patient.)

<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: 38917408.Pt8rW8.0.F.ORBITZAIR,ORBITZAIR.0.f36f1ea92513977249aa695112410052&amp;_sid_ in

ERROR 3:

function.simplexml-load-string</a>]:                                                                                ^ in

(all of those spaces are in there)

Answer 1

Answered by Josh Davis

As mentioned in other answers and comments, your source XML is broken, and XML parsers are supposed to reject invalid input. libxml has a "recover" mode which would let you load this broken XML, but you would lose the "&sid" part, so it wouldn't help.
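To see what recover mode does to such a URL, here is a minimal sketch. It assumes the DOM extension is available and uses DOMDocument's libxml-specific `recover` property; the sample XML and the element and attribute names (`results`, `trip`, `book`) are made up for illustration and are not taken from the real feed.

```php
<?php
// Hypothetical sample mimicking the broken feed: a bare '&' inside an attribute.
$broken = '<results><trip book="/book/flightcode=123.ORBITZAIR&sid=26-Vu01"/></results>';

// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->recover = true;   // libxml-specific: try to keep parsing after errors
$dom->loadXML($broken);

// Hand the recovered document over to SimpleXML.
$sxe = simplexml_import_dom($dom);

// Compare this with the original URL to see what recovery dropped.
var_dump((string) $sxe->trip['book']);

// The errors are still recorded, so the damage is at least visible.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();
```

The document loads, but the attribute no longer matches the original link, which is why recovery alone is not a fix here.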

If you're lucky and you like taking chances, you can try to somehow make it work by kind-of-fixing the input. You can use some string replacement to escape the ampersands that look like they're in the query part of a URL.

$xml = file_get_contents('broken.xml');
// replace '&' followed by a bunch of letters, numbers
// and underscores and an equal sign with &amp;
$xml = preg_replace('#&(?=[a-z_0-9]+=)#', '&amp;', $xml);
$sxe = simplexml_load_string($xml);

This is, of course, nothing but a hack, and the only good way to fix your situation is to ask your XML provider to fix their generator. Because if it generates broken XML, who knows what other errors slip by unnoticed?
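If you do go the hack route, it may help to check the result instead of letting warnings scroll by. The following is only a sketch of the same replacement run on a made-up sample: the XML string and the names `results`, `trip` and `book` are assumptions for illustration, and the `i` flag is added here so that uppercase parameter names would be matched too.

```php
<?php
// Hypothetical sample mimicking the broken feed: a bare '&' inside an attribute.
$broken = '<results><trip book="/book/flightcode=123.ORBITZAIR&sid=26-Vu01"/></results>';

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Escape only ampersands that start something like "name=" (query-string style).
$fixed = preg_replace('#&(?=[a-z_0-9]+=)#i', '&amp;', $broken);

$sxe = simplexml_load_string($fixed);
if ($sxe === false) {
    // Still broken: report every parser error with its line number.
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), ' (line ', $error->line, ")\n";
    }
    libxml_clear_errors();
} else {
    // The entity is decoded again when the attribute is read back.
    echo (string) $sxe->trip['book'], "\n";
}
```

Reading the attribute back gives the URL with a plain "&sid" again, so the rest of the script does not need to know the input was patched.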

Answer 2

Answered by Jeremy

Darryl has the right answer as to why this is happening in his comment above. One way of fixing this would be to do a str_replace() to replace all '&' ampersands with '&amp;' in the XML. According to the PHP manual, you could also use this regular expression to replace ampersands with their entities:

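// The /e modifier was removed in PHP 7, so the same logic is written as a callback here.
$s = preg_replace_callback('/&[^; ]{0,6}.?/', fn($m) => (substr($m[0], -1) == ';') ? $m[0] : '&amp;' . substr($m[0], 1), $s);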

Answer 3

Answered by Markus Zeller

The XML file may be too big for the parser. You can try passing LIBXML_PARSEHUGE as an option, which helped in my case.
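For the function the question uses, the option goes in the third argument of simplexml_load_string(). A minimal sketch, with a tiny made-up document standing in for the real file contents:

```php
<?php
// Hypothetical stand-in for the contents of a large XML file.
$contents = '<results><trip price="100"/></results>';

// The second argument is the class to instantiate, the third the libxml options.
$xml = simplexml_load_string($contents, 'SimpleXMLElement', LIBXML_PARSEHUGE);

if ($xml === false) {
    echo "Parsing failed\n";
} else {
    echo $xml->getName(), "\n";   // name of the root element
}
```

LIBXML_PARSEHUGE only relaxes the parser's built-in size limits; it does not make ill-formed XML parse.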

Answer 4

Answered by tony gil

I had this problem with 13MB files and solved it by including the LIBXML_PARSEHUGE parameter:

$xml = new SimpleXMLElement($contents, LIBXML_PARSEHUGE);

NOTE: using ini_set at 1GB didn't solve my problem because the PARSED contents occupied more than this.

A more radical approach is to use a library that STREAMS the file rather than LOADING THE WHOLE FILE (SAX parser versus DOM parser), like XML Streamer.
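As a rough illustration of the streaming idea, here is a sketch that uses PHP's built-in XMLReader (a pull parser) in place of XML Streamer, since I can't vouch for that library's API. The sample XML and the names `trip` and `price` are made up; a real file would be opened with `$reader->open('file.xml')` instead of passing a string.

```php
<?php
// Hypothetical sample; a real feed would be opened from disk instead.
$xml = '<results><trip price="100"/><trip price="250"/></results>';

$reader = new XMLReader();
$reader->XML($xml);

$prices = [];
// read() advances one node at a time, so only the current node is held in memory.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'trip') {
        $prices[] = $reader->getAttribute('price');
    }
}
$reader->close();

print_r($prices);
```

Memory use stays roughly flat regardless of file size, at the cost of writing the traversal by hand.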
