XML validation failed: "EntityRef: expecting ';'"

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3431280/

Date: 2020-09-06 13:14:11  Source: igfitidea

Validation Failed: "EntityRef: expecting ';'"

xml validation xml-validation character-reference

Asked by Laxmidi

Hi I've got some XML that won't validate. I've narrowed down the problem to this bit:

<script type="text/javascript">document.getelementbyid("oxm-1f4a4485-5a1d-45f9-a989-9c65a0b9ceb6").src="http://bid.website.net/display?l=h4siaaaaaaaaad2nmq6cqbrenycw7qjyolfccxmregvcoae0u0sly_agtvaewwn4bg_havwbnebpvmzkkzra_kzzdvoloq4u-hjnp7sii0rxcbzz5vl5kxsrds6wtsfbxmcr9chysuhqbecuckb8cvx4m-pbcxugtdrll6d3dqtihnqukth2yvdkptr67cuzfvlxjlinkul9634lpal_h4mwhso8aabzhw1cdcwjxl6xivgv8agrjxjc_gaaaa==&p=h4siaaaaaaaaabxkmq7cmaxaurcqjjrrsfqqsrm7x3fsrwyvosda8qnj_3ojfgb49o45pblq7e80syzjhopggso9wyzpcpntzkxk1ldtbbi7otmxfj9da1wpjcf10vtxdj9e5_utyj19k2lfssepld5agnqaaaa=&url=http%3a%2f%2flocalhost%2fproject-debug%2fproject.html";</script>

I put it in an XML validator and it spat out:

This page contains the following errors: error on line 1 at column 16: EntityRef: expecting ';'

Any ideas as to where the missing ';' is supposed to go? Is there another problem?

Answered by John Kugelman

You have unescaped ampersands (&) in your URL. They either need to be (a) changed to character entities (&amp;), or (b) enclosed in a CDATA section.
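
Option (a) could be sketched roughly as follows. This is only an illustration: the helper name `escapeXml` and the example.com URL are assumptions made for the example, not part of the original question or answer.

```javascript
// Hypothetical helper: escape the characters that are special in XML
// text and attribute values.
function escapeXml(value) {
  return value
    .replace(/&/g, "&amp;")  // must run first so the entities below are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Placeholder URL with the same shape of query string as in the question.
const url = "http://example.com/display?l=abc&p=def&url=http%3a%2f%2flocalhost";
console.log(escapeXml(url));
// prints: http://example.com/display?l=abc&amp;p=def&amp;url=http%3a%2f%2flocalhost
```

An XML parser turns each &amp; back into a plain & when it reads the document, so the URL the script sees is unchanged.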

A CDATA section lets you leave special characters like & unescaped, so that'd be easiest:

<script type="text/javascript">
// <![CDATA[
    document.getElementById(...).src="...";
// ]]>
</script>

You can include anything you want inside of a CDATA section aside from the exact character sequence ]]>. The // comments are there to make sure browsers that don't understand CDATA sections ignore the <![CDATA[ and ]]> markers.
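
If the text being wrapped might itself contain ]]>, a common workaround is to split that sequence across two adjacent CDATA sections. The sketch below assumes plain XML text content rather than the commented-out script form shown earlier, and the helper name `toCdata` is hypothetical.

```javascript
// Hypothetical helper: wrap arbitrary text in CDATA. Any "]]>" in the input is
// split so that "]]" ends one section and ">" starts the next one.
function toCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

console.log(toCdata("a && b"));  // prints: <![CDATA[a && b]]>
console.log(toCdata("x]]>y"));   // prints: <![CDATA[x]]]]><![CDATA[>y]]>
```

A parser reads the second result as two sections, "x]]" and ">y", which concatenate back to the original text.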

By the way, JavaScript is case sensitive. That should be getElementById, not getelementbyid.

Answered by Dale Magee

Modifying the content isn't always possible, e.g. if you're scraping a website.

You can't just str_replace '&' with '&amp;', because the HTML might already include valid HTML entities, and you'd end up with something like "&amp;amp;".

Here's a regex that should replace bare ampersands with the HTML entity for an ampersand, without breaking existing valid HTML entities:

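// Escape only ampersands that do not already start an entity such as &amp; or &#38;
$html = preg_replace('/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/', '&amp;', $html);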

I used it to scrape about 700 pages without any problems :)
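
The same idea can be sketched in JavaScript with a negative lookahead, which escapes an ampersand only when it does not already begin a named or numeric entity. This is an illustrative variant under that assumption, not the answer's own code; the helper name `escapeBareAmpersands` is hypothetical.

```javascript
// Hypothetical helper: escape "&" unless it already begins a named entity
// (&amp;), a decimal reference (&#38;), or a hexadecimal reference (&#x26;).
function escapeBareAmpersands(html) {
  return html.replace(/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/g, "&amp;");
}

console.log(escapeBareAmpersands("a & b &amp; c &#38; d ?x=1&y=2"));
// prints: a &amp; b &amp; c &#38; d ?x=1&amp;y=2
```

One caveat: the pattern only checks the shape of what follows the ampersand, so a query parameter that happens to look like an entity (a name followed directly by a semicolon) would be left unescaped.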
