Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/295472/

Time: 2020-09-06 12:12:39  Source: igfitidea

How do I remove the BOM character from my xml file

Tags: xml, xslt, unicode, byte-order-mark

Asked by Benedikt Waldvogel

I am using XSL to control the output of my XML file, but a BOM character is being added.

Answer by Benedikt Waldvogel

# vim file.xml
:set nobomb
:wq
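Vim's `:set nobomb` tells Vim not to write a byte-order mark when it saves the file. A minimal Python sketch of the same operation, assuming the file is UTF-8 so the BOM is the three bytes EF BB BF; `strip_utf8_bom` and `strip_bom_in_place` are hypothetical helper names, not part of any library:

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path: str) -> bool:
    """Rewrite the file at path without its leading UTF-8 BOM.

    Returns True if a BOM was removed, False if the file had none
    (in which case the file is not rewritten).
    """
    file = Path(path)
    data = file.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    file.write_bytes(strip_utf8_bom(data))
    return True
```

For example, `strip_bom_in_place("file.xml")` returns True the first time and False on a second run. At the text level, Python's built-in `utf-8-sig` codec does the same job: `data.decode("utf-8-sig")` drops a leading BOM if there is one.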

Answer by ken

You just need to add this to your XSLT file:

<xsl:output method="text"
        encoding="ASCII"/>
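The reasoning behind this answer: a byte-order mark is the character U+FEFF, which has no ASCII representation, so ASCII output cannot start with one. As a rough analogy outside XSLT (Python's standard `xml.etree.ElementTree`, not an XSLT processor), serializing to US-ASCII produces BOM-free bytes and turns non-ASCII characters into numeric character references:

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
root.text = "café"

# US-ASCII output: no BOM, and "é" is written as a character reference.
ascii_bytes = ET.tostring(root, encoding="us-ascii")
print(ascii_bytes)  # b'<root>caf&#233;</root>'

# The BOM character itself has no ASCII encoding.
try:
    "\ufeff".encode("ascii")
except UnicodeEncodeError as exc:
    print("U+FEFF is not representable in ASCII:", exc.reason)
```

Character references only exist in XML output. With `method="text"`, as in the answer, a non-ASCII character in the result has no ASCII form at all, and the XSLT 1.0 specification says the processor should signal an error in that case, so this option fits best when the output really is pure ASCII. Processors that implement XSLT 2.0 serialization also accept `byte-order-mark="no"` on `xsl:output`, which keeps UTF-8 while suppressing the BOM.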

Answer by dr_leevsey

Removing the BOM character from a string with XSLT is pretty simple:

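<!-- &#xFEFF; is the BOM character (U+FEFF); with an empty third argument, translate() deletes it -->
<xsl:value-of select="translate(StringWithBOM,'&#xFEFF;','')"/>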

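XPath's `translate()` removes every character of its second argument that has no counterpart in the third, so this expression deletes every U+FEFF in the string, not just a leading one. The same logic in Python, for comparison; `remove_bom_chars` and `strip_leading_bom` are hypothetical helper names:

```python
BOM = "\ufeff"  # U+FEFF, the character used as a byte-order mark


def remove_bom_chars(text: str) -> str:
    """Delete every U+FEFF in text, like XPath translate(text, '&#xFEFF;', '')."""
    # Mapping a code point to None makes str.translate delete that character.
    return text.translate({ord(BOM): None})


def strip_leading_bom(text: str) -> str:
    """Delete only a leading U+FEFF and keep any later occurrences."""
    return text[1:] if text.startswith(BOM) else text


print(remove_bom_chars("\ufeffab\ufeffc"))                   # abc
print(strip_leading_bom("\ufeffab\ufeffc") == "ab\ufeffc")  # True
```

Which one you want depends on whether U+FEFF could legitimately appear later in the text (as a zero-width no-break space).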

Answer by Marko

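Just strip the BOM bytes at the start of the file using any hex editor: the first three bytes (EF BB BF) for UTF-8, or the first two (FE FF or FF FE) for UTF-16.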

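How many bytes to strip depends on the encoding. A small Python sketch that identifies the BOM (if any) and its length, using the BOM constants from the standard `codecs` module; `detect_bom` is a hypothetical helper name:

```python
import codecs

# Longer BOMs are checked first: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding name, BOM length in bytes), or (None, 0) if there is no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

`data[length:]` then gives the bytes without the BOM. Be careful with UTF-16: the XML 1.0 specification requires UTF-16 entities to begin with a BOM, so stripping it without also converting the file to another encoding can leave a file that parsers cannot read.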

Answer by yfeldblum

I was under the impression that XML is encouraged to be written in Unicode, in some Unicode encoding, and that certain Unicode encodings are specified to contain an initial byte-order mark. Without that byte-order mark, your file is no longer correctly encoded in a Unicode encoding and therefore no longer correct XML. XML processors are encouraged to be unforgiving, to fail immediately on the slightest error (such as an incorrect Unicode encoding). What kinds of XML processors are you looking to break?

Obviously, stripping a byte-order mark from a UTF-8 encoded document makes that document appear to be ASCII encoded (not Unicode), and some text processors are capable only of using ASCII encoded documents. Is this what you're working with?

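For reference, the XML 1.0 specification requires a BOM only for UTF-16; for UTF-8 it is optional, and when a document has neither a BOM nor an encoding declaration, the parser must treat it as UTF-8. A quick check with Python's expat-based `xml.etree.ElementTree` (other processors are assumed, not verified here, to behave the same way):

```python
import xml.etree.ElementTree as ET

# The same tiny document ("café"), UTF-8 encoded, with and without a BOM.
with_bom = b"\xef\xbb\xbf<root>caf\xc3\xa9</root>"
without_bom = with_bom[3:]

# Neither version has an encoding declaration, so the parser uses the BOM if
# there is one and otherwise falls back to UTF-8; both decode to the same text.
print(ET.fromstring(with_bom).text)     # café
print(ET.fromstring(without_bom).text)  # café
```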

Answer by AmbroseChapel

What output encoding is your XSL set to use? What encoding is the input document? Where is the input coming from, and where was it saved/uploaded/downloaded in the meantime?

XML and XSL should default to UTF-8 if nothing else is specified. But clearly, something's going wrong here.

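To answer the "what encoding is the input document?" question for a given file, one option is to read the encoding declaration at the top of the document and fall back to UTF-8 when there is none. A minimal Python sketch, assuming an ASCII-compatible input (it does not handle UTF-16 or UTF-32 files) and skipping a UTF-8 BOM if present; `declared_encoding` is a hypothetical helper name:

```python
import re

# The encoding pseudo-attribute of an XML declaration at the very start of the
# document; the name pattern follows the XML spec's EncName production.
_DECL = re.compile(rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def declared_encoding(data: bytes) -> str:
    """Return the encoding named in the XML declaration, or 'utf-8' if there is none."""
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]  # skip a UTF-8 BOM before looking for the declaration
    match = _DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else "utf-8"
```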

One thing that might be happening: the XML is being served by a web server that is set by default to serve ISO-8859-1, which was a pretty good default ... before Unicode.

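When UTF-8 bytes are decoded as ISO-8859-1, the three BOM bytes EF BB BF turn into the characters `ï»¿` at the start of the text, a common visible symptom of this kind of mismatch. A quick Python check:

```python
# The BOM is U+FEFF; in UTF-8 it is the three bytes EF BB BF.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes)  # b'\xef\xbb\xbf'

# ISO-8859-1 maps each byte to exactly one character, so the BOM becomes three characters.
as_latin1 = bom_bytes.decode("iso-8859-1")
print(as_latin1)  # ï»¿

# Reversing the mistake (encode as ISO-8859-1, decode as UTF-8) recovers U+FEFF.
repaired = as_latin1.encode("iso-8859-1").decode("utf-8")
print(repaired == "\ufeff")  # True
```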

Slightly off-topic, but Joel's very instructive article about text encodings was an eye-opener to me. There are a lot of people out there who are otherwise very smart about programming but who persist in thinking there's such a thing as "plain text", or who call their text "ASCII" or "ANSI". It's an issue you really need to get to grips with if you haven't already.
