Original question: http://stackoverflow.com/questions/17879198/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Adding UTF-8 BOM to string/Blob
Asked by kay - SE is evil
I need to add a UTF-8 byte-order-mark to generated text data on client side. How do I do that?
Using new Blob(['\xEF\xBB\xBF' + content]) yields 'ï»¿"my data"', of course.
Neither did '\uBBEF\x22BF' work (with '\x22' == '"' being the next character in content).
Is it possible to prepend the UTF-8 BOM in JavaScript to a generated text?
Yes, I really do need the UTF-8 BOM in this case.
Answered by Erik Töyrä Silfverswärd
Prepend \ufeff to the string. See http://msdn.microsoft.com/en-us/library/ie/2yfce773(v=vs.94).aspx
See the discussion between @jeff-fischer and @casey for details on UTF-8, UTF-16, and the BOM. What actually makes the above work is that the string \ufeff is always used to represent the BOM, regardless of whether UTF-8 or UTF-16 is being used.
See p. 36 in The Unicode Standard 5.0, Chapter 2 for a detailed explanation. A quote from that page:
The endian order entry for UTF-8 in Table 2-4 is marked N/A because UTF-8 code units are 8 bits in size, and the usual machine issues of endian order for larger code units do not apply. The serialized order of the bytes must not depart from the order defined by the UTF-8 encoding form. Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature.
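A quick way to check this is to encode the strings and look at the resulting bytes. The sketch below assumes an environment with the standard TextEncoder (modern browsers or a recent Node.js); the helper name toHex is made up for illustration. It also suggests why the '\xEF\xBB\xBF' attempt from the question fails: each \xNN escape in a JavaScript string is a code point, not a byte, so each one becomes two bytes in UTF-8.

```javascript
// Hypothetical helper: format bytes as space-separated upper-case hex.
function toHex(bytes) {
    return Array.from(bytes, function (b) {
        return b.toString(16).toUpperCase().padStart(2, '0');
    }).join(' ');
}

var encoder = new TextEncoder(); // TextEncoder always encodes to UTF-8

// U+FEFF is a single code point; UTF-8 serializes it as three bytes.
var bomHex = toHex(encoder.encode('\uFEFF'));

// '\xEF\xBB\xBF' is three code points (U+00EF, U+00BB, U+00BF),
// and each of them takes two bytes in UTF-8.
var escapedHex = toHex(encoder.encode('\xEF\xBB\xBF'));

console.log(bomHex);     // EF BB BF
console.log(escapedHex); // C3 AF C2 BB C2 BF
```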
Answered by carlosrafaelgn
I had the same issue and this is the solution I came up with:
var blob = new Blob([
    new Uint8Array([0xEF, 0xBB, 0xBF]), // UTF-8 BOM as raw bytes
    "Text",
    // ... remaining data
], { type: "text/plain;charset=utf-8" });
Using Uint8Array prevents the browser from converting those bytes into a string (tested on Chrome and Firefox).
You should replace text/plain with your desired MIME type.
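As a minimal sketch, the approach above can be wrapped in a function and sanity-checked without downloading anything. The function name makeBomBlob, its parameters, and the sample text are hypothetical; the sketch assumes Blob and TextEncoder are available as globals (browsers, or Node.js 18 and later).

```javascript
// Hypothetical helper: build a Blob that starts with the UTF-8 BOM bytes.
function makeBomBlob(text, mimeType) {
    var bom = new Uint8Array([0xEF, 0xBB, 0xBF]); // raw bytes, not a string
    return new Blob([bom, text], { type: mimeType + ';charset=utf-8' });
}

var text = 'na\u00EFve,caf\u00E9'; // sample data with non-ASCII characters
var blob = makeBomBlob(text, 'text/csv');

// String parts of a Blob are stored as UTF-8, so the size should be
// the 3 BOM bytes plus the UTF-8 length of the text.
var expectedSize = 3 + new TextEncoder().encode(text).length;
console.log(blob.size === expectedSize);
```

In a browser, the resulting Blob can then be passed to URL.createObjectURL to build a download link.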
Answered by Jeff Fischer
I'm editing my original answer. The above answer really demands elaboration as this is a convoluted solution by Node.js.
The short answer is, yes, this code works.
The long answer is, no, FEFF is not the byte order mark for UTF-8. Apparently Node took some sort of shortcut for writing encodings within files. FEFF is the UTF-16 Little Endian encoding, as can be seen in the Byte Order Mark Wikipedia article, and the bytes can also be inspected in a binary editor after the file has been written. I've verified this is the case.
http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
Apparently, Node.js uses \ufeff to signify the byte order mark for any number of encodings. It takes the \ufeff marker and converts it into the correct byte order mark based on the third (options) parameter of writeFile, which is where you pass the encoding string. Node.js takes this encoding string and converts the fixed \ufeff marker into the byte order mark of the actual encoding.
UTF-8 Example:
fs.writeFile(someFilename, '\ufeff' + html, { encoding: 'utf8' }, function(err) {
    /* The actual byte order mark written to the file is EF BB BF */
});
UTF-16 Little Endian Example:
fs.writeFile(someFilename, '\ufeff' + html, { encoding: 'utf16le' }, function(err) {
    /* The actual byte order mark written to the file is FF FE */
});
So, as you can see, \ufeff is simply a marker that stands in for the byte order mark of any number of resulting encodings. The actual bytes that make it into the file depend directly on the encoding option specified. The marker used within the string is really irrelevant to what gets written to the file.
I suspect the reasoning behind this is that they chose not to write byte order marks, and the 3-byte mark for UTF-8 isn't easily encoded into the JavaScript string to be written to disk. So they used the UTF16LE BOM as a placeholder mark within the string, which gets substituted at write time.
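The byte sequences quoted in the two examples can be reproduced without touching the disk. The sketch below assumes Node.js and uses Buffer.from with the same encoding names that were passed to writeFile; the variable names are illustrative. The output is also consistent with reading '\ufeff' as the single code point U+FEFF, which each encoding serializes in its own way.

```javascript
// Encode the same one-character string with two different encodings.
var utf8Bytes = Buffer.from('\ufeff', 'utf8');
var utf16leBytes = Buffer.from('\ufeff', 'utf16le');

console.log(utf8Bytes.toString('hex'));    // efbbbf
console.log(utf16leBytes.toString('hex')); // fffe

// The string itself holds one UTF-16 code unit with the value 0xFEFF.
var bomLength = '\ufeff'.length;
var bomCodeUnit = '\ufeff'.charCodeAt(0).toString(16);

console.log(bomLength);   // 1
console.log(bomCodeUnit); // feff
```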
Answered by Santy SC
This is my solution:
var blob = new Blob(["\uFEFF" + csv], {
    type: 'text/csv; charset=utf-8'
});