javascript: Adding a UTF-8 BOM to a string/Blob

Question

Asked by kay - SE is evil

I need to add a UTF-8 byte-order-mark to generated text data on client side. How do I do that?

Using new Blob(['\xEF\xBB\xBF' + content]) yields 'ï»¿"my data"', of course.

Neither did '\uBBEF\x22BF' work (with '\x22' == '"' being the next character in content).

Is it possible to prepend the UTF-8 BOM in JavaScript to a generated text?

(Yes, I really do need the UTF-8 BOM in this case.)

Answer 1

Answered by Erik Töyrä Silfverswärd

Prepend \ufeff to the string. See http://msdn.microsoft.com/en-us/library/ie/2yfce773(v=vs.94).aspx

See the discussion between @jeff-fischer and @casey for details on UTF-8, UTF-16, and the BOM. What actually makes the above work is that the character \ufeff is always used to represent the BOM in a string, regardless of whether UTF-8 or UTF-16 is used for the output.

See p.36 in The Unicode Standard 5.0, Chapter 2 for a detailed explanation. A quote from that page:

The endian order entry for UTF-8 in Table 2-4 is marked N/A because UTF-8 code units are 8 bits in size, and the usual machine issues of endian order for larger code units do not apply. The serialized order of the bytes must not depart from the order defined by the UTF-8 encoding form. Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature.
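
To make the point about \ufeff concrete, here is a minimal sketch that encodes the character with TextEncoder (which always emits UTF-8) and compares it with the '\xEF\xBB\xBF' attempt from the question. The toHex helper and the variable names are made up for this illustration and are not part of the answer.

```javascript
// Sketch: how one code point, U+FEFF, is serialized as UTF-8.
// TextEncoder is a global in modern browsers and in Node.js.
const BOM = '\uFEFF';

// Hypothetical helper: format bytes as upper-case hex pairs.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, '0')).join(' ');
}

// U+FEFF becomes the three-byte UTF-8 signature.
const bomHex = toHex(new TextEncoder().encode(BOM));

// The attempt from the question: \xEF, \xBB and \xBF are three separate
// code points (U+00EF, U+00BB, U+00BF), each encoded as two UTF-8 bytes.
const brokenHex = toHex(new TextEncoder().encode('\xEF\xBB\xBF'));

console.log(bomHex);    // EF BB BF
console.log(brokenHex); // C3 AF C2 BB C2 BF
```

String parts passed to the Blob constructor are encoded as UTF-8 in the same way, which is why prepending '\ufeff' gives the expected three bytes while '\xEF\xBB\xBF' gives six.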

Answer 2

Answered by carlosrafaelgn

I had the same issue and this is the solution I came up with:

var blob = new Blob(
    [
        new Uint8Array([0xEF, 0xBB, 0xBF]), // UTF-8 BOM, passed as raw bytes
        "Text"
        // ... remaining data parts go here
    ],
    { type: "text/plain;charset=utf-8" }
);

Using Uint8Array prevents the browser from converting those bytes into a string (tested on Chrome and Firefox).

You should replace text/plain with your desired MIME type.
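
As a rough sketch of how this could be wrapped for reuse, the helper below takes the MIME type as a parameter. The name blobWithBom and the sample CSV strings are invented for this example; it assumes an environment with a global Blob (browsers, or a recent Node.js).

```javascript
// Hypothetical helper (not from the answer): put the raw UTF-8 BOM bytes
// in front of any list of Blob parts.
const UTF8_BOM = new Uint8Array([0xEF, 0xBB, 0xBF]);

function blobWithBom(parts, mimeType) {
  // The typed array is copied into the Blob byte for byte; the string
  // parts that follow are encoded as UTF-8 by the Blob constructor.
  return new Blob([UTF8_BOM, ...parts], { type: mimeType + ';charset=utf-8' });
}

// Sample data, assumed for illustration only.
const csvBlob = blobWithBom(['a,b\n', '1,2\n'], 'text/csv');

console.log(csvBlob.size); // 11: 3 BOM bytes + 8 bytes of ASCII text
console.log(csvBlob.type); // text/csv;charset=utf-8
```

Keeping the BOM as bytes rather than as a string means the result does not depend on how the string would be encoded.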

Answer 3

Answered by Jeff Fischer

I'm editing my original answer. The above answer really demands elaboration as this is a convoluted solution by Node.js.

The short answer is, yes, this code works.

The long answer is, no, the bytes FE FF are not the byte order mark for UTF-8. Apparently Node took some sort of shortcut for writing encodings within files. FE FF is the UTF-16 big-endian BOM (the little-endian one is FF FE, and the UTF-8 one is EF BB BF), as can be seen in the Byte Order Mark Wikipedia article; the bytes actually written can also be viewed in a binary editor after writing the file. I've verified this is the case.

http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

Apparently, Node.js uses \ufeff to signify the BOM for any number of encodings. It takes the \ufeff marker and turns it into the correct byte order mark based on the third (options) parameter of writeFile, which is where you pass the encoding string. Node.js uses this encoding to convert \ufeff into that encoding's actual byte order mark.

UTF-8 Example:

// fs is Node's built-in file system module: const fs = require('fs');
fs.writeFile(someFilename, '\ufeff' + html, { encoding: 'utf8' }, function (err) {
    if (err) throw err;
    /* The actual byte order mark written to the file is EF BB BF */
});

UTF-16 Little Endian Example:

fs.writeFile(someFilename, '\ufeff' + html, { encoding: 'utf16le' }, function (err) {
    if (err) throw err;
    /* The actual byte order mark written to the file is FF FE */
});

So, as you can see, \ufeff is simply a marker that can stand for any number of resulting encodings. The actual bytes that make it into the file depend directly on the encoding option specified. The marker used within the string is really irrelevant to what gets written to the file.

I suspect the reasoning behind this is that they chose not to write byte order marks automatically, and the 3-byte mark for UTF-8 isn't easily expressed in the JavaScript string to be written to disk. So, they used \ufeff as a placeholder mark within the string, which gets substituted at write time.
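
The two writeFile results can be checked without touching the disk. The sketch below is Node.js only and uses Buffer.from to encode the same one-character string with both encodings; the variable names are chosen for this illustration.

```javascript
// Sketch (Node.js only): encode the single character U+FEFF with the two
// encodings used in the writeFile examples and inspect the bytes.
const bom = '\uFEFF';

const utf8Hex = Buffer.from(bom, 'utf8').toString('hex');
const utf16leHex = Buffer.from(bom, 'utf16le').toString('hex');

console.log(utf8Hex);    // efbbbf
console.log(utf16leHex); // fffe
```

Seen this way, the character goes through the same encoding step as the rest of the string, and the bytes that come out are the BOM of whichever encoding was requested.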

Answer 4

Answered by Santy SC

This is my solution:

var blob = new Blob(["\uFEFF" + csv], {
    type: 'text/csv; charset=utf-8'
});
