HTML5 File API read as text and binary
Original question: http://stackoverflow.com/questions/3146483/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not to this page): Stack Overflow
Asked by tcooc
I am currently working on the HTML5 File API, and I need to get binary file data.
The FileReader's readAsText and readAsDataURL methods work fine, but readAsBinaryString returns the same data as readAsText.
I need binary data, but I'm getting a text string. Am I missing something?
Answered by T.J. Crowder
Note in 2018: readAsBinaryString is outdated. For use cases where previously you'd have used it, these days you'd use readAsArrayBuffer (or in some cases, readAsDataURL) instead.
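A minimal sketch of the readAsArrayBuffer route mentioned in the note, assuming a browser that provides FileReader. The helper names readFileBytes and bytesToHex are hypothetical, made up for this illustration; they are not part of the File API.

```javascript
// Hypothetical helper: wraps FileReader.readAsArrayBuffer in a Promise.
// Browser only; it is defined here but not called.
function readFileBytes(file) {
  return new Promise(function (resolve, reject) {
    var fr = new FileReader();
    fr.onload = function () {
      // fr.result is an ArrayBuffer; view it as unsigned bytes.
      resolve(new Uint8Array(fr.result));
    };
    fr.onerror = function () {
      reject(fr.error);
    };
    fr.readAsArrayBuffer(file);
  });
}

// Hypothetical helper: formats bytes as space-separated two-digit hex.
function bytesToHex(bytes) {
  return Array.from(bytes, function (b) {
    return b.toString(16).padStart(2, "0");
  }).join(" ");
}

console.log(bytesToHex(new Uint8Array([0xff, 0xfe, 0x54, 0x00]))); // ff fe 54 00
```

In a page, something like readFileBytes(input.files[0]).then(function (bytes) { console.log(bytesToHex(bytes)); }) would dump the selected file's raw bytes, with no string interpretation involved.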
The File API specification says that readAsBinaryString must represent the data as a binary string, where:
...every byte is represented by an integer in the range [0..255].
JavaScript originally didn't have a "binary" type (typed arrays* (details below) arrived later, first through the Typed Array specification that accompanied WebGL and then as part of ECMAScript 2015, together with ArrayBuffer), and so they went with a String with the guarantee that no character stored in the String would be outside the range 0..255. (They could have gone with an array of Numbers instead, but they didn't; perhaps large Strings are more memory-efficient than large arrays of Numbers, since Numbers are floating-point.)
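To make the "binary string" idea concrete, here is a small sketch that converts bytes to such a string and back. The function names bytesToBinaryString and binaryStringToBytes are hypothetical, chosen for this illustration only.

```javascript
// Each byte becomes one string "character" whose code is the byte value (0..255).
function bytesToBinaryString(bytes) {
  var chars = [];
  for (var i = 0; i < bytes.length; i++) {
    chars.push(String.fromCharCode(bytes[i]));
  }
  return chars.join("");
}

// The reverse direction; rejects strings that are not binary strings.
function binaryStringToBytes(str) {
  var bytes = new Uint8Array(str.length);
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code > 255) {
      throw new RangeError("Not a binary string: code " + code + " at index " + i);
    }
    bytes[i] = code;
  }
  return bytes;
}

var binary = bytesToBinaryString(new Uint8Array([0xff, 0xfe, 0x54, 0x00]));
console.log(binary.length);        // 4
console.log(binary.charCodeAt(0)); // 255
```

A Uint8Array stores the same data more directly, which is one reason the typed-array route is usually preferred today.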
If you're reading a file that's mostly text in a western script (mostly English, for instance), then that string is going to look a lot like text. If you read a file with Unicode characters in it, you should notice a difference, since JavaScript strings are UTF-16** (details below) and so some characters will have values above 255, whereas a "binary string" according to the File API spec wouldn't have any values above 255 (you'd have two individual "characters" for the two bytes of the Unicode code point).
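The difference can be seen without a file at all. The sketch below assumes the text is stored as UTF-8 (so the euro sign takes three bytes rather than the two of a UTF-16 file) and uses the standard TextEncoder to produce the bytes.

```javascript
var text = "\u20ac"; // the euro sign, U+20AC

// As a normal JavaScript string: one UTF-16 code unit, with a value above 255.
console.log(text.length);        // 1
console.log(text.charCodeAt(0)); // 8364

// As the bytes a UTF-8 file would contain: e2 82 ac.
var utf8Bytes = new TextEncoder().encode(text);

// As a "binary string": one character per byte, none above 255.
var binaryString = "";
for (var i = 0; i < utf8Bytes.length; i++) {
  binaryString += String.fromCharCode(utf8Bytes[i]);
}
console.log(binaryString.length);        // 3
console.log(binaryString.charCodeAt(0)); // 226
```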
If you're reading a file that's not text at all (an image, perhaps), you'll probably still get a very similar result between readAsText and readAsBinaryString, but with readAsBinaryString you know that there won't be any attempt to interpret multi-byte sequences as characters. You don't know that if you use readAsText, because readAsText will use an encoding determination to try to figure out what the file's encoding is and then map it to JavaScript's UTF-16 strings.
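As a rough illustration of one ingredient of encoding determination, the sketch below looks for a byte order mark at the start of the data. It is a simplified stand-in, not the algorithm the browser actually runs, and the function name sniffBom is hypothetical.

```javascript
// Returns an encoding label if the bytes start with a known byte order mark,
// otherwise null. Simplified on purpose.
function sniffBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "utf-16be";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "utf-16le";
  }
  return null;
}

console.log(sniffBom(new Uint8Array([0xff, 0xfe, 0x54, 0x00]))); // utf-16le
console.log(sniffBom(new Uint8Array([0x54, 0x65, 0x73, 0x74]))); // null
```

If you already know the encoding, readAsText also accepts it as an optional second argument, for example fr.readAsText(file, "utf-16le").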
You can see the effect if you create a file and store it in something other than ASCII or UTF-8. (In Windows you can do this via Notepad; the "Save As" dialog has an encoding drop-down with "Unicode" on it, by which, looking at the data, they seem to mean UTF-16; I'm sure Mac OS and *nix editors have a similar feature.) Here's a page that dumps the result of reading a file both ways:
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
<title>Show File Data</title>
<style type='text/css'>
body {
    font-family: sans-serif;
}
</style>
<script type='text/javascript'>

    function loadFile() {
        var input, file, fr;

        if (typeof window.FileReader !== 'function') {
            bodyAppend("p", "The file API isn't supported on this browser yet.");
            return;
        }

        input = document.getElementById('fileinput');
        if (!input) {
            bodyAppend("p", "Um, couldn't find the fileinput element.");
        }
        else if (!input.files) {
            bodyAppend("p", "This browser doesn't seem to support the `files` property of file inputs.");
        }
        else if (!input.files[0]) {
            bodyAppend("p", "Please select a file before clicking 'Load'");
        }
        else {
            file = input.files[0];
            fr = new FileReader();
            fr.onload = receivedText;
            fr.readAsText(file);
        }

        function receivedText() {
            showResult(fr, "Text");

            fr = new FileReader();
            fr.onload = receivedBinary;
            fr.readAsBinaryString(file);
        }

        function receivedBinary() {
            showResult(fr, "Binary");
        }
    }

    function showResult(fr, label) {
        var markup, result, n, aByte, byteStr;

        markup = [];
        result = fr.result;
        for (n = 0; n < result.length; ++n) {
            aByte = result.charCodeAt(n);
            byteStr = aByte.toString(16);
            if (byteStr.length < 2) {
                byteStr = "0" + byteStr;
            }
            markup.push(byteStr);
        }
        bodyAppend("p", label + " (" + result.length + "):");
        bodyAppend("pre", markup.join(" "));
    }

    function bodyAppend(tagName, innerHTML) {
        var elm;

        elm = document.createElement(tagName);
        elm.innerHTML = innerHTML;
        document.body.appendChild(elm);
    }

</script>
</head>
<body>
<form action='#' onsubmit="return false;">
<input type='file' id='fileinput'>
<input type='button' id='btnLoad' value='Load' onclick='loadFile();'>
</form>
</body>
</html>
If I use that with a "Testing 1 2 3" file stored in UTF-16, here are the results I get:
Text (13):
54 65 73 74 69 6e 67 20 31 20 32 20 33
Binary (28):
ff fe 54 00 65 00 73 00 74 00 69 00 6e 00 67 00 20 00 31 00 20 00 32 00 20 00 33 00
As you can see, readAsText interpreted the characters and so I got 13 (the length of "Testing 1 2 3"), and readAsBinaryString didn't, and so I got 28 (the two-byte BOM plus two bytes for each character).
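The relationship between the two dumps can be checked by hand. This sketch takes the 28 bytes from the binary result and decodes them as little-endian UTF-16 with the standard TextDecoder, which drops the leading BOM by default; the variable names are just for illustration.

```javascript
// The 28 bytes from the "Binary" result above.
var hex = "ff fe 54 00 65 00 73 00 74 00 69 00 6e 00 67 00 20 00 31 00 20 00 32 00 20 00 33 00";
var bytes = new Uint8Array(hex.split(" ").map(function (h) {
  return parseInt(h, 16);
}));

// Decode as little-endian UTF-16; the ff fe byte order mark is consumed.
var decoded = new TextDecoder("utf-16le").decode(bytes);

console.log(bytes.length);   // 28
console.log(decoded.length); // 13
console.log(decoded);        // Testing 1 2 3
```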
* XMLHttpRequest.response with responseType = "arraybuffer" is supported in HTML 5.
** "JavaScript strings are UTF-16" may seem like an odd statement; aren't they just Unicode? No, a JavaScript string is a series of UTF-16 code units; you see surrogate pairs as two individual JavaScript "characters" even though, in fact, the surrogate pair as a whole is just one character. See the link for details.
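A short sketch of the surrogate-pair point, using U+1F600 (an emoji outside the Basic Multilingual Plane) as an arbitrary example character:

```javascript
var s = "\ud83d\ude00"; // U+1F600 written as its surrogate pair

// The string holds two UTF-16 code units...
console.log(s.length);                      // 2
console.log(s.charCodeAt(0).toString(16));  // d83d
console.log(s.charCodeAt(1).toString(16));  // de00

// ...but only one code point.
console.log(s.codePointAt(0).toString(16)); // 1f600
console.log(Array.from(s).length);          // 1
```

Array.from iterates the string by code point, which is why it reports a single element where length reports two.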

