Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/14403377/

Date: 2020-10-26 21:32:41. Source: igfitidea

Strange unicode characters when reading in file in node.js app

Tags: javascript, node.js, unicode, utf-16, utf

Asked by d512

I am attempting to write a node app that reads in a set of files, splits them into lines, and puts the lines into an array. Pretty simple. It works on quite a few files except some SQL files that I am working with. For some reason I seem to be getting some kind of unicode output when I split the lines up. The app looks something like this:

var fs = require("fs");
// Reading with "utf8" is what produces the garbled output shown further down.
var data = fs.readFileSync("test.sql", "utf8");
console.log(data);
var lines = data.split("\n");
console.log(lines);

The input file looks something like this:

use whatever
go

The output looks like this:

??use whatever
go

[ '??u\u0000s\u0000e\u0000 \u0000w\u0000h\u0000a\u0000t\u0000e\u0000v\u0000e\u0000r\u0000',
  '\u0000g\u0000o\u0000',
  '\u0000' ]

As you can see there is some kind of unrecognized character at the beginning of the file. After reading the data in and directly outputting it, it looks okay except for this character. However, if I then attempt to split it up into lines, I get all these unicode-like characters. Basically it's all the actual characters with "\u0000" at the beginning of each one.

I have no idea what's going on here but it appears to have something to do with the characters in the file itself. If I copy and paste the text of the file into another new file and run the app on the new file, it works fine. I assume that whatever is causing this issue is being stripped out during the copy and paste process.

Answered by Esailija

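Your file is in UTF-16 Little Endian, not UTF-8. In your output every character is followed by "\u0000", which means the low byte of each 16-bit unit comes first.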
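
One way to check this kind of diagnosis is to look at the first bytes of the file for a byte-order mark. The following is a minimal sketch, not an exhaustive detector: the function name detectBom and the labels it returns are made up for this example, and the sample bytes are built in memory to resemble the file in the question.

```javascript
// Return a label for the byte-order mark at the start of a buffer,
// or null if none of the three BOMs checked here is present.
// Limitation: FF FE 00 00 (the UTF-32LE BOM) is reported as "utf16le".
function detectBom(buffer) {
  if (buffer.length >= 3 &&
      buffer[0] === 0xef && buffer[1] === 0xbb && buffer[2] === 0xbf) {
    return "utf8";
  }
  if (buffer.length >= 2 && buffer[0] === 0xff && buffer[1] === 0xfe) {
    return "utf16le";
  }
  if (buffer.length >= 2 && buffer[0] === 0xfe && buffer[1] === 0xff) {
    return "utf16be";
  }
  return null;
}

// Sample bytes: the BOM FF FE followed by UTF-16LE text.
const sample = Buffer.concat([
  Buffer.from([0xff, 0xfe]),
  Buffer.from("use whatever\ngo\n", "utf16le"),
]);

console.log(detectBom(sample)); // utf16le
```

For a real file, the same function could be applied to the buffer returned by fs.readFileSync(path) when it is called without an encoding argument.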

var data = fs.readFileSync("test.sql", "utf16le"); //Not sure if this eats the BOM
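
On the BOM question: as far as I know, decoding with "utf16le" leaves the BOM in the string as the character U+FEFF, so it has to be removed by hand. A minimal sketch of that, assuming a temporary sample file (the file name is invented for this example) so that it runs on its own:

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Create a small UTF-16LE file that starts with a BOM (hypothetical file name).
const file = path.join(os.tmpdir(), "utf16le-sample.sql");
fs.writeFileSync(file, "\ufeffuse whatever\ngo\n", "utf16le");

let data = fs.readFileSync(file, "utf16le");

// Drop the BOM if it survived decoding; harmless if it did not.
if (data.charCodeAt(0) === 0xfeff) {
  data = data.slice(1);
}

// Accept both Unix and Windows line endings.
const lines = data.split(/\r?\n/);
console.log(lines); // [ 'use whatever', 'go', '' ]

fs.unlinkSync(file);
```

The trailing empty string comes from the final newline in the file and can be filtered out if it is not wanted.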


Node.js only supports UTF-16 Little Endian, under the name UTF-16LE (I can't be sure from reading the docs; there is a slight difference between the two, namely that UTF-16LE does not use BOMs). If the built-in decoding does not work for your file, for example because it is big-endian, you have to use iconv or convert the file to UTF-8 some other way.

Example:

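var Iconv = require("iconv").Iconv,
    fs = require("fs");

// Read the raw bytes (no encoding argument), then convert UTF-16 to UTF-8.
var buffer = fs.readFileSync("test.sql"),
    iconv = new Iconv("UTF-16", "UTF-8");

// convert() returns a Buffer of UTF-8 bytes; decode it into a string.
var result = iconv.convert(buffer).toString("utf8");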
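
If adding a dependency is not an option, a big-endian file can also be handled with Buffer methods alone by swapping each byte pair and then decoding as "utf16le". This is a sketch of that alternative technique, not part of the iconv approach above; decodeUtf16be is a name invented for this example.

```javascript
// Decode UTF-16 big-endian bytes using only built-in Buffer methods.
function decodeUtf16be(buffer) {
  // swap16() swaps bytes in place, so work on a copy.
  // It throws if the length is not a multiple of 2.
  const swapped = Buffer.from(buffer).swap16();
  const text = swapped.toString("utf16le");
  // Remove a leading BOM (U+FEFF) if present.
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// "go\n" in UTF-16BE, preceded by the big-endian BOM FE FF.
const bigEndian = Buffer.from([0xfe, 0xff, 0x00, 0x67, 0x00, 0x6f, 0x00, 0x0a]);
console.log(JSON.stringify(decodeUtf16be(bigEndian))); // "go\n"
```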

Answered by Chong Lip Phang

I did the following at the Windows command prompt to convert the encoding:

type file.txt > file2.txt

Answered by Halcyon

Is this perhaps the BOM (Byte-Order-Mark)? Make sure you save your files without the BOM or include code to strip the BOM.

The BOM is usually invisible in text editors.

I know Notepad++ has a feature where you can easily strip a BOM from a file: Encoding > Encode in UTF-8 without BOM.
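
The "include code to strip the BOM" option could look roughly like this for a file that really is UTF-8. It is a minimal sketch, and readUtf8WithoutBom is a name invented for this example. Note that it only removes the three BOM bytes; it does not help if the file is actually UTF-16.

```javascript
// Decode a UTF-8 buffer, skipping a leading BOM (bytes EF BB BF) if present.
function readUtf8WithoutBom(buffer) {
  const hasBom = buffer.length >= 3 &&
    buffer[0] === 0xef && buffer[1] === 0xbb && buffer[2] === 0xbf;
  return (hasBom ? buffer.subarray(3) : buffer).toString("utf8");
}

// Sample bytes: a UTF-8 BOM followed by "go\n".
const withBom = Buffer.concat([
  Buffer.from([0xef, 0xbb, 0xbf]),
  Buffer.from("go\n", "utf8"),
]);

console.log(readUtf8WithoutBom(withBom) === "go\n"); // true
```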

Answered by Vikas

Use iconv-lite, the lightweight alternative to iconv:

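var fs = require("fs");
var iconv = require("iconv-lite");

var sourcefile = "test.sql"; // placeholder: path of the file to read
var result = "";
var stream = fs.createReadStream(sourcefile)
    .on("error", function (err) {
        // handle read error
    })
    // Pass the encoding the file is actually in; for the file in the
    // question that would be a UTF-16 encoding rather than win1251.
    .pipe(iconv.decodeStream("win1251"))
    .on("error", function (err) {
        // handle decode error
    })
    .on("data", function (data) {
        result += data;
    })
    .on("end", function () {
        // use result
    });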
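
For UTF-16LE specifically, chunked decoding can also be sketched with Node's built-in string_decoder module, which holds back an incomplete 2-byte unit until the next chunk arrives. This is an alternative to the iconv-lite stream above, shown under the assumption that the input is UTF-16LE; decodeUtf16leChunks is a name invented for this example, and the chunks are built in memory instead of coming from a file stream.

```javascript
const { StringDecoder } = require("string_decoder");

// Decode a sequence of Buffer chunks as UTF-16LE, even when a chunk
// boundary falls in the middle of a 2-byte code unit.
function decodeUtf16leChunks(chunks) {
  const decoder = new StringDecoder("utf16le");
  let result = "";
  for (const chunk of chunks) {
    result += decoder.write(chunk);
  }
  result += decoder.end();
  // Remove a leading BOM (U+FEFF) if present.
  return result.charCodeAt(0) === 0xfeff ? result.slice(1) : result;
}

// Sample bytes with a BOM, split at an odd offset on purpose.
const bytes = Buffer.from("\ufeffuse whatever\ngo\n", "utf16le");
const text = decodeUtf16leChunks([bytes.subarray(0, 5), bytes.subarray(5)]);
console.log(text.split("\n")); // [ 'use whatever', 'go', '' ]
```

With a real file, the chunks would be the Buffers delivered by the "data" events of fs.createReadStream when no encoding is set.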