Python UnicodeError: UTF-16 stream does not start with BOM

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/49371931/

Time: 2020-08-19 19:04:21  Source: igfitidea

UnicodeError: UTF-16 stream does not start with BOM

python csv error-handling

Asked by Py11

I'm having trouble reading a csv file with Python. My csv file contains Korean text and numbers.

Below is my python code.

import csv
import codecs
csvreader = csv.reader(codecs.open('1.csv', 'rU', 'utf-16'))
for row in csvreader:
    print(row)

First, I got a UnicodeDecodeError at the "for row in csvreader" line in the code above.

So I used the code below, and the problem seemed to be solved:

csvreader = csv.reader(codecs.open('1.csv', 'rU', 'utf-16'))

Then I ran into a NULL byte error, and now I can't figure out what's wrong with the csv file.

[update] I don't think I changed anything from the previous code but my program shows "UnicodeError: UTF-16 stream does not start with BOM"

When I open the csv in Excel I can see the table in the proper format (image attached at the bottom), but when I open it in Sublime Text, below is a snippet of what I get.

504b 0304 1400 0600 0800 0000 2100 6322
f979 7701 0000 d405 0000 1300 0802 5b43
6f6e 7465 6e74 5f54 7970 6573 5d2e 786d
6c20 a204 0228 a000 0200 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000

If you need more information about my file, let me know!

I appreciate your help. Thanks in advance :)

[Image: csv file shown in Excel]

[Image: csv file shown in Sublime Text]

Accepted answer by abarnert

Now that you've included more of the file in your question, that isn't a CSV file at all. My guess is that it's an old-style binary XLS file, but that's just a guess. If you're just renaming spam.xls to spam.csv, you can't do that; you need to export it to CSV format. (If you need help with that, ask on another site that offers help with Excel instead of with programming.)

If you can't do that for some reason, there are libraries on PyPI to parse XLS files—but if you wanted CSV, and you can export CSV, that's a better idea.

Answer by abarnert

The problem is that your input file apparently doesn't start with a BOM (a special character that gets recognizably encoded differently for little-endian vs. big-endian utf-16), so you can't just use “utf-16” as the encoding, you have to explicitly use “utf-16-le” or “utf-16-be”.
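
To make the BOM point concrete, here is a minimal Python 3 sketch. The sample string is made up for illustration and is not taken from the asker's file. It shows that plain "utf-16" writes a BOM when encoding, that BOM-less data decodes correctly only when the byte order is named, and that a "utf-16" stream reader rejects data without a BOM.

```python
import codecs
import io

text = "가1"  # hypothetical sample: one Korean character and one digit

# Plain "utf-16" prepends a BOM when encoding; the explicit variants do not.
with_bom = text.encode("utf-16")
le_bytes = text.encode("utf-16-le")
be_bytes = text.encode("utf-16-be")
has_bom = with_bom.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

# Without a BOM, naming the byte order explicitly decodes correctly.
decoded_le = le_bytes.decode("utf-16-le")
decoded_be = be_bytes.decode("utf-16-be")

# The same bytes read with the wrong byte order give different characters.
misread = le_bytes.decode("utf-16-be")

# A "utf-16" stream reader refuses BOM-less input.
reader = codecs.getreader("utf-16")(io.BytesIO(le_bytes))
try:
    reader.read()
    stream_error = None
except UnicodeError as exc:
    stream_error = str(exc)

print(has_bom, decoded_le == text, decoded_be == text, misread == text)
print(stream_error)
```

None of this helps if the file is not UTF-16 text in the first place, so it is worth checking what the bytes actually are before picking an encoding.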

If you don't do that, codecs will guess, and if it guesses wrong, it'll try to read each code point backward and get illegal values.

If your posted sample starts at an even offset and contains a bunch of ASCII, it's little-endian, so use the -le version. (But of course it's better to look at what it actually is than to guess.)
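
The even-offset reasoning can be sketched as a rough heuristic. The function name guess_utf16_byte_order is hypothetical, and the approach assumes the data really is BOM-less UTF-16 that is mostly ASCII: each ASCII character then has one zero byte, at odd offsets for little-endian and at even offsets for big-endian.

```python
def guess_utf16_byte_order(data: bytes) -> str:
    """Guess the byte order of BOM-less UTF-16 data that is mostly ASCII.

    Hypothetical helper for illustration; it only counts zero bytes and
    cannot tell whether the data is UTF-16 at all.
    """
    even_nulls = data[0::2].count(0)  # zero high bytes if big-endian
    odd_nulls = data[1::2].count(0)   # zero high bytes if little-endian
    if odd_nulls > even_nulls:
        return "utf-16-le"
    if even_nulls > odd_nulls:
        return "utf-16-be"
    return "unknown"


sample = "id,name\n1,abc\n".encode("utf-16-le")  # made-up CSV content
print(guess_utf16_byte_order(sample))
```

A heuristic like this will happily return an answer for any binary data with many zero bytes, which is why looking at what the file actually is beats guessing.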

Answer by Tom Blodget

The file begins with a PKZIP signature so it is actually an XLSX file.
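
The signature check can be reproduced from the hex dump in the question. The sketch below is only an illustration: the helper name looks_like_zip is hypothetical, and the bytes are the first 50 bytes copied from the dump. It tests for the ZIP local-file signature "PK\x03\x04" and for the "[Content_Types].xml" entry name that is visible in the dump and is typical of Office Open XML packages. It does not prove the file is a valid workbook.

```python
ZIP_LOCAL_FILE_SIGNATURE = b"PK\x03\x04"

# First bytes of the file, copied from the hex dump in the question.
header = bytes.fromhex(
    "504b0304140006000800000021006322"
    "f97977010000d4050000130008025b43"
    "6f6e74656e745f54797065735d2e786d"
    "6c20"
)


def looks_like_zip(data: bytes) -> bool:
    """Return True if the data starts with the ZIP local-file signature."""
    return data.startswith(ZIP_LOCAL_FILE_SIGNATURE)


is_zip = looks_like_zip(header)
has_ooxml_member = b"[Content_Types].xml" in header
print(is_zip, has_ooxml_member)
```

For a file on disk, the same check would read the first few bytes in binary mode, for example with open(path, "rb").read(4), before deciding how to parse it.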

This is great because instead of a CSV file, where you would have to know the character encoding, headers, column types, delimiter, text quoting and escape rules, and line endings, you can just open it and programs can see the structure of the data.
