Python CSV error: line contains NULL byte
Original URL: http://stackoverflow.com/questions/4166070/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by AP257
I'm working with some CSV files, with the following code:
import csv
import sys
reader = csv.reader(open(filepath, "rU"))
try:
    for row in reader:
        print 'Row read successfully!', row
except csv.Error, e:
    sys.exit('file %s, line %d: %s' % (filepath, reader.line_num, e))
And one file is throwing this error:
file my.csv, line 1: line contains NULL byte
What can I do? Google seems to suggest that it may be an Excel file that's been saved as a .csv improperly. Is there any way I can get round this problem in Python?
== UPDATE ==
Following @JohnMachin's comment below, I tried adding these lines to my script:
print repr(open(filepath, 'rb').read(200)) # dump 1st 200 bytes of file
data = open(filepath, 'rb').read()
print data.find('\x00')
print data.count('\x00')
And this is the output I got:
'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00\ .... <snip>
8
13834
So the file does indeed contain NUL bytes.
Answer by S.Lott
Why are you doing this?
reader = csv.reader(open(filepath, "rU"))
The docs are pretty clear that you must do this:
with open(filepath, "rb") as src:
    reader = csv.reader(src)
The mode must be "rb" to read.
http://docs.python.org/library/csv.html#csv.reader
If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference.
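For readers on Python 3: the 'b' advice above is Python 2 specific. The Python 3 csv documentation instead says to open the file in text mode with newline='', so the csv module sees the raw line endings (including ones inside quoted fields). A minimal sketch; read_rows is a hypothetical helper name and UTF-8 is an assumed default encoding.

```python
import csv


def read_rows(path, encoding="utf-8"):
    """Read all rows of a CSV file (Python 3).

    newline="" hands line endings to the csv module untranslated, which is
    the Python 3 counterpart of opening the file with "rb" in Python 2.
    """
    with open(path, newline="", encoding=encoding) as src:
        return list(csv.reader(src))
```

Because nothing is translated, a quoted field that spans lines keeps its embedded "\r\n" intact.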
Answer by John Machin
As @S.Lott says, you should be opening your files in 'rb' mode, not 'rU' mode. However, that may NOT be causing your current problem. As far as I know, using 'rU' mode would mess you up if there are embedded \r in the data, but not cause any other dramas. I also note that you have several files (all opened with 'rU' ??) but only one causing a problem.
If the csv module says that you have a "NULL" (silly message, should be "NUL") byte in your file, then you need to check out what is in your file. I would suggest that you do this even if using 'rb' makes the problem go away.
repr() is (or wants to be) your debugging friend. It will show unambiguously what you've got, in a platform-independent fashion (which is helpful to helpers who are unaware what od is or does). Do this:
print repr(open('my.csv', 'rb').read(200)) # dump 1st 200 bytes of file
and carefully copy/paste (don't retype) the result into an edit of your question (not into a comment).
Also note that if the file is really dodgy, e.g. no \r or \n within reasonable distance from the start of the file, the line number reported by reader.line_num will be (unhelpfully) 1. Find where the first \x00 is (if any) by doing
data = open('my.csv', 'rb').read()
print data.find('\x00')
and make sure that you dump at least that many bytes with repr or od.
What does data.count('\x00') tell you? If there are many, you may want to do something like
for i, c in enumerate(data):
    if c == '\x00':
        # max() stops the left slice wrapping around for NULs in the first 30 bytes
        print i, repr(data[max(0, i-30):i]) + ' *NUL* ' + repr(data[i+1:i+31])
so that you can see the NUL bytes in context.
If you can see \x00 in the output (or \0 in your od -c output), then you definitely have NUL byte(s) in the file, and you will need to do something like this:
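A Python 3 sketch of the same checks (the code in this thread is Python 2). nul_report is a hypothetical helper; it works on bytes, so nothing is decoded away, and it clamps the context window at the start of the file.

```python
def nul_report(data, context=30):
    """Return (offset, before, after) for every NUL byte in `data`.

    `data` is a bytes object, e.g. open(path, "rb").read().
    """
    report = []
    start = 0
    while True:
        i = data.find(b"\x00", start)
        if i == -1:
            return report
        before = data[max(0, i - context):i]   # clamp at start of data
        after = data[i + 1:i + 1 + context]
        report.append((i, before, after))
        start = i + 1


# Usage ("my.csv" is a placeholder path):
# data = open("my.csv", "rb").read()
# print(data.find(b"\x00"), data.count(b"\x00"))
# for offset, before, after in nul_report(data):
#     print(offset, before, "*NUL*", after)
```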
fi = open('my.csv', 'rb')
data = fi.read()
fi.close()
fo = open('mynew.csv', 'wb')
fo.write(data.replace('\x00', ''))
fo.close()
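The same cleanup in Python 3, where data read in 'rb' mode is bytes, so the NUL literal must be b"\x00". strip_nuls is a hypothetical name. Deleting NULs discards information: it is reasonable for stray padding, but if the NULs come from UTF-16 encoding or an XLS file (as in this question), decoding the file or using a spreadsheet reader is the safer fix.

```python
def strip_nuls(src_path, dst_path):
    """Copy src_path to dst_path with every NUL byte removed.

    Works on raw bytes, so no text encoding is assumed.
    Returns the number of NUL bytes that were removed.
    """
    with open(src_path, "rb") as fi:
        data = fi.read()
    with open(dst_path, "wb") as fo:
        fo.write(data.replace(b"\x00", b""))
    return data.count(b"\x00")
```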
By the way, have you looked at the file (including the last few lines) with a text editor? Does it actually look like a reasonable CSV file like the other (no "NULL byte" exception) files?
Answer by Xavier Combelle
Apparently it's an XLS file, not a CSV file: the first bytes shown in the update (\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1) are the OLE2 compound-document signature used by .xls files, as http://www.garykessler.net/library/file_sigs.html confirms.
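Checking the leading bytes is a quick way to tell a real CSV file from a spreadsheet saved under a .csv name. A Python 3 sketch, assuming you only need to distinguish the legacy OLE2 container used by .xls from a zip container (which .xlsx files use); sniff_format is a hypothetical name and the labels it returns are arbitrary.

```python
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                          # zip container, e.g. .xlsx


def sniff_format(head):
    """Classify a file from its first bytes, e.g. open(path, "rb").read(8)."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "other"  # possibly text; not proof that it is valid CSV
```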
Answer by ayaz
I bumped into this problem as well. Using the Python csv module, I was trying to read an XLS file created in MS Excel and running into the NULL byte error you were getting. I looked around and found the xlrd Python module for reading and formatting data from MS Excel spreadsheet files. With the xlrd module, I am not only able to read the file properly, but I can also access many different parts of the file in a way I couldn't before.
I thought it might help you.
Answer by mikaiscute
I got the same error. Saved the file in UTF-8 and it worked.
Answer by Patrick Halley
Converting the encoding of the source file from UTF-16 to UTF-8 solved my problem.
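Why this helps: UTF-16 stores each ASCII character in two bytes, one of which is 0x00, so the byte-oriented Python 2 csv reader sees a NUL next to every letter; for example, "a".encode("utf-16-le") is b"a\x00". A minimal Python 3 sketch that decodes before parsing, assuming the file starts with a byte-order mark (BOM) when it is UTF-16 and is UTF-8 otherwise; decode_csv_bytes is a hypothetical name.

```python
import codecs
import csv
import io


def decode_csv_bytes(data):
    """Parse raw CSV bytes, honouring a UTF-16 byte-order mark.

    Falls back to UTF-8 when there is no UTF-16 BOM (an assumption;
    adjust it to match your data).
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = data.decode("utf-16")      # the codec consumes the BOM
    else:
        text = data.decode("utf-8-sig")   # also strips a UTF-8 BOM if present
    return list(csv.reader(io.StringIO(text, newline="")))
```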
How to convert a file to utf-8 in Python?
import codecs
BLOCKSIZE = 1048576  # or some other desired size in bytes
with codecs.open(sourceFileName, "r", "utf-16") as sourceFile:
    with codecs.open(targetFileName, "w", "utf-8") as targetFile:
        while True:
            contents = sourceFile.read(BLOCKSIZE)
            if not contents:
                break
            targetFile.write(contents)
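In Python 3 the built-in open() accepts an encoding argument, so codecs.open and the manual read loop are not needed. A sketch; utf16_to_utf8 is a hypothetical name, and newline="" keeps the original line endings untranslated so the output is still valid csv input.

```python
import shutil


def utf16_to_utf8(source_path, target_path, blocksize=1048576):
    """Re-encode a UTF-16 text file as UTF-8 without loading it all at once."""
    with open(source_path, "r", encoding="utf-16", newline="") as src, \
         open(target_path, "w", encoding="utf-8", newline="") as dst:
        # copyfileobj reads `blocksize` characters at a time from src
        shutil.copyfileobj(src, dst, blocksize)
```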
Answer by Nico The Brush
Instead of the csv reader, I read the file and use the string split function:
lines = open(input_file, 'rb')
for line_all in lines:
    line = line_all.replace('\x00', '').split(";")
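Splitting on ";" treats a delimiter inside a quoted field as a field break. A Python 3 alternative (not the answerer's code) keeps csv.reader, which accepts any iterable of strings, and removes NULs from each line first; rows_without_nuls is a hypothetical name.

```python
import csv


def rows_without_nuls(lines, **fmtparams):
    """Return a csv.reader over `lines` with NUL characters deleted.

    `lines` is any iterable of str, such as a file opened with newline="".
    Unlike str.split, csv.reader still honours quoting.
    """
    cleaned = (line.replace("\x00", "") for line in lines)
    return csv.reader(cleaned, **fmtparams)
```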
Answer by user1990371
This happened to me when I created a CSV file with OpenOffice Calc. It didn't happen when I created the CSV file in my text editor, even if I later edited it with Calc.
I solved my problem by copy-pasting in my text editor the data from my Calc-created file to a new editor-created file.
Answer by Matthias Kuhn
I had the same problem opening a CSV produced from a webservice which inserted NULL bytes in empty headers. I did the following to clean the file:
import codecs
import csv
import shutil
with codecs.open('my.csv', 'rb', 'utf-8') as myfile:
    data = myfile.read()
# clean file first if dirty (my.csv is closed before it is replaced)
if data.count('\x00'):
    print 'Cleaning...'
    with codecs.open('my.csv.tmp', 'w', 'utf-8') as of:
        of.write(data.replace('\x00', ''))
    shutil.move('my.csv.tmp', 'my.csv')
with codecs.open('my.csv', 'rb', 'utf-8') as myfile:
    myreader = csv.reader(myfile, delimiter=',')
    for row in myreader:
        pass  # Continue with your business logic here...
Disclaimer: Be aware that this overwrites your original data. Make sure you have a backup copy of it. You have been warned!
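A Python 3 variant that follows the disclaimer's advice by writing a backup before touching the file. clean_in_place and the .bak/.tmp suffixes are hypothetical choices; os.replace swaps the cleaned copy in with a single rename.

```python
import os
import shutil


def clean_in_place(path, backup_suffix=".bak"):
    """Remove NUL bytes from `path`, keeping a backup copy first.

    Returns True if the file was rewritten, False if it had no NULs.
    """
    with open(path, "rb") as f:
        data = f.read()
    if b"\x00" not in data:
        return False
    shutil.copy2(path, path + backup_suffix)   # keep the original bytes
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data.replace(b"\x00", b""))
    os.replace(tmp_path, path)                 # overwrite the original
    return True
```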
Answer by Bill Gross
For all those 'rU' filemode haters: I just tried opening a CSV file from a Windows machine on a Mac with the 'rb' filemode and I got this error from the csv module:
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
Opening the file in 'rU' mode works fine. I love universal-newline mode -- it saves me so much hassle.
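On Python 3 the 'U' flag is unnecessary (it was deprecated and then removed in Python 3.11). Opening a file with newline='' lets csv.reader recognise \r\n, \n and bare \r endings by itself. A sketch using an in-memory stream; parse_csv_text is a hypothetical name.

```python
import csv
import io


def parse_csv_text(text):
    # newline="" splits lines on \r\n, \n or \r but leaves the endings in
    # place, so csv.reader handles each style itself.
    return list(csv.reader(io.StringIO(text, newline="")))
```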

