Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/27896214/

Time: 2020-08-19 02:25:39  Source: igfitidea

Reading tab-delimited file with Pandas - works on Windows, but not on Mac

Tags: python, macos, pandas, import, tab-delimited

Asked by user3062149

I've been reading a tab-delimited data file on Windows with Pandas/Python without any problems. The data file contains notes in the first three lines, followed by a header row.

df = pd.read_csv(myfile, sep='\t', skiprows=(0, 1, 2), header=0)

I'm now trying to read this file with my Mac. (My first time using Python on Mac.) I get the following error.

pandas.parser.CParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 39

If I set the error_bad_lines argument for read_csv to False, I get the following output, which continues through the last row.

Skipping line 8: expected 1 fields, saw 39
Skipping line 9: expected 1 fields, saw 125
Skipping line 10: expected 1 fields, saw 125
Skipping line 11: expected 1 fields, saw 125
Skipping line 12: expected 1 fields, saw 125
Skipping line 13: expected 1 fields, saw 125
Skipping line 14: expected 1 fields, saw 125
Skipping line 15: expected 1 fields, saw 125
Skipping line 16: expected 1 fields, saw 125
Skipping line 17: expected 1 fields, saw 125
...

Do I need to specify a value for the encoding argument? It seems as though I shouldn't have to, because reading the file works fine on Windows.

Accepted answer by brad sanders

The biggest clue is the rows are all being returned on one line. This indicates line terminators are being ignored or are not present.
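
One way to check that clue is to count the line-ending sequences in the file's raw bytes. The following is a minimal sketch, not part of the original answer: the helper name count_line_endings and the sample file are hypothetical, and the byte counting assumes a single-byte or UTF-8 encoding (it would need adjusting for UTF-16).

```python
import os
import tempfile


def count_line_endings(path):
    """Count CRLF, bare CR, and bare LF sequences in a file's raw bytes."""
    with open(path, "rb") as handle:
        data = handle.read()
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "cr": data.count(b"\r") - crlf,  # bare \r (not part of \r\n)
        "lf": data.count(b"\n") - crlf,  # bare \n (not part of \r\n)
    }


# Hypothetical sample: a tab-delimited file whose lines end with a bare \r.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"a\tb\r1\t2\r3\t4\r")
    sample_path = tmp.name

counts = count_line_endings(sample_path)
os.remove(sample_path)
print(counts)
```

A file that reports only bare \r endings is a candidate for the lineterminator approach in this answer.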

You can specify the line terminator for read_csv. If the file was created on a Mac, its lines may end with \r (the classic Mac OS convention, still written by some applications) rather than the Linux standard \n or, better still, the belt-and-suspenders Windows approach of \r\n.

pandas.read_csv(filename, sep='\t', lineterminator='\r')
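
To try the lineterminator argument without the original file, here is a small self-contained sketch. The in-memory sample data is hypothetical; it only stands in for a file whose lines end with a bare \r.

```python
import io

import pandas as pd

# Hypothetical in-memory sample: tab-delimited, lines ending with a bare \r.
raw = "a\tb\r1\t2\r3\t4\r"

# Tell the parser explicitly that \r ends each row.
df = pd.read_csv(io.StringIO(raw), sep="\t", lineterminator="\r")
print(df)
```

Note that lineterminator accepts a single character, so it cannot be set to '\r\n'.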

You could also open all your data using the codecs package. This may increase robustness at the expense of document loading speed.

import codecs

# Mode 'r' instead of the original 'rU': the 'U' flag was removed in Python 3.11.
doc = codecs.open('document', 'r', 'UTF-16')  # 'UTF-16' must match the file's actual encoding

df = pandas.read_csv(doc, sep='\t')
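
On Python 3, a similar effect can probably be had with the built-in open, whose newline=None setting enables universal newlines. This is a sketch under assumptions, not the answerer's method: the sample file is hypothetical, and UTF-16 is used only to mirror the encoding assumed above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample: UTF-16 text, tab-delimited, bare \r line endings.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("a\tb\r1\t2\r3\t4\r".encode("utf-16"))
    sample_path = tmp.name

# newline=None turns on universal newlines: \r, \n and \r\n are all read as \n.
with open(sample_path, "r", encoding="utf-16", newline=None) as doc:
    df = pd.read_csv(doc, sep="\t")

os.remove(sample_path)
print(df)
```

Because the file object normalizes the line endings before pandas sees them, no lineterminator argument is needed here.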

Answer by user3479780

Another option would be to add engine='python' to the command: pandas.read_csv(filename, sep='\t', engine='python')
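
The sketch below only illustrates the call shape, using a hypothetical in-memory sample with ordinary \n line endings. It shows that both engines return the same frame on well-formed input; whether switching engines fixes the asker's file is not confirmed here.

```python
import io

import pandas as pd

# Hypothetical sample with ordinary \n line endings.
raw = "a\tb\n1\t2\n3\t4\n"

df_c = pd.read_csv(io.StringIO(raw), sep="\t", engine="c")
df_python = pd.read_csv(io.StringIO(raw), sep="\t", engine="python")

# On well-formed input the two parsers should agree.
print(df_c.equals(df_python))
```

The python engine is generally slower than the C engine, so it is best kept for files the default parser cannot handle.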