pandas CParserError: Error tokenizing data

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/37505577/

Time: 2020-09-14 01:18:46  Source: igfitidea

CParserError: Error tokenizing data

Tags: python, csv, pandas, dataframe, data-analysis

Asked by Muhammed Eltabakh

I'm having some trouble reading a CSV file:

import pandas as pd

df = pd.read_csv('Data_Matches_tekha.csv', skiprows=2)

I get

pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 526, saw 5
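
For reference, this message comes from pandas' C parser: it takes the number of columns from the header row, and any later row with more fields makes it stop with this error. A minimal reproduction, assuming only pandas and some made-up in-memory data:

```python
import io

import pandas as pd

# Made-up data: the header row has 1 field, but line 3 has 5.
data = "a\n1\n1,2,3,4,5\n"

try:
    pd.read_csv(io.StringIO(data))
except pd.errors.ParserError as exc:
    # Older pandas releases raised this as pandas.io.common.CParserError.
    message = str(exc)
    print(message)
```
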

And when I add sep=None to the read_csv call, I get another error:

Error: line contains NULL byte

I tried adding unicode='utf-8', and I even tried the csv module's reader, but nothing works with this file.

The CSV file is totally fine; I checked it and I see nothing wrong with it.

Here are the errors I get:

Accepted answer by Burhan Khalid

In your actual code, the line is:

>>> pandas.read_csv("Data_Matches_tekha.xlsx", sep=None)

You are trying to read an Excel file, not a plain-text CSV, which is why things are not working.

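Excel files (.xlsx) are ZIP archives of XML files, a binary format that cannot be read as simple text the way CSV files can. The NUL bytes inside that binary data are also what the "line contains NULL byte" error is complaining about.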
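
If you are not sure what a file really contains, its first few bytes usually tell you. A minimal sketch, assuming only the standard library; the helper names are hypothetical, and the signatures checked are the standard ZIP (used by .xlsx) and OLE2 (used by legacy .xls) headers:

```python
from typing import Optional, Tuple

# Leading "magic" bytes of the two Excel container formats.
ZIP_MAGIC = b"PK\x03\x04"  # .xlsx files are ZIP archives (so are .docx, .ods, ...)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls files are OLE2 compound files


def classify_header(head: bytes) -> Optional[str]:
    """Guess the container format from a file's first bytes; None means no known signature."""
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    return None


def inspect_file(path: str, n: int = 4096) -> Tuple[Optional[str], bool]:
    """Return (container guess, whether the first n bytes contain a NUL byte)."""
    with open(path, "rb") as f:
        head = f.read(n)
    # A NUL byte almost never appears in a plain-text CSV; it points to binary
    # data, or to a text file saved as UTF-16.
    return classify_header(head), b"\x00" in head
```

For the file in the question, inspect_file("Data_Matches_tekha.xlsx") would be expected to report ("zip", True) if it is a genuine .xlsx workbook, regardless of how it looks when opened in Excel.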

You need to either convert the Excel file to CSV (note: if you have multiple sheets, each sheet should be converted to its own CSV file) and then read those files, or read the Excel file directly.
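
The sheet-by-sheet conversion can be done with pandas itself. A minimal sketch, assuming an Excel engine such as openpyxl is installed for the read_excel step; the helper names and the one-file-per-sheet naming scheme (sheet name plus .csv) are hypothetical, and sheet names are assumed to be valid file names:

```python
from pathlib import Path
from typing import Dict, List

import pandas as pd


def sheets_to_csv(sheets: Dict[str, pd.DataFrame], out_dir: str) -> List[Path]:
    """Write each DataFrame in `sheets` to its own CSV file inside `out_dir`."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, frame in sheets.items():
        # Hypothetical naming scheme: one file per sheet, named after the sheet.
        path = target / f"{name}.csv"
        frame.to_csv(path, index=False)
        written.append(path)
    return written


def excel_to_csvs(xlsx_path: str, out_dir: str) -> List[Path]:
    # sheet_name=None makes read_excel return a dict {sheet name: DataFrame}.
    # Reading .xlsx needs an Excel engine (e.g. openpyxl) to be installed.
    return sheets_to_csv(pd.read_excel(xlsx_path, sheet_name=None), out_dir)
```

Splitting the work into two functions keeps the CSV-writing part independent of the Excel engine, so it can be reused for any dict of DataFrames.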

You can use read_excel, or a library like xlrd, which is designed to read the binary format of Excel files (note that xlrd 2.0 and later reads only legacy .xls files; current pandas uses openpyxl for .xlsx). See "Reading/parsing Excel (xls) files with Python" for more information on that.

Answer by jezrael

Use read_excel instead of read_csv if the file is an Excel file:

import pandas as pd

df = pd.read_excel("Data_Matches_tekha.xlsx")
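
If a script has to handle both kinds of file, one option is to choose the reader from the file extension. A minimal sketch; reader_for, read_table and the set of suffixes are hypothetical names chosen here, and the extension is trusted rather than checked against the file's actual contents:

```python
from pathlib import Path

import pandas as pd

# Hypothetical set of extensions treated as Excel workbooks.
EXCEL_SUFFIXES = {".xlsx", ".xlsm", ".xls"}


def reader_for(path):
    """Pick pd.read_excel for Excel extensions, pd.read_csv for everything else."""
    if Path(path).suffix.lower() in EXCEL_SUFFIXES:
        return pd.read_excel
    return pd.read_csv


def read_table(path, **kwargs):
    # Extra keyword arguments (e.g. skiprows=2) are passed through unchanged.
    return reader_for(path)(path, **kwargs)
```

With this, read_table("Data_Matches_tekha.xlsx", skiprows=2) would go through read_excel instead of the C CSV parser.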

Answer by Othmane304

I encountered the same error when I used to_csv to write some data and then read it back in another script. I found an easy solution that bypasses pandas' read functions: the pickle module.

pickle is part of Python's standard library, so there is nothing to install with pip.

First, write your data with the code below:

import pickle

# path and variable_to_save are placeholders for your own file path and object
with open(path, 'wb') as output:
    pickle.dump(variable_to_save, output)

Finally, load your data in the other script with:

import pickle

# path must point to the file written in the previous step
with open(path, 'rb') as input_file:
    data = pickle.load(input_file)
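
When the data is a DataFrame, pandas also wraps pickle in DataFrame.to_pickle and pandas.read_pickle. A minimal round-trip sketch, using made-up data and a hypothetical file name in a temporary directory:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"team": ["A", "B"], "goals": [3, 1]})  # made-up example data

path = os.path.join(tempfile.mkdtemp(), "matches.pkl")  # hypothetical file name
df.to_pickle(path)               # in the writing script
restored = pd.read_pickle(path)  # in the reading script

# Unlike a to_csv/read_csv round trip, dtypes and the index come back unchanged.
print(restored.equals(df))
```
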

Note that if the saved data will be read by a different Python version than the one that wrote it, you can account for that in the writing step by passing protocol=x to pickle.dump, where x is a protocol the reading version supports (for example, protocol=2 if the data must be readable from Python 2, since protocol 3 and above are Python 3 only).
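
A minimal sketch of the protocol argument, using a placeholder object and an in-memory buffer instead of a file: protocol=2 is the highest protocol Python 2 can read, while pickle.HIGHEST_PROTOCOL reports what the running interpreter supports.

```python
import io
import pickle

data = {"rows": [1, 2, 3]}  # placeholder object

buffer = io.BytesIO()
# protocol=2 can be read by Python 2 and by every Python 3 version.
pickle.dump(data, buffer, protocol=2)
raw = buffer.getvalue()

# Protocol 2+ pickles start with the PROTO opcode (0x80) and the protocol number.
print(raw[:2])
print(pickle.HIGHEST_PROTOCOL)  # highest protocol the current interpreter supports
print(pickle.loads(raw) == data)
```
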

I hope this is of some use.
