pandas CParserError: Error tokenizing data

Question

Asked by Muhammed Eltabakh

I'm having some trouble reading a CSV file:

import pandas as pd

df = pd.read_csv('Data_Matches_tekha.csv', skiprows=2)

I get:

pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 526, saw 5

When I add sep=None to the read_csv call, I get another error:

Error: line contains NULL byte

I tried adding unicode='utf-8', and I even tried the csv reader, but nothing works with this file.

The CSV file is totally fine; I checked it and I see nothing wrong with it.

Here are the errors I get:

Answer 1

Accepted answer by Burhan Khalid

In your actual code, the line is:

>>> pandas.read_csv("Data_Matches_tekha.xlsx", sep=None)

You are trying to read an Excel file, not a plain-text CSV, which is why things are not working.

Excel files (xlsx) are stored in a binary format (a ZIP archive of XML parts), which cannot be read as simple text files (like CSV files).
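A quick way to check what a file really is, regardless of its name, is to look at its first few bytes. The sketch below is only an illustration: the function name `sniff_format` and the demo files are made up for this example, and the check relies on the well-known ZIP and OLE2 signatures rather than on anything pandas does internally.

```python
import os
import tempfile
import zipfile

# Leading "magic" bytes of the two Excel container formats.
ZIP_MAGIC = b"PK\x03\x04"                         # .xlsx is a ZIP archive
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls compound file


def sniff_format(path):
    """Guess 'xlsx', 'xls' or 'text' from the first bytes of a file."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    if head.startswith(OLE2_MAGIC):
        return "xls"
    return "text"


# Demo on two throwaway files: a ZIP container and a real CSV.
workdir = tempfile.mkdtemp()

fake_xlsx = os.path.join(workdir, "demo.xlsx")
with zipfile.ZipFile(fake_xlsx, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")

real_csv = os.path.join(workdir, "demo.csv")
with open(real_csv, "w", newline="") as handle:
    handle.write("home,away,score\nA,B,1-0\n")

print(sniff_format(fake_xlsx))
print(sniff_format(real_csv))
```

If a file reported as "xlsx" or "xls" is passed to read_csv, tokenizing errors and NULL-byte complaints like the ones in the question are to be expected.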

One option is to convert the Excel file to a CSV file (note: if you have multiple sheets, each sheet should be converted to its own CSV file), and then read those.

Alternatively, you can use read_excel, or a library like xlrd, which is designed to read the binary format of Excel files; see "Reading/parsing Excel (xls) files with Python" for more information on that.
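The one-CSV-per-sheet conversion can also be done from Python. This is a minimal sketch, not the answerer's code: the helper names `csv_name_for_sheet` and `export_sheets_to_csv` and the output naming scheme are assumptions made for illustration, and reading .xlsx files with pandas requires an Excel engine such as openpyxl to be installed.

```python
import os


def csv_name_for_sheet(workbook_path, sheet_name):
    """Build an output name such as 'book_Sheet_1.csv' next to the workbook."""
    base, _ext = os.path.splitext(workbook_path)
    # Replace anything that is not a letter or digit so the name is file-safe.
    safe_sheet = "".join(ch if ch.isalnum() else "_" for ch in str(sheet_name))
    return f"{base}_{safe_sheet}.csv"


def export_sheets_to_csv(workbook_path):
    """Write every sheet of an Excel workbook to its own CSV file."""
    # Imported here so the naming helper above works without pandas installed.
    import pandas as pd

    # sheet_name=None makes read_excel return {sheet name: DataFrame}.
    sheets = pd.read_excel(workbook_path, sheet_name=None)
    written = []
    for sheet_name, frame in sheets.items():
        out_path = csv_name_for_sheet(workbook_path, sheet_name)
        frame.to_csv(out_path, index=False)
        written.append(out_path)
    return written


print(csv_name_for_sheet("Data_Matches_tekha.xlsx", "Sheet 1"))
```

Calling `export_sheets_to_csv("Data_Matches_tekha.xlsx")` would then return the list of CSV paths it wrote, each of which can be loaded with read_csv.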

Answer 2

Answer by jezrael

Use read_excel instead of read_csv if the file is an Excel file:

import pandas as pd

df = pd.read_excel("Data_Matches_tekha.xlsx")

Answer 3

Answer by Othmane304

I encountered the same error when I used to_csv to write some data and then read it back in another script. I found an easy solution that bypasses pandas' read functions: the pickle module.

pickle ships with Python's standard library, so there is nothing to install.

First, write your data with the code below:

import pickle

# Serialise the object to a binary file.
with open(path, 'wb') as output:
    pickle.dump(variable_to_save, output)

Finally, load your data in another script using:

import pickle

# Load the object back from the binary file.
with open(path, 'rb') as infile:
    data = pickle.load(infile)
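For reference, here is a self-contained round trip of the two steps above. The sample `rows` data and the temporary file name are invented for this sketch; in the answer, `path` and `variable_to_save` stand for whatever you are actually saving.

```python
import os
import pickle
import tempfile

# Hypothetical sample data standing in for variable_to_save.
rows = [{"match": 1, "score": "2-1"}, {"match": 2, "score": "0-0"}]

path = os.path.join(tempfile.mkdtemp(), "matches.pkl")

# Writing script: serialise the object to a binary file.
with open(path, "wb") as outfile:
    pickle.dump(rows, outfile)

# Reading script: load it back; types and nesting are preserved.
with open(path, "rb") as infile:
    restored = pickle.load(infile)

print(restored == rows)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.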

Note that if you want to read your saved data with a different Python version than the one used to save it, you can specify the pickle protocol in the writing step with protocol=x, choosing a protocol that the reading version supports (for example, protocol 2 is the highest that Python 2 can read, while protocol 3 requires Python 3).
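A small sketch of the protocol choice, using `pickle.dumps` so that no file is needed. The `record` data is made up for illustration; the same `protocol` argument is accepted by `pickle.dump` when writing to a file.

```python
import pickle

# Hypothetical sample data.
record = {"team": "tekha", "scores": [1, 0, 3]}

# Protocol 2 is the newest protocol that Python 2 can read.
portable = pickle.dumps(record, protocol=2)

# Without the argument, the writer's default protocol is used.
default = pickle.dumps(record)

# Pickles written with protocol 2 or later start with the PROTO opcode
# (0x80) followed by the protocol number.
print(portable[:2])
print(default[1] == pickle.DEFAULT_PROTOCOL)
print(pickle.loads(portable) == record)
```

The default protocol depends on the Python version doing the writing, which is why pinning it explicitly helps when the reader may be older.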

I hope this is of some use.
