pandas CParserError: Error tokenizing data
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/37505577/
Asked by Muhammed Eltabakh
I'm having some trouble reading a CSV file:
import pandas as pd
df = pd.read_csv('Data_Matches_tekha.csv', skiprows=2)
I get:
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 526, saw 5
When I add sep=None to the read_csv call, I get another error:
Error: line contains NULL byte
I tried adding unicode='utf-8', and I even tried the csv reader, but nothing works with this file.
The CSV file is totally fine; I checked it and I see nothing wrong with it.
Here are the errors I get:
Accepted answer by Burhan Khalid
In your actual code, the line is:
>>> pandas.read_csv("Data_Matches_tekha.xlsx", sep=None)
You are trying to read an Excel file, not a plain-text CSV, which is why things are not working.
Excel files (xlsx) are zip archives of XML parts, a binary container that cannot be read as simple text files (like CSV files).
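To check whether a file is really plain text before handing it to read_csv, one option is to look at its first bytes. The sketch below is only an illustration: the function name sniff_container and its return labels are made up for this example, and the check is a heuristic, not a full format detector. It relies on two well-known signatures: zip archives (which is what xlsx files are) start with PK\x03\x04, and the older xls format is an OLE2 compound file.

```python
import zipfile

ZIP_MAGIC = b"PK\x03\x04"                          # local file header of a zip archive
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # OLE2 compound file (legacy .xls)


def sniff_container(path):
    """Return a rough label for the file at `path` (hypothetical helper).

    "zip"    -> zip archive, which is how .xlsx workbooks are stored
    "ole2"   -> OLE2 compound file, which is how legacy .xls files are stored
    "binary" -> contains NUL bytes, so it is not ordinary 8-bit text
    "text"   -> none of the above; probably safe to try as CSV
    """
    with open(path, "rb") as handle:
        head = handle.read(4096)
    if head.startswith(ZIP_MAGIC) and zipfile.is_zipfile(path):
        return "zip"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if b"\x00" in head:
        return "binary"
    return "text"
```

A result of "zip" or "ole2" for a file you expected to be CSV suggests it is an Excel workbook, whatever its name says. A "binary" result can also mean UTF-16 encoded text, which contains NUL bytes too.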
You can either convert the Excel file to CSV (note: if you have multiple sheets, each sheet should be converted to its own CSV file) and then read those files, or read the Excel file directly.
To read it directly, you can use read_excel, or a library like xlrd, which is designed to read the binary format of Excel files; see "Reading/parsing Excel (xls) files with Python" for more information on that. Note that xlrd 2.0 and later read only the older .xls format; for .xlsx files, read_excel relies on openpyxl.
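The "one CSV per sheet" conversion can also be done from Python instead of from Excel. The following is a minimal sketch, assuming pandas is installed together with an Excel engine such as openpyxl; the helper names write_sheets_as_csv and excel_to_csv_files are hypothetical and not part of pandas. It uses read_excel with sheet_name=None, which returns a dict mapping each sheet name to a DataFrame.

```python
from pathlib import Path

import pandas as pd


def write_sheets_as_csv(sheets, out_dir):
    """Write each DataFrame in `sheets` ({name: DataFrame}) to <out_dir>/<name>.csv.

    Hypothetical helper; assumes sheet names are valid file names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, frame in sheets.items():
        target = out_dir / f"{name}.csv"
        frame.to_csv(target, index=False)  # index=False: do not add a row-number column
        written.append(target)
    return written


def excel_to_csv_files(xlsx_path, out_dir):
    """Read every sheet of an Excel workbook and save each one as its own CSV."""
    # sheet_name=None asks pandas for all sheets as {sheet name: DataFrame}
    sheets = pd.read_excel(xlsx_path, sheet_name=None)
    return write_sheets_as_csv(sheets, out_dir)
```

For the file in the question, a call might look like excel_to_csv_files("Data_Matches_tekha.xlsx", "csv_out"), where "csv_out" is an arbitrary output folder chosen for this example.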
Answer by jezrael
Use read_excel instead of read_csv if the file is an Excel file:
import pandas as pd
df = pd.read_excel("Data_Matches_tekha.xlsx")
Answer by Othmane304
I encountered the same error when I used to_csv to write some data and then read it in another script. I found an easy workaround that avoids pandas' read functions altogether: the pickle module.
pickle is part of the Python standard library, so there is nothing to install.
First, write your data with the code below:
import pickle
with open(path, 'wb') as output:
    pickle.dump(variable_to_save, output)
Finally, load your data in another script with:
import pickle
# "infile" avoids shadowing the built-in input()
with open(path, 'rb') as infile:
    data = pickle.load(infile)
Note that if you want to read your saved data with a different Python version than the one that wrote it, you can specify the pickle protocol in the writing step by passing protocol=x to pickle.dump. Protocol 2 is the highest that Python 2 can read; protocol 3 and above require Python 3.
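The write and read steps above, including the protocol argument, can be put together as a small round trip. This is only a sketch: the helper names save and load, the sample records, and the file name matches_example.pkl are made up for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def save(obj, path, protocol=pickle.HIGHEST_PROTOCOL):
    """Serialize `obj` to `path` using the given pickle protocol."""
    with open(path, "wb") as out:
        pickle.dump(obj, out, protocol=protocol)


def load(path):
    """Load a pickled object from `path`. Only use this on files you trust."""
    with open(path, "rb") as src:
        return pickle.load(src)


# Hypothetical sample data standing in for whatever you need to pass between scripts.
records = [{"team": "A", "goals": 2}, {"team": "B", "goals": 0}]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "matches_example.pkl"
    save(records, path, protocol=2)  # protocol 2 stays readable from Python 2
    header = path.read_bytes()[:2]   # protocol marker at the start of the file
    restored = load(path)
```

Keep in mind that unpickling can execute arbitrary code, so this approach is suitable for passing data between your own scripts, not for reading files from untrusted sources.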
I hope this is useful.