Pandas.read_csv "unexpected end of data" Error

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/52105659/

Time: 2020-09-14 05:59:15  Source: igfitidea

Tags: python, pandas

Asked by Ryan

I'm trying to read a dataset using pd.read_csv() and am getting an error. Excel can open the file just fine.

reviews = pd.read_csv('br.csv') gives the error ParserError: Error tokenizing data. C error: EOF inside string starting at line 312074

reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8') returns ParserError: unexpected end of data

What can I do to fix this?

Edit: This is the dataset - https://www.kaggle.com/gnanesh/goodreads-book-reviews

Answered by Elise Mol

For me, adding this fixed it:

error_bad_lines=False

It just skips the last line. So instead of

reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8')

use

reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8', error_bad_lines=False)
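
Note for newer pandas versions: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0, where the equivalent option is on_bad_lines='skip'. The following is a minimal sketch of the same idea, assuming pandas 1.3 or later. The in-memory sample stands in for br.csv and is made up for illustration.

```python
import io

import pandas as pd

# Made-up stand-in for br.csv: the record with id 2 opens a quote that is never closed.
SAMPLE = 'id,text\n1,good\n2,"unclosed\n3,fine\n'

# on_bad_lines='skip' replaces error_bad_lines=False on pandas >= 1.3.
reviews = pd.read_csv(io.StringIO(SAMPLE), engine='python', on_bad_lines='skip')

# The sample holds 3 data records; compare that with what was actually read.
print(len(reviews), "row(s) read from 3 records in the sample")
```

Because an unclosed quote absorbs everything up to the end of the file, skipping can silently discard more than one record, so it is worth comparing the resulting row count with the expected one.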

Answered by Linh Nguyen

In my case, I don't want to skip lines, because my task requires counting the number of data records in the csv file. The solution that works for me is using QUOTE_NONE from the csv library. I picked this up from some websites that I no longer remember, but it works.

To describe my case: previously I had the error EOF .... Then I tried the parameter engine='python', but that introduced another bug in the next step of using the dataframe. Then I tried quoting=csv.QUOTE_NONE, and it's OK now. I hope this helps.

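import csv
import pandas as pd
# full_path is the path to the csv file; QUOTE_NONE makes the parser treat quote characters as ordinary text
read_file = pd.read_csv(full_path, delimiter='~', encoding='utf-16 BE', header=0, quoting=csv.QUOTE_NONE)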
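
To see why turning quoting off can change the record count, here is a small sketch. It uses Python's standard csv module rather than pandas, so it illustrates the quoting rule itself and not pandas' exact behavior. The sample data and the helper name count_rows are made up for illustration.

```python
import csv
import io

# Made-up sample: the record with id 2 starts a quoted field that is never closed.
SAMPLE = 'id~text\n1~good book\n2~"he said hi\n3~fine\n'


def count_rows(text, **reader_kwargs):
    """Count the rows csv.reader yields (header included) for the given options."""
    reader = csv.reader(io.StringIO(text), delimiter='~', **reader_kwargs)
    return sum(1 for _ in reader)


# Default quoting: the opening quote starts a field that runs to the end of the data.
default_rows = count_rows(SAMPLE)

# QUOTE_NONE: quote characters are ordinary text, so every line is its own row.
literal_rows = count_rows(SAMPLE, quoting=csv.QUOTE_NONE)

print(default_rows, literal_rows)
```

With this sample, the default reader folds the last two lines into a single field and yields 3 rows, while QUOTE_NONE yields all 4. The trade-off is that the quote characters stay in the data, and any field that legitimately contains the delimiter inside quotes would be split.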