Pandas.read_csv "unexpected end of data" Error
Source: http://stackoverflow.com/questions/52105659/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Asked by Ryan
I'm trying to read a dataset using pd.read_csv() but I am getting an error. Excel can open it just fine.
reviews = pd.read_csv('br.csv')
gives the error ParserError: Error tokenizing data. C error: EOF inside string starting at line 312074
reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8')
returns ParserError: unexpected end of data
What can I do to fix this?
Edit: This is the dataset - https://www.kaggle.com/gnanesh/goodreads-book-reviews
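Before changing parser options, it can help to look at the raw text near the line the error message names. The helper below is only a sketch: show_lines is a hypothetical name, not part of pandas, and the line number pandas reports may not match the physical line exactly when earlier quoted fields contain line breaks.

```python
from itertools import islice


def show_lines(path, start, count=3, encoding="utf-8"):
    """Return `count` raw lines from `path`, beginning at 1-based line `start`.

    Hypothetical helper for inspecting a CSV file around a reported error line.
    """
    # errors="replace" keeps the helper from failing on undecodable bytes;
    # newline="" leaves line endings untranslated until we strip them below.
    with open(path, encoding=encoding, errors="replace", newline="") as f:
        window = islice(f, start - 1, start - 1 + count)
        return [line.rstrip("\r\n") for line in window]
```

For the file in the question, a call such as show_lines('br.csv', 312074) would list the lines where the unterminated string is reported to start.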
Answered by Elise Mol
For me, adding this fixed it:
error_bad_lines=False
It just skips the last line. So instead of
reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8')
use
reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8', error_bad_lines=False)
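Note that error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0; on newer versions the same intent is written as on_bad_lines='skip'. The sketch below assumes pandas 1.3 or later and uses made-up inline data (a row with too many fields) rather than the dataset from the question, just to show the keyword in action.

```python
import io

import pandas as pd

# Made-up sample: the third line has three fields instead of two.
sample = "a,b\n1,2\n3,4,5\n6,7\n"

# on_bad_lines="skip" (pandas >= 1.3) drops rows with too many fields
# instead of raising ParserError.
reviews = pd.read_csv(io.StringIO(sample), on_bad_lines="skip")
print(len(reviews))
```

Skipped rows are dropped silently, so this is only suitable when losing a few records is acceptable.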
Answered by Linh Nguyen
In my case, I don't want to skip lines, because my task requires counting the number of data records in the CSV file. The solution that works for me is to use QUOTE_NONE from the csv library. I picked this up from a website I no longer remember, but it works.
To describe my case: previously I had the error EOF .... Then I tried the parameter engine='python', but that introduced another bug in the next step of using the dataframe. Then I tried quoting=csv.QUOTE_NONE, and it's OK now. I hope this helps.
import csv
import pandas as pd
# full_path is the path to the CSV file being read
read_file = pd.read_csv(full_path, delimiter='~', encoding='utf-16 BE', header=0, quoting=csv.QUOTE_NONE)
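To illustrate why csv.QUOTE_NONE keeps every record, here is a small self-contained sketch. The inline data is made up for the example and only mimics a file with an unbalanced quote character; it is not taken from the original dataset.

```python
import csv
import io

import pandas as pd

# Made-up sample using '~' as the delimiter; record 2 has an unbalanced quote.
sample = 'id~text\n1~hello\n2~"unterminated\n3~bye\n'

# Default quoting: '"' opens a quoted field, and the parser is expected to
# complain when the field never closes before the end of the data.
try:
    pd.read_csv(io.StringIO(sample), delimiter="~")
    default_failed = False
except pd.errors.ParserError:
    default_failed = True

# QUOTE_NONE: quote characters are treated as ordinary data,
# so each physical line becomes exactly one record.
records = pd.read_csv(io.StringIO(sample), delimiter="~", quoting=csv.QUOTE_NONE)
print(len(records))
```

The trade-off is that any quote characters stay in the values, and fields that legitimately contain the delimiter inside quotes would be split, so this fits best when the file does not rely on quoting at all.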