pandas.errors.ParserError: ',' expected after '"'

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/55010807/

Time: 2020-09-14 06:20:21  Source: igfitidea

Tags: python-3.x, pandas

Asked by math4everyone

I am trying to read this dataset from Kaggle: Amazon sales rank data for print and kindle books

The file amazon_com_extras.csv has a column named "Title" that sometimes contains a comma, so all the fields in this .csv are enclosed in quotation marks:

"ASIN","GROUP","FORMAT","TITLE","AUTHOR","PUBLISHER"
"022640014X","book","hardcover","The Diversity Bargain: And Other Dilemmas of Race, Admissions, and Meritocracy at Elite Universities","Natasha K. Warikoo","University Of Chicago Press"

I have read other questions related to this problem but none of them solve it. For example, I have tried:

import pandas as pd

df = pd.read_csv("amazon_com_extras.csv", engine="python", sep=",")
df = pd.read_csv("amazon_com_extras.csv", engine="python", sep=",", quotechar='"')

But nothing seems to work. I am using Python 3.7.2 and pandas 0.24.1.

Answered by ssice

This happens because some fields in the file contain unescaped quotes inside the quoted text.

I am not aware of a way to instruct the csv parser to handle that without preprocessing.

If you don't care about losing the affected lines, you can use

pd.read_csv("amazon_com_extras.csv", engine="python", sep=',', quotechar='"', error_bad_lines=False)

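That suppresses the exception, but the affected lines are dropped (pandas reports each skipped line in the console). In pandas 1.3 and later, the equivalent option is on_bad_lines='skip'; error_bad_lines was removed in pandas 2.0.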

An example of such a line:

"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"

Notice the quotes.
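
The error text matches what Python's standard csv module raises in strict mode, so the failure can be reproduced without pandas. The following is a minimal sketch, not the pandas internals; `bad_line` is the example line above, and `parse_strict` and `message` are names made up for illustration.

```python
import csv

# The offending line from the dataset: the title field has unescaped inner quotes.
bad_line = '"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"'


def parse_strict(line: str) -> list:
    """Parse one CSV line with the stdlib reader in strict mode (hypothetical helper)."""
    return next(csv.reader([line], delimiter=",", quotechar='"', strict=True))


try:
    parse_strict(bad_line)
    message = None
except csv.Error as exc:
    # In strict mode, a closing quote must be followed by the delimiter or a newline.
    message = str(exc)

print(message)
```

The parser reads `""` as an escaped quote and then expects the field to end or continue inside quotes, so the `H` that follows is where strict parsing gives up.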

Instead, a more standard dialect of csv would have rendered:

1405246510,"book","hardcover","""Hannah Montana"" Annual 2010","Unknown","Egmont Books Ltd"

You can, for example, load the file with LibreOffice and re-save it as CSV to get a working CSV dialect, or use other preprocessing techniques.
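
One possible preprocessing technique, sketched here under stated assumptions: every field is wrapped in quotes, no field contains the literal sequence `","`, no field spans several lines, and the file never escapes quotes correctly (a line that is already escaped would be escaped twice). The helper name `repair_line` is hypothetical.

```python
import csv


def repair_line(line: str) -> str:
    """Double the inner quotes of each field in a fully quoted CSV line.

    Assumes fields are separated by exactly '","' and that no quote in the
    input is already escaped. The result always ends with a newline.
    """
    body = line.rstrip("\r\n")
    if len(body) < 2 or not (body.startswith('"') and body.endswith('"')):
        return line  # not in the expected all-quoted shape; leave untouched
    # Strip the outer quotes, then split on the quote-comma-quote separator.
    fields = body[1:-1].split('","')
    escaped = [field.replace('"', '""') for field in fields]
    return '"' + '","'.join(escaped) + '"\n'


bad_line = '"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"'
fixed = repair_line(bad_line)

# The repaired line now parses even in strict mode.
row = next(csv.reader([fixed], strict=True))
print(row[3])
```

The repaired lines could then be joined into one string and handed to `pd.read_csv` through `io.StringIO`, or written to a new file first.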

Answered by alpoza

Using csv.Sniffer works for me:

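import csv

import pandas as pd

# Let csv.Sniffer infer the delimiter and quote character from the first 14734 characters.
with open('spotify_dataset.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(14734))

# Lines that still fail to parse are skipped, not repaired.
# In pandas 1.3 and later, use on_bad_lines='skip' in place of error_bad_lines=False.
df = pd.read_csv('spotify_dataset.csv', engine='python', dialect=dialect, error_bad_lines=False)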
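
To see what the sniffer infers, here is a minimal stdlib-only sketch that runs it on the header and first data row quoted in the question. The variable names are illustrative. As far as I can tell, sniffing only detects properties such as the delimiter and quote character; it does not repair unescaped quotes, so the skipped-lines option is probably what makes the call above succeed.

```python
import csv

# Header and first data row from the question, used as the sniffing sample.
sample = (
    '"ASIN","GROUP","FORMAT","TITLE","AUTHOR","PUBLISHER"\n'
    '"022640014X","book","hardcover","The Diversity Bargain: And Other '
    'Dilemmas of Race, Admissions, and Meritocracy at Elite Universities",'
    '"Natasha K. Warikoo","University Of Chicago Press"\n'
)

# sniff() returns a csv.Dialect subclass describing the detected format.
dialect = csv.Sniffer().sniff(sample)
print(repr(dialect.delimiter), repr(dialect.quotechar))
```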