pandas.errors.ParserError: ',' expected after '"'

Question

Asked by math4everyone

I am trying to read this dataset from Kaggle: Amazon sales rank data for print and kindle books

The file amazon_com_extras.csv has a column named "Title" that sometimes contains a comma ',', so all the fields in this .csv are enclosed in quotation marks:

"ASIN","GROUP","FORMAT","TITLE","AUTHOR","PUBLISHER"
"022640014X","book","hardcover","The Diversity Bargain: And Other Dilemmas of Race, Admissions, and Meritocracy at Elite Universities","Natasha K. Warikoo","University Of Chicago Press"

I have read other questions related to this problem but none of them solve it. For example, I have tried:

df = pd.read_csv("amazon_com_extras.csv",engine="python",sep=',')
df = pd.read_csv("amazon_com_extras.csv",engine="python",sep=',',quotechar='"')

But nothing seems to work. I am using Python 3.7.2 and pandas 0.24.1.

Answer 1

Answered by ssice

This happens because some fields in the file contain unescaped quotes inside the quoted text.

I am not aware of a way to instruct the csv parser to handle that without preprocessing.

If you don't care about the affected rows, you can use

pd.read_csv("amazon_com_extras.csv", engine="python", sep=',', quotechar='"', error_bad_lines=False)

That will stop the exception from being raised, but it will drop the affected lines (each skipped line is reported in the console).

An example of such a line:

"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"

Notice the quotes.

Instead, a more standard dialect of csv would have rendered:

1405246510,"book","hardcover","""Hannah Montana"" Annual 2010","Unknown","Egmont Books Ltd"
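The difference between the two renderings can be checked with Python's standard csv module, whose strict mode produces an error message matching the one in the title. This is only an illustrative sketch: strict=True is an assumption chosen to make the malformed quoting fail loudly, and the names bad_row, good_row and parse_strict are made up for this example.

```python
import csv
import io

# The problematic row from the dataset and the conventionally escaped version.
bad_row = '"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"\n'
good_row = '1405246510,"book","hardcover","""Hannah Montana"" Annual 2010","Unknown","Egmont Books Ltd"\n'


def parse_strict(text):
    """Parse CSV text with strict quoting rules; return the rows, or the csv.Error raised."""
    try:
        return list(csv.reader(io.StringIO(text), strict=True))
    except csv.Error as exc:
        return exc


print(parse_strict(good_row))  # one row of six fields; the title keeps its inner quotes
print(parse_strict(bad_row))   # a csv.Error describing the quoting problem
```

With doubled quotes the title comes back as a single field; with the unescaped quotes the strict parser gives up on the whole row.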

You can, for example, open the file in LibreOffice and re-save it as CSV to get a working CSV dialect, or use other preprocessing techniques.
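As one possible preprocessing technique, the file can be rewritten line by line so that inner quotes are doubled. The sketch below is not a general CSV repair tool. It assumes that every field is wrapped in double quotes, that no field contains the sequence '","', that no field spans several lines, and that no quote in the file is already escaped. The helper names split_quoted_line and repair_csv are hypothetical.

```python
import csv
import io


def split_quoted_line(line):
    """Split a fully quoted CSV line on the '","' separator and return the raw fields.

    Assumes every field is quoted and no field contains the sequence '","'.
    """
    body = line.rstrip("\r\n")
    if len(body) < 2 or not (body.startswith('"') and body.endswith('"')):
        raise ValueError(f"line is not fully quoted: {body!r}")
    # Drop the outermost quotes, then split on the quote-comma-quote separator.
    return body[1:-1].split('","')


def repair_csv(src, dst):
    """Read loosely quoted lines from src and write standard CSV to dst."""
    # QUOTE_ALL quotes every field; the writer doubles any quote inside a field.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in src:
        if line.strip():
            writer.writerow(split_quoted_line(line))


# In-memory stand-in for the real file, using the rows quoted in this thread.
raw = (
    '"ASIN","GROUP","FORMAT","TITLE","AUTHOR","PUBLISHER"\n'
    '"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"\n'
)
fixed = io.StringIO()
repair_csv(io.StringIO(raw), fixed)
print(fixed.getvalue())
```

For the real dataset, the same functions could be given two file objects opened with newline="", and the repaired file then passed to pd.read_csv.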

Answer 2

Answered by alpoza

Using csv.Sniffer to detect the dialect works for me:

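import csv
import pandas as pd

# Let csv.Sniffer infer the dialect (delimiter, quoting) from a sample of the file.
with open('spotify_dataset.csv') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(14734))

# error_bad_lines=False skips rows that still fail to parse
# (pandas < 2.0; newer versions use on_bad_lines='skip' instead).
df = pd.read_csv('spotify_dataset.csv', engine='python', dialect=dialect, error_bad_lines=False)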
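Both answers rely on error_bad_lines=False, which matches the pandas 0.24.1 used in the question. That parameter was deprecated in pandas 1.3 and removed in 2.0 in favour of on_bad_lines. A minimal sketch of the newer spelling, assuming pandas 1.3 or later and using an in-memory stand-in for the real file built from the rows quoted in this thread:

```python
import io

import pandas as pd

# Stand-in for amazon_com_extras.csv: a header, one clean row, and the row
# with unescaped quotes around "Hannah Montana".
raw = (
    '"ASIN","GROUP","FORMAT","TITLE","AUTHOR","PUBLISHER"\n'
    '"022640014X","book","hardcover","The Diversity Bargain: And Other Dilemmas of Race, '
    'Admissions, and Meritocracy at Elite Universities","Natasha K. Warikoo",'
    '"University Of Chicago Press"\n'
    '"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"\n'
)

# on_bad_lines="skip" drops rows the parser rejects instead of raising ParserError.
df = pd.read_csv(io.StringIO(raw), engine="python", on_bad_lines="skip")
print(df[["ASIN", "TITLE"]])
```

As with error_bad_lines=False, the malformed rows are lost, so this only suits cases where dropping them is acceptable.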
