Python pandas: read CSV with extra commas in a column

Note: this page is a Chinese-English side-by-side translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverFlow URL: http://stackoverflow.com/questions/32743479/

Time: 2020-08-19 12:10:09  Source: igfitidea

pandas read csv with extra commas in column

Tags: python, csv, pandas

Asked by David

I'm reading a basic csv file where the columns are separated by commas with these column names:

userid, username, body

However, the body column is a string that may itself contain commas. This breaks the parsing, and pandas raises an error:

CParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 8

Is there a way to tell pandas to ignore commas in a specific column or a way to go around this problem?
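
For anyone who wants to reproduce the error, here is a minimal sketch. The rows are made up for illustration, and it assumes a pandas version where the exception is exposed as pandas.errors.ParserError (the newer name for CParserError):

```python
import io
import pandas as pd

# Made-up sample: the third line has 8 comma-separated fields instead of 3.
raw = (
    "userid,username,body\n"
    "1,n1,hello\n"
    "2,n2,a,b,c,d,e,f\n"
)

try:
    pd.read_csv(io.StringIO(raw))
    message = None
except pd.errors.ParserError as exc:
    # The unquoted commas in the body make the parser see extra fields.
    message = str(exc)

print(message)
```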

Accepted answer by Fabio Lamanna

Suppose your data is in a file called comma.csv:

userid, username, body
01, n1, 'string1, string2'

One thing you can do is specify the quote character that delimits the strings in that column:

import pandas as pd
df = pd.read_csv('comma.csv', quotechar="'", skipinitialspace=True)

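In this case, strings delimited by ' are treated as a single field, regardless of any commas inside them. Note that the sample above has a space after each comma, so the opening quote is only recognized at the start of a field if leading spaces are skipped (skipinitialspace=True).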
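
A minimal self-contained sketch of this approach, using an in-memory string as a stand-in for comma.csv (the io.StringIO wrapper and the variable name raw are only there for illustration):

```python
import io
import pandas as pd

# In-memory stand-in for comma.csv, same content as the sample above.
raw = "userid, username, body\n01, n1, 'string1, string2'\n"

df = pd.read_csv(
    io.StringIO(raw),
    quotechar="'",          # treat '...' as one quoted field
    skipinitialspace=True,  # skip the space after each comma so the quote starts the field
)

print(df.columns.tolist())
print(df.loc[0, "body"])
```

The comma inside the quoted body stays part of the value, so the row still parses as three columns.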

Answer by Ilyas

Add usecols and lineterminator to your read_csv() call, where n is the number of columns.

In my case:

import pandas as pd

n = 5  # number of columns to keep; set this for your own file
df = pd.read_csv(file,  # file: path to your CSV
                 usecols=range(n),
                 lineterminator='\n',
                 header=None)
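
One caveat worth checking on your own data: as far as I can tell, with usecols set the default C parser keeps the first n fields of each line and ignores the rest instead of raising, so any text after the first embedded comma is dropped. A small sketch with made-up rows (the data and n = 3 are assumptions for illustration):

```python
import io
import pandas as pd

# Made-up sample: the second line has two extra commas in its last column.
raw = "1,n1,hello\n2,n2,one,two,three\n"

n = 3  # number of columns to keep
df = pd.read_csv(
    io.StringIO(raw),
    usecols=range(n),
    lineterminator="\n",
    header=None,
)

print(df)
```

If the full text of the last column matters, quoting the field in the file is the safer route.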