Python pandas: read csv with extra commas in column
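Disclaimer: this page is a Chinese/English side-by-side translation of a popular StackOverflow question, published under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow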
Original URL: http://stackoverflow.com/questions/32743479/
Question by David
I'm reading a basic CSV file whose columns are separated by commas, with these column names:
userid, username, body
However, the body column is a string which may contain commas. Obviously this causes a problem, and pandas throws an error:
CParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 8
Is there a way to tell pandas to ignore commas in a specific column or a way to go around this problem?
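The error is easy to reproduce. The sketch below uses a small hypothetical in-memory sample (not the asker's actual file) in which one row has an unquoted comma inside body. Note that the bad row is deliberately not the first data row: if the first data row had an extra field, pandas would treat the leading column as an index instead of raising. Current pandas versions raise pd.errors.ParserError; older versions called this CParserError.

```python
import io

import pandas as pd

# Hypothetical sample: the third line has an unquoted comma inside "body",
# so it splits into 4 fields while the header declares 3.
raw = (
    "userid,username,body\n"
    "1,n1,hello\n"
    "2,n2,hi, there\n"
)

try:
    pd.read_csv(io.StringIO(raw))
    failed = False
except pd.errors.ParserError as exc:
    # The message has the same "Error tokenizing data. C error: ..." form as above.
    failed = True
    print(exc)
```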
Accepted answer by Fabio Lamanna
Imagine we're reading your file called comma.csv:
userid, username, body
01, n1, 'string1, string2'
One thing you can do is tell pandas which character encloses the strings in that column, using quotechar:
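import pandas as pd
df = pd.read_csv('comma.csv', quotechar="'", skipinitialspace=True)
With this setting, any string enclosed in ' is read as a single field, no matter how many commas it contains. skipinitialspace=True matters for this sample because every field starts with a space after the comma, and pandas only recognizes a quote character at the very start of a field.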
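A self-contained check of this approach, assuming the two-line comma.csv sample shown above (read from an in-memory string here instead of a file). Because each field in the sample begins with a space after the comma, skipinitialspace=True is used so that the single quote is the first character of the body field and is treated as an opening quote.

```python
import io

import pandas as pd

# Same content as the comma.csv sample: note the space after every comma.
raw = "userid, username, body\n01, n1, 'string1, string2'\n"

df = pd.read_csv(
    io.StringIO(raw),
    quotechar="'",          # single quotes enclose fields that may contain commas
    skipinitialspace=True,  # drop the space after each comma so the quote starts the field
)

print(df.columns.tolist())
print(df.loc[0, "body"])
```

Without skipinitialspace, the header would also keep its leading spaces (" username", " body"), so this option tidies the column names as a side effect.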
Answer by Ilyas
Add usecols and lineterminator to your read_csv() call, where n is the number of columns.
In my case:
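import pandas as pd

file = "comma.csv"  # path to your CSV file (placeholder)
n = 5  # number of columns to keep; set this to match your data
df = pd.read_csv(file,
                 usecols=range(n),  # read only the first n columns
                 lineterminator='\n',
                 header=None)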
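A small sketch of what this does, using a hypothetical headerless sample with n = 3. When usecols is given, the C parser does not raise on rows that have extra fields; it simply keeps the first n. The trade-off is that anything after the n-th comma is discarded, so a body containing a comma is cut off at its first comma. This only avoids the error when the comma-containing column is the last one, and it loses data rather than recovering it.

```python
import io

import pandas as pd

# Hypothetical headerless sample: the second row's body contains a comma.
raw = "1,n1,hello\n2,n2,hi, there\n"

n = 3  # number of columns to keep
df = pd.read_csv(
    io.StringIO(raw),
    usecols=range(n),    # keep only columns 0..n-1; extra fields are dropped
    lineterminator="\n",
    header=None,
)

# The second body is truncated to "hi": the text after its comma was discarded.
print(df)
```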