python pandas read_csv quotechar does not work

Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/37074914/

Date: 2020-09-14 01:11:28  Source: igfitidea

Tags: python, csv, pandas

Asked by ragesz

I've read this, this and this post, but I still don't know why quotechar does not work in pd.read_csv() (Python 3, pandas 0.18.0 and 0.18.1). How can I read a dataframe like this:

"column1","column2", "column3", "column4", "column5", "column6"
"AM", 7, "1", "SD", "SD", "CR"
"AM", 8, "1,2 ,3", "PR, SD,SD", "PR ; , SD,SD", "PR , ,, SD ,SD"
"AM", 1, "2", "SD", "SD", "SD"

I want the following result:

Out[116]: 
  column1  column2 column3    column4       column5        column6
0      AM        7       1         SD            SD             CR
1      AM        8  1,2 ,3  PR, SD,SD  PR ; , SD,SD  PR , ,, SD,SD
2      AM        1       2         SD            SD             SD

Thank you!!

Answered by ptrj

The pandas doc on separators in read_csv():

Separators longer than 1 character and different from '\s+' will be interpreted as regular expressions, will force use of the python parsing engine and will ignore quotes in the data.
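
To illustrate the passage above, here is a minimal sketch. The two-column sample and the ', ' separator are made up for this example and are not taken from the question. A two-character separator is treated as a regular expression, so the quote characters should stay in the parsed values, whereas the default comma separator with skipinitialspace=True strips them:

```python
import io
import pandas as pd

# Hypothetical sample, not the data from the question.
sample = 'name, value\n"AM", "SD"\n'

# A two-character separator is interpreted as a regex: the parser ignores
# quoting, so the quote characters remain part of the values.
regex_df = pd.read_csv(io.StringIO(sample), sep=', ', engine='python')

# Default single-character separator: quotes are honored and stripped.
plain_df = pd.read_csv(io.StringIO(sample), skipinitialspace=True, quotechar='"')

print(repr(regex_df.loc[0, 'name']))
print(repr(plain_df.loc[0, 'name']))
```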

Try using this instead (sep is set to a comma by default):

pd.read_csv(file, skipinitialspace=True, quotechar='"')
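
As a self-contained check of the call above, the sketch below feeds the sample from the question through io.StringIO instead of a file. The in-memory buffer and the variable names are assumptions made for illustration:

```python
import io
import pandas as pd

# The sample from the question, held in memory instead of a file.
data = '''"column1","column2", "column3", "column4", "column5", "column6"
"AM", 7, "1", "SD", "SD", "CR"
"AM", 8, "1,2 ,3", "PR, SD,SD", "PR ; , SD,SD", "PR , ,, SD ,SD"
"AM", 1, "2", "SD", "SD", "SD"
'''

# skipinitialspace=True drops the blank after each comma, so every quoted
# field starts with the quote character and is parsed as a single value.
df = pd.read_csv(io.StringIO(data), skipinitialspace=True, quotechar='"')

print(df.shape)
print(df.loc[1, 'column4'])
```

The commas inside the quoted fields are kept as data, and column2 is still read as integers.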

Answered by yoonghm

Another solution is to use a proper regular expression instead of the simple \s+. We need to match a comma (,) that is not inside quotation marks:

pd.read_csv(file, 
            sep=', (?=(?:"[^"]*?(?: [^"]*)*))|, (?=[^",]+(?:,|$))',
            engine='python')

The expression is taken from here.