Original URL: http://stackoverflow.com/questions/37074914/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
python pandas read_csv quotechar does not work
Question by ragesz
I've read this, this and this post, but I still don't know why quotechar does not work in pd.read_csv() (Python 3, pandas 0.18.0 and 0.18.1). How can I read a dataframe like this:
"column1","column2", "column3", "column4", "column5", "column6"
"AM", 7, "1", "SD", "SD", "CR"
"AM", 8, "1,2 ,3", "PR, SD,SD", "PR ; , SD,SD", "PR , ,, SD ,SD"
"AM", 1, "2", "SD", "SD", "SD"
I want the following result:
Out[116]:
column1 column2 column3 column4 column5 column6
0 AM 7 1 SD SD CR
1 AM 8 1,2 ,3 PR, SD,SD PR ; , SD,SD PR , ,, SD,SD
2 AM 1 2 SD SD SD
Thank you!!
Answer by ptrj
From the pandas docs on separators in read_csv():
Separators longer than 1 character and different from '\s+' will be interpreted as regular expressions, will force use of the python parsing engine and will ignore quotes in the data.
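Try using this instead. sep defaults to a single comma, so the C parser is used and quotes are honored; skipinitialspace=True drops the space after each comma, so a field such as "1,2 ,3" starts with the quote character and is read as one quoted value: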
import pandas as pd
pd.read_csv(file, skipinitialspace=True, quotechar='"')
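To check this against the sample data from the question, the text can be wrapped in io.StringIO in place of a real file. This is a minimal sketch; the variable names data and df are just for illustration, and quotechar='"' is already the pandas default, passed explicitly only to mirror the answer.

```python
import io

import pandas as pd

# Sample data copied from the question: note the space after most commas.
data = '''"column1","column2", "column3", "column4", "column5", "column6"
"AM", 7, "1", "SD", "SD", "CR"
"AM", 8, "1,2 ,3", "PR, SD,SD", "PR ; , SD,SD", "PR , ,, SD ,SD"
"AM", 1, "2", "SD", "SD", "SD"
'''

# Default single-comma separator -> C parser, which honors quoting.
# skipinitialspace=True skips the blank after each comma, so every quoted
# field starts with '"' and the commas inside it are kept as data.
df = pd.read_csv(io.StringIO(data), skipinitialspace=True, quotechar='"')
print(df)
```

Without skipinitialspace=True, a field such as ` "1,2 ,3"` begins with a space rather than the quote character, so the quote is not treated as the start of a quoted field and the commas inside it act as separators.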
Answer by yoonghm
Another solution is to use a proper regular expression instead of the simple \s+. We need to find commas (,) that are not within quotation marks:
import pandas as pd
pd.read_csv(file,
            sep=', (?=(?:"[^"]*?(?: [^"]*)*))|, (?=[^",]+(?:,|$))',
            engine='python')
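The idea of splitting only on commas that lie outside quotation marks can be sketched with Python's re module on a single line. Note that this uses a different, commonly used pattern (a lookahead requiring an even number of remaining quote characters), not the expression above; it assumes fields never contain escaped ("") quotes, and split_outside_quotes is a hypothetical helper name.

```python
import re

# A comma (with optional surrounding blanks) counts as a separator only if an
# even number of '"' characters follows it on the line, i.e. it is not inside
# a quoted field. Assumes no escaped ("") quotes inside fields.
SEP = re.compile(r'\s*,\s*(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_outside_quotes(line):
    """Hypothetical helper: split on unquoted commas, then drop the quotes."""
    return [field.strip('"') for field in SEP.split(line)]


line = '"AM", 8, "1,2 ,3", "PR, SD,SD", "PR ; , SD,SD", "PR , ,, SD ,SD"'
print(split_outside_quotes(line))
# ['AM', '8', '1,2 ,3', 'PR, SD,SD', 'PR ; , SD,SD', 'PR , ,, SD ,SD']
```

Because regex separators make read_csv ignore quoting, the quote characters remain part of each field; this sketch therefore strips them explicitly after splitting.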
The expression is taken from here.