pandas read_csv 中的转义引号
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/13824840/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Escaped quotes in pandas read_csv
提问by Oliver
I am unable to create a dataframe which has escaped quotes when using read_csv.
(Note: R's read.csvworks as expected.)
我无法创建一个在使用read_csv.
(注意:Rread.csv按预期工作。)
My code:
我的代码:
import pandas as pd
pd.read_csv('data.csv')
#error!
CParserError: Error tokenizing data. C error: Expected 2 fields in line 4, saw 3
data.csv
数据.csv
SEARCH_TERM,ACTUAL_URL
"bra tv bord","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"tv p? hjul","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"SLAGBORD, \"Bergslagen\", IKEA:s 1700-tals serie","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
How can I read this csv and avoid this error?
如何读取此 csv 并避免此错误?
My guess is that pandas is using some regular expressions which cannot handle the ambiguity and trips on the third row, or more specifically: \"Bergslagen\".
我的猜测是Pandas正在使用一些正则表达式,它们无法处理第三行的歧义和行程,或者更具体地说:\"Bergslagen\".
回答by Wes McKinney
It does work, but you have to indicate the escape character for the embedded quotes:
它确实有效,但您必须为嵌入的引号指明转义字符:
In [1]: data = '''SEARCH_TERM,ACTUAL_URL
"bra tv bord","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"tv p\xc3\xa5 hjul","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"SLAGBORD, \"Bergslagen\", IKEA:s 1700-tals serie","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"'''
In [2]: df = read_csv(StringIO(data), escapechar='\', encoding='utf-8')
In [3]: df
Out[3]:
SEARCH_TERM ACTUAL_URL
0 bra tv bord http://www.ikea.com/se/sv/catalog/categories/d...
1 tv p? hjul http://www.ikea.com/se/sv/catalog/categories/d...
2 SLAGBORD, "Bergslagen", IKEA:s 1700-tals serie http://www.ikea.com/se/sv/catalog/categories/d...
see this gist.
看到这个要点。

