Pandas read_csv quotes issue

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/37589795/

Time: 2020-09-14 01:19:57  Source: igfitidea

Tags: python, csv, pandas, dataframe, quoting

Asked by A. Jameel

I have a file that looks like:

'colA'|'colB'
'word"A'|'A'
'word'B'|'B'

I want to use pd.read_csv('input.csv',sep='|', quotechar="'") but I get the following output:

colA    colB
word"A   A
wordB'   B

The last row is not correct; it should be word'B B. How do I get around this? I have tried various combinations of options, but none of them reads both rows correctly. I need some CSV reading expertise!

Answered by jezrael

I think you need str.strip with apply:

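import io

import pandas as pd

# Sample data; the default quotechar (") stays in effect, so ' is read as ordinary text
temp = u"""'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

# After testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep='|')

# Strip the surrounding single quotes from every value and from the column names
df = df.apply(lambda x: x.str.strip("'"))
df.columns = df.columns.str.strip("'")
print(df)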
     colA colB
0  word"A    A
1  word'B    B
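
A possible variant of the approach above, offered as a sketch rather than as part of the original answer: passing quoting=csv.QUOTE_NONE makes pandas treat both ' and " as plain text, and a regular expression then removes exactly one quote at each end of a value (str.strip would remove every leading and trailing quote). The helper name read_single_quoted is hypothetical, and the sketch assumes that every field is wrapped in single quotes and that all columns can be read as strings.

```python
import csv
import io

import pandas as pd

# Matches one quote at the very start or the very end of a value.
EDGE_QUOTE = r"^'|'$"


def read_single_quoted(source, sep="|"):
    """Hypothetical helper: read with quote handling disabled, then drop
    one enclosing single quote from each end of every value."""
    # dtype=str keeps every column as text so the .str accessor is available.
    df = pd.read_csv(source, sep=sep, quoting=csv.QUOTE_NONE, dtype=str)
    df = df.apply(lambda col: col.str.replace(EDGE_QUOTE, "", regex=True))
    df.columns = df.columns.str.replace(EDGE_QUOTE, "", regex=True)
    return df


sample = """'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

result = read_single_quoted(io.StringIO(sample))
print(result)
```

Numeric columns would need converting back afterwards, for example with pd.to_numeric, because everything is read as text here.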

Answered by Yaron

The source of the problem is that ' is used both as the quote character and as a regular character inside a field.
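
The effect can be reproduced in memory. The sketch below feeds the sample from the question through io.StringIO (an assumption made here so the example is self-contained; the question reads input.csv). The parser takes the quote in the middle of the field as the closing quote and drops it, then keeps the final quote as ordinary text, which gives the wordB' value reported in the question.

```python
import io

import pandas as pd

sample = """'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

# Same options as in the question: ' is the quote character.
df = pd.read_csv(io.StringIO(sample), sep="|", quotechar="'")

# The second value has lost its inner quote and kept the trailing one.
values = df["colA"].tolist()
print(values)
```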

You can escape it in the input file, e.g.:

'colA'|'colB'
'word"A'|'A'
'word/'B'|'B'

And then use escapechar:

>>> pd.read_csv('input.csv',sep='|',quotechar="'",escapechar="/")
     colA colB
0  word"A    A
1  word'B    B
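
If the file cannot be edited by hand, the escape character could be added in memory before parsing. This is only a sketch under stated assumptions: the helper name escape_inner_quotes and the regular expression are illustrative and not from the original answer. They assume that every field is wrapped in single quotes, that | never occurs inside a field, and that / does not otherwise appear in the data.

```python
import io
import re

import pandas as pd

# A quote with a field character on both sides, i.e. one that does not
# touch a delimiter, a line break, or the start/end of the text.
INNER_QUOTE = re.compile(r"(?<=[^|\n])'(?=[^|\n])")


def escape_inner_quotes(text, escapechar="/"):
    """Hypothetical helper: prefix quotes inside a field with escapechar."""
    return INNER_QUOTE.sub(lambda match: escapechar + "'", text)


raw = """'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

# Only the quote between "word" and "B" gets the escape character.
escaped = escape_inner_quotes(raw)

df = pd.read_csv(io.StringIO(escaped), sep="|", quotechar="'", escapechar="/")
print(df)
```

For a real file, the text would first be read with open('input.csv').read() and then passed through the helper in the same way.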

You can also use quoting=csv.QUOTE_ALL, but the output will then include the quote characters:

>>> import pandas as pd
>>> import csv
>>> pd.read_csv('input.csv',sep='|',quoting=csv.QUOTE_ALL)
     'colA' 'colB'
0  'word"A'    'A'
1  'word'B'    'B'