pandas: removing a character from an entire data frame

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/42135409/

Time: 2020-09-14 02:56:06  Source: igfitidea

Removing a character from entire data frame

python, string, pandas, replace

Asked by MJB

A common operation that I need to do with pandas is to read a table from an Excel file and then remove semicolons from all the fields. The columns often have mixed data types, and I run into an AttributeError when trying to do something like this:

for col in cols_to_check:
    df[col] = df[col].map(lambda x: x.replace(';',''))

AttributeError: 'float' object has no attribute 'replace'
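
For reference, the failure comes from calling .replace on a value that is not a string. A minimal sketch of a guarded version follows, assuming a made-up column that mixes strings, a number, and a missing value; the helper name strip_semicolons is hypothetical and not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing strings, a float and a missing value (NaN is a float).
s = pd.Series(['a;b', 1.5, np.nan, 'c;'])

def strip_semicolons(value):
    # Only strings have .replace; every other value is returned unchanged.
    return value.replace(';', '') if isinstance(value, str) else value

cleaned = s.map(strip_semicolons)
print(cleaned.tolist())
```

The strings lose their semicolons while the float and the missing value pass through untouched, so no conversion with str() is needed.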

When I wrap the value in str() before replacing, I run into problems with Unicode characters, e.g.:

for col in cols_to_check:
    df[col] = df[col].map(lambda x: str(x).replace(';',''))

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

In Excel this is a very simple operation: all it takes is replacing ; with an empty string. How can I do the same in pandas for an entire DataFrame, regardless of data types? Or am I missing something?

Answered by jezrael

You can use DataFrame.replace, and to limit it to certain columns, select a subset of columns:

import pandas as pd

# Sample frame: C and D hold strings, the other columns hold integers
df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':['f;','d:','sda;sd'],
                   'D':['s','d;','d;p'],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df)
   A  B       C    D  E  F
0  1  4      f;    s  5  7
1  2  5      d:   d;  3  4
2  3  6  sda;sd  d;p  6  3

cols_to_check = ['C','D', 'E']

print (df[cols_to_check])
        C    D  E
0      f;    s  5
1      d:   d;  3
2  sda;sd  d;p  6

df[cols_to_check] = df[cols_to_check].replace({';':''}, regex=True)
print (df)
   A  B      C   D  E  F
0  1  4      f   s  5  7
1  2  5     d:   d  3  4
2  3  6  sdasd  dp  6  3
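
The same call can also be applied to the whole DataFrame without selecting columns first, which is closer to what the question asks. The sketch below is only an illustration: the frame, its column names, and its values are made up, and it includes a missing value and a non-ASCII character to show that neither causes an error.

```python
import numpy as np
import pandas as pd

# Made-up frame with strings, numbers, missing values and a non-ASCII character.
df = pd.DataFrame({'name': ['caf\u00e9;', 'a;b', np.nan],
                   'qty': [1, 2, 3],
                   'price': [1.5, np.nan, 2.0]})

# regex=True makes replace match ';' as a substring of each string cell;
# numeric cells and missing values are left as they are.
cleaned = df.replace(';', '', regex=True)
print(cleaned)
```

Here the name column becomes 'café', 'ab' and NaN, while qty and price are unchanged. If the character to remove is a regex metacharacter such as . or $, it should be escaped first, for example with re.escape from the standard library.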