Original source: http://stackoverflow.com/questions/36072626/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas replace multiple values at once
Asked by Lukasz
I'm attempting to clean up some of the data that I have from an Excel file. The file contains 7400 rows and 18 columns, including a list of customers with their respective addresses and other data. The problem I'm encountering is that some of the cities are misspelled, which distorts the information and makes further processing difficult.
   SURNAME  | ADDRESS          | CITY
0  Jenson   | 252 Des Chênes   | D.DO
1  Jean     | 236 Gouin        | DOLLARD
2  Denis    | 993 Boul. Gouin  | DOLLARD-DES-ORMEAUX
3  Bradford | 1690 Dollard #7  | DDO
4  Alisson  | 115 Du Buisson   | IL PERROT
5  Abdul    | 9877 Boul. Gouin | Pierrefonds
6  O'Neil   | 5 Du College     | Ile Bizard
7  Bundy    | 7345 Sherbrooke  | ILLE Perot
8  Darcy    | 8671 Anthony #2  | ILE Perrot
9  Adams    | 845 Georges      | Pierrefonds
In the above example D.DO, DOLLARD, DDO should be spelled DOLLARD-DES-ORMEAUX and IL PERROT, ILLE PEROT, ILE PERROT should be spelled ILE-PERROT.
I've been able to replace the values using:
df["CITY"].replace(to_replace={"D.DO", "DOLLARD", "DDO"}, value="DOLLARD-DES-ORMEAUX", regex=True)
df["CITY"].replace(to_replace={"IL PERROT", "ILLE PEROT", "ILE PERROT"}, value="ILE-PERROT", regex=True)
Is there some way of combining the above operations into one? I've tried:
df["CITY"].replace({to_replace={"D.DO", "DOLLARD", "DDO"}, value="DOLLARD-DES-ORMEAUX", to_replace={"IL PERROT", "ILLE PEROT", "ILE PERROT"}, value="ILE-PERROT"}, regex=True)
but I've had no luck.
Answered by MaxU
Try the .replace({}, regex=True) method:
replacements = {
    'CITY': {
        r'(D.*DO|DOLLARD.*)': 'DOLLARD-DES-ORMEAUX',
        r'I[lL]*[eE]*.*': 'ILLE Perot',
    }
}
df.replace(replacements, regex=True, inplace=True)
print(df)
Output:
    SURNAME           ADDRESS                 CITY
0    Jenson    252 Des Chênes  DOLLARD-DES-ORMEAUX
1      Jean         236 Gouin  DOLLARD-DES-ORMEAUX
2     Denis   993 Boul. Gouin  DOLLARD-DES-ORMEAUX
3  Bradford   1690 Dollard #7  DOLLARD-DES-ORMEAUX
4   Alisson    115 Du Buisson           ILLE Perot
5     Abdul  9877 Boul. Gouin          Pierrefonds
6    O'Neil      5 Du College           ILLE Perot
7     Bundy   7345 Sherbrooke           ILLE Perot
8     Darcy   8671 Anthony #2           ILLE Perot
9     Adams       845 Georges          Pierrefonds
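Note that the pattern I[lL]*[eE]*.* matches any value containing a capital I, which is why 'Ile Bizard' also becomes 'ILLE Perot' in the output, and that the replacement text differs from the 'ILE-PERROT' spelling the question asks for. A minimal sketch of a stricter variant follows. It assumes that only the exact misspellings listed in the question should be rewritten; the anchored, case-insensitive patterns and the sample frame are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample CITY values copied from the question's table.
df = pd.DataFrame({
    "CITY": [
        "D.DO", "DOLLARD", "DOLLARD-DES-ORMEAUX", "DDO", "IL PERROT",
        "Pierrefonds", "Ile Bizard", "ILLE Perot", "ILE Perrot", "Pierrefonds",
    ]
})

# Assumed patterns: ^...$ forces a whole-cell match, (?i) ignores case,
# and the dot in D.DO is escaped so it matches a literal period only.
replacements = {
    "CITY": {
        r"(?i)^(D\.DO|DDO|DOLLARD)$": "DOLLARD-DES-ORMEAUX",
        r"(?i)^(IL PERROT|ILLE PEROT|ILE PERROT)$": "ILE-PERROT",
    }
}

# One call applies both groups of replacements to the CITY column.
cleaned = df.replace(replacements, regex=True)
print(cleaned)
```

Anchoring matters because with regex=True an unanchored pattern such as DOLLARD replaces only the matched substring, so a value that is already spelled correctly could be altered.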
Answered by Alexander
You can create a dictionary of replacements and then iterate through them, using 'loc' for replacement.
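target_for_values = {
    'DOLLARD-DES-ORMEAUX': ['D.DO', 'DOLLARD', 'DDO'],
    'ILE-PERROT': ['IL PERROT', 'ILLE PEROT', 'ILE PERROT']}
# dict.iteritems() exists only in Python 2; use items() on Python 3.
for k, v in target_for_values.items():
    # Compare upper-cased city names so 'ILLE Perot' matches 'ILLE PEROT'.
    df.loc[df.CITY.str.upper().isin(v), 'CITY'] = k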
>>> df.CITY
CITY
0 DOLLARD-DES-ORMEAUX
1 DOLLARD-DES-ORMEAUX
2 DOLLARD-DES-ORMEAUX
3 DOLLARD-DES-ORMEAUX
4 ILE-PERROT
5 Pierrefonds
6 Ile Bizard
7 ILE-PERROT
8 ILE-PERROT
9 Pierrefonds
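The same dictionary can also be applied without an explicit loop. The sketch below is one possible variant, not the answerer's method: it inverts target_for_values into a flat lookup (the name lookup and the sample frame are assumptions made for illustration), maps the upper-cased city names through it, and falls back to the original value wherever no misspelling matched.

```python
import pandas as pd

# Sample CITY values copied from the question's table.
df = pd.DataFrame({
    "CITY": [
        "D.DO", "DOLLARD", "DOLLARD-DES-ORMEAUX", "DDO", "IL PERROT",
        "Pierrefonds", "Ile Bizard", "ILLE Perot", "ILE Perrot", "Pierrefonds",
    ]
})

target_for_values = {
    "DOLLARD-DES-ORMEAUX": ["D.DO", "DOLLARD", "DDO"],
    "ILE-PERROT": ["IL PERROT", "ILLE PEROT", "ILE PERROT"],
}

# Invert to a flat {misspelling: correct spelling} lookup (hypothetical helper name).
lookup = {bad: good for good, bads in target_for_values.items() for bad in bads}

# map() yields NaN for values missing from the lookup; fillna() restores the originals.
df["CITY"] = df["CITY"].str.upper().map(lookup).fillna(df["CITY"])
print(df["CITY"])
```

Because the match is on whole upper-cased values rather than regular expressions, 'Ile Bizard' and 'Pierrefonds' are left untouched.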