pandas: replace multiple values at once

Question

Asked by Lukasz

I'm attempting to clean up some of the data I have from an Excel file. The file contains 7,400 rows and 18 columns, including a list of customers with their respective addresses and other data. The problem I'm encountering is that some of the cities are misspelled, which distorts the information and makes further processing difficult.

  SURNAME   | ADDRESS          | CITY
0 Jenson    | 252 Des Chênes   | D.DO
1 Jean      | 236 Gouin        | DOLLARD
2 Denis     | 993 Boul. Gouin  | DOLLARD-DES-ORMEAUX
3 Bradford  | 1690 Dollard #7  | DDO
4 Alisson   | 115 Du Buisson   | IL PERROT
5 Abdul     | 9877 Boul. Gouin | Pierrefonds
6 O'Neil    | 5 Du College     | Ile Bizard
7 Bundy     | 7345 Sherbrooke  | ILLE Perot
8 Darcy     | 8671 Anthony #2  | ILE Perrot
9 Adams     | 845 Georges      | Pierrefonds

In the above example D.DO, DOLLARD, DDO should be spelled DOLLARD-DES-ORMEAUX and IL PERROT, ILLE PEROT, ILE PERROT should be spelled ILE-PERROT.

I've been able to replace the values using:

df["CITY"] = df["CITY"].replace(to_replace=["D.DO", "DOLLARD", "DDO"], value="DOLLARD-DES-ORMEAUX", regex=True)
df["CITY"] = df["CITY"].replace(to_replace=["IL PERROT", "ILLE PEROT", "ILE PERROT"], value="ILE-PERROT", regex=True)

Is there some way of combining the above operations into one? I've tried:

df["CITY"].replace({to_replace={"D.DO", "DOLLARD", "DDO"}, value="DOLLARD-DES-ORMEAUX", to_replace={"IL PERROT", "ILLE PEROT", "ILE PERROT"}, value="ILE-PERROT"}, regex=True)

but I've had no luck
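For comparison, Series.replace also accepts one dictionary that maps each old value to its new value, which folds both substitutions into a single call. This is a minimal sketch, assuming exact (non-regex) matching on the sample CITY values from the table above; the names `canonical` and `mapping` are illustrative and not from the question.

```python
import pandas as pd

# Sample CITY values taken from the question's table
df = pd.DataFrame({"CITY": ["D.DO", "DOLLARD", "DOLLARD-DES-ORMEAUX", "DDO",
                            "IL PERROT", "Pierrefonds", "Ile Bizard",
                            "ILLE Perot", "ILE Perrot", "Pierrefonds"]})

# Canonical spelling -> known misspellings (exact strings, as listed in the question)
canonical = {
    "DOLLARD-DES-ORMEAUX": ["D.DO", "DOLLARD", "DDO"],
    "ILE-PERROT": ["IL PERROT", "ILLE PEROT", "ILE PERROT"],
}

# Flatten into one {misspelling: canonical} dict so a single replace() call does everything
mapping = {bad: good for good, bads in canonical.items() for bad in bads}

# Without regex=True, replace() only swaps cell values that equal a key exactly
df["CITY"] = df["CITY"].replace(mapping)
print(df["CITY"].tolist())
```

Because matching here is exact and case-sensitive, "ILLE Perot" and "ILE Perrot" are left unchanged: they differ in case from the listed variants "ILLE PEROT" and "ILE PERROT".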

Answer 1

Answered by MaxU

Try the .replace({}, regex=True) method:

replacements = {
   'CITY': {
      r'(D.*DO|DOLLARD.*)': 'DOLLARD-DES-ORMEAUX',
      r'I[lL]*[eE]*.*': 'ILLE Perot'}
}

df.replace(replacements, regex=True, inplace=True)

print(df)

Output:

    SURNAME           ADDRESS                 CITY
0    Jenson    252 Des Chênes  DOLLARD-DES-ORMEAUX
1      Jean         236 Gouin  DOLLARD-DES-ORMEAUX
2     Denis   993 Boul. Gouin  DOLLARD-DES-ORMEAUX
3  Bradford   1690 Dollard #7  DOLLARD-DES-ORMEAUX
4   Alisson    115 Du Buisson           ILLE Perot
5     Abdul  9877 Boul. Gouin          Pierrefonds
6    O'Neil      5 Du College           ILLE Perot
7     Bundy   7345 Sherbrooke           ILLE Perot
8     Darcy   8671 Anthony #2           ILLE Perot
9     Adams       845 Georges          Pierrefonds
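Note that the second pattern above is unanchored and broad: I[lL]*[eE]*.* matches from the first capital I in any CITY value through to its end, which is why "Ile Bizard" (row 6) is rewritten as well, and the replacement text is "ILLE Perot" rather than the ILE-PERROT spelling the question asks for. The sketch below keeps the same nested-dict df.replace(..., regex=True) call but uses anchored patterns (^...$) with Python's inline (?i) case-insensitive flag. The exact patterns are an assumption tuned to the sample values only, not a general rule for the full 7,400-row file.

```python
import pandas as pd

# Sample rows from the question (ADDRESS omitted for brevity)
df = pd.DataFrame({
    "SURNAME": ["Jenson", "Jean", "Denis", "Bradford", "Alisson",
                "Abdul", "O'Neil", "Bundy", "Darcy", "Adams"],
    "CITY": ["D.DO", "DOLLARD", "DOLLARD-DES-ORMEAUX", "DDO", "IL PERROT",
             "Pierrefonds", "Ile Bizard", "ILLE Perot", "ILE Perrot", "Pierrefonds"],
})

replacements = {
    "CITY": {
        # ^...$ means the whole cell must match, so "DOLLARD-DES-ORMEAUX" is left alone
        r"(?i)^(?:D\.?DO|DOLLARD)$": "DOLLARD-DES-ORMEAUX",
        # IL / ILL / ILE / ILLE, whitespace, then PEROT or PERROT; "Ile Bizard" does not match
        r"(?i)^IL{1,2}E?\s+PERR?OT$": "ILE-PERROT",
    }
}

# The outer dict key limits the patterns to the CITY column
df = df.replace(replacements, regex=True)
print(df)
```

Anchoring trades recall for safety: a new misspelling such as "DOLARD" will not be caught until a pattern is added for it, but unrelated cities are never rewritten.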

Answer 2

Answered by Alexander

You can create a dictionary of replacements, iterate over it, and use .loc to make each replacement.

target_for_values = {
    'DOLLARD-DES-ORMEAUX': ['D.DO', 'DOLLARD', 'DDO'], 
    'ILE-PERROT': ['IL PERROT', 'ILLE PEROT', 'ILE PERROT']}

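# For each canonical name, overwrite every row whose upper-cased CITY is one of its variants
# (dict.iteritems() exists only in Python 2; Python 3 uses dict.items())
for k, v in target_for_values.items():
    df.loc[df.CITY.str.upper().isin(v), 'CITY'] = k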

>>> df.CITY
                  CITY
0                 C.DO
1  DOLLARD-DES-ORMEAUX
2  DOLLARD-DES-ORMEAUX
3  DOLLARD-DES-ORMEAUX
4           ILE-PERROT
5          Pierrefonds
6           Ile Bizard
7           ILE-PERROT
8           ILE-PERROT
9          Pierrefonds
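The loop above makes one pass over the column per canonical name. A single-pass variant, sketched here under the same exact-match, case-insensitive assumptions, inverts target_for_values into a {variant: target} lookup and applies it with Series.map, keeping the original value wherever there is no match. The sample DataFrame and the `lookup` name are illustrative.

```python
import pandas as pd

# Sample CITY values taken from the question's table
df = pd.DataFrame({"CITY": ["D.DO", "DOLLARD", "DOLLARD-DES-ORMEAUX", "DDO",
                            "IL PERROT", "Pierrefonds", "Ile Bizard",
                            "ILLE Perot", "ILE Perrot", "Pierrefonds"]})

target_for_values = {
    'DOLLARD-DES-ORMEAUX': ['D.DO', 'DOLLARD', 'DDO'],
    'ILE-PERROT': ['IL PERROT', 'ILLE PEROT', 'ILE PERROT']}

# Invert to {VARIANT: target}; the variants are already upper case to match str.upper() below
lookup = {variant: target
          for target, variants in target_for_values.items()
          for variant in variants}

# map() yields NaN for cities not in the lookup; fillna() restores their original value
df['CITY'] = df['CITY'].str.upper().map(lookup).fillna(df['CITY'])
print(df['CITY'].tolist())
```

Cities that are not in the lookup, such as "Pierrefonds" and "Ile Bizard", keep their original capitalization because fillna() takes them from the untouched column.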