pandas: Using regex matched groups in the DataFrame replace function

Question

Asked by Peter D

I'm just learning python/pandas and like how powerful and concise it is.

During data cleaning I want to use replace on a column in a dataframe with regex but I want to reinsert parts of the match (groups).

Simple Example: lastname, firstname -> firstname lastname

I tried something like the following (actual case is more complex so excuse the simple regex):

df['Col1'].replace({'([A-Za-z])+, ([A-Za-z]+)' : '\2 \1'}, inplace=True, regex=True)

However, this results in empty values. The match part works as expected, but the value part doesn't. I guess this could be achieved by some splitting and merging, but I am looking for a general answer as to whether the regex group can be used in replace.

Answer 1

Answered by MaxU

I think there are a few issues with your regexes.
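One likely issue, offered here as an illustration rather than as what the answer necessarily had in mind: in the question's pattern the `+` sits outside the first group, so `([A-Za-z])+` keeps only the last letter it matched. A minimal sketch with the standard `re` module (the sample string is made up):

```python
import re

sample = 'Smith, Sean'  # made-up sample value

# '+' outside the parentheses: the group is repeated, and only the
# last repetition (the final letter) is kept as group 1.
outside = re.sub(r'([A-Za-z])+, ([A-Za-z]+)', r'\2 \1', sample)

# '+' inside the parentheses: the group captures the whole word.
inside = re.sub(r'([A-Za-z]+), ([A-Za-z]+)', r'\2 \1', sample)

print(outside)  # Sean h
print(inside)   # Sean Smith
```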

As @Abdou just said, use either '\\2 \\1' or, better, r'\2 \1', because '\1' is the character with ASCII code 1.
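The difference between these spellings can be checked in plain Python. This is a small sketch using only the standard `re` module; the variable names and the sample string are chosen for illustration:

```python
import re

plain = '\1'     # one character: the control character with code 1
escaped = '\\1'  # two characters: a backslash followed by '1'
raw = r'\1'      # the same two characters, written as a raw string

print(len(plain), ord(plain))  # 1 1
print(escaped == raw)          # True

pattern = r'(\w+),\s+(\w+)'

# With a raw string, the regex engine sees real backreferences.
swapped = re.sub(pattern, r'\2 \1', 'John, Doe')

# Without it, the replacement is just two control characters and a space.
broken = re.sub(pattern, '\2 \1', 'John, Doe')

print(swapped)       # Doe John
print(repr(broken))  # '\x02 \x01'
```

The control characters are usually invisible when printed, which would explain why the question's result looked empty.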

Your solution should work if you use correct regexes:

In [193]: df
Out[193]:
              name
0        John, Doe
1  Max, Mustermann

In [194]: df.name.replace({r'(\w+),\s+(\w+)' : r'\2 \1'}, regex=True)
Out[194]:
0          Doe John
1    Mustermann Max
Name: name, dtype: object

In [195]: df.name.replace({r'(\w+),\s+(\w+)' : r'\2 \1', 'Max':'Fritz'}, regex=True)
Out[195]:
0            Doe John
1    Mustermann Fritz
Name: name, dtype: object
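For anyone who wants to run this outside an IPython session, here is a self-contained sketch of the same idea. It rebuilds the two-row frame shown above and assigns the result back instead of using `inplace=True`; that choice is a stylistic assumption, not something the answer prescribes.

```python
import pandas as pd

# Rebuild the small example frame from the session above.
df = pd.DataFrame({'name': ['John, Doe', 'Max, Mustermann']})

# Two captured words separated by a comma and whitespace.
pattern = r'(\w+),\s+(\w+)'

# regex=True makes replace treat the pattern as a regular expression,
# so \2 and \1 in the raw replacement string refer to the groups.
df['name'] = df['name'].replace(pattern, r'\2 \1', regex=True)

print(df['name'].tolist())  # ['Doe John', 'Mustermann Max']
```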

Answer 2

Answered by piRSquared

setup

df = pd.DataFrame(dict(name=['Smith, Sean']))
print(df)

          name
0  Smith, Sean

using replace

df.name.str.replace(r'(\w+),\s*(\w+)', r'\2 \1', regex=True)

0    Sean Smith
Name: name, dtype: object
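When the real replacement is more involved than reordering groups, `str.replace` also accepts a callable that receives each match object. The following is only a sketch: the `swap` helper, the lowercase group names, and the upper-casing of the last name are made up for illustration.

```python
import pandas as pd

s = pd.Series(['Smith, Sean', 'no comma here'], name='name')

def swap(match):
    # Hypothetical helper: build the replacement from named groups.
    return f"{match.group('first')} {match.group('last').upper()}"

# A callable replacement requires regex=True.
result = s.str.replace(r'(?P<last>\w+),\s*(?P<first>\w+)', swap, regex=True)

print(result.tolist())  # ['Sean SMITH', 'no comma here']
```

Values that do not match the pattern are left unchanged.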

using extract
split to two columns

df.name.str.extract(r'(?P<Last>\w+),\s*(?P<First>\w+)', expand=True)

    Last First
0  Smith  Sean
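The extracted columns can then be joined in whatever order is needed. A minimal sketch, assuming the same one-row frame as in the setup; the intermediate name `parts` is chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Smith, Sean']})

# Named groups become the column names of the extracted frame.
parts = df['name'].str.extract(r'(?P<Last>\w+),\s*(?P<First>\w+)', expand=True)

# Recombine as "First Last"; rows that did not match would become NaN.
df['name'] = parts['First'] + ' ' + parts['Last']

print(df['name'].tolist())  # ['Sean Smith']
```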
