Notice: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/49413005/

Date: 2020-09-14 05:21:14  Source: igfitidea

Replace multiple substrings in a Pandas series with a value

Tags: python, string, pandas, python-2.7, series

Asked by SBad

All,

To replace one string in one particular column I have done this and it worked fine:

dataUS['sec_type'].str.strip().str.replace("LOCAL","CORP")

I would now like to replace multiple strings with a single string, for example replace ["LOCAL", "FOREIGN", "HELLO"] with "CORP".

How can I make this work? The code below didn't work:

dataUS['sec_type'].str.strip().str.replace(["LOCAL", "FOREIGN", "HELLO"], "CORP")

Answered by jpp

You can perform this task by forming a |-separated string. This works because pd.Series.str.replace accepts regex:

Replace occurrences of pattern/regex in the Series/Index with some other string. Equivalent to str.replace() or re.sub().

This avoids the need to create a dictionary.

import pandas as pd

df = pd.DataFrame({'A': ['LOCAL TEST', 'TEST FOREIGN', 'ANOTHER HELLO', 'NOTHING']})

pattern = '|'.join(['LOCAL', 'FOREIGN', 'HELLO'])

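# regex=True is needed in pandas >= 2.0, where str.replace matches literally by default
df['A'] = df['A'].str.replace(pattern, 'CORP', regex=True)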

#               A
# 0     CORP TEST
# 1     TEST CORP
# 2  ANOTHER CORP
# 3       NOTHING
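
One caveat with the |-joined pattern is that each substring is read as a regex fragment, so a substring containing metacharacters such as . or + may match more than intended. Below is a minimal sketch of one way to guard against that, assuming hypothetical substrings "U.S." and "N/A" and a made-up series (none of these are from the original question). It uses re.escape from the standard library and passes regex=True explicitly.

```python
import re

import pandas as pd

# Hypothetical substrings for illustration; "U.S." contains regex metacharacters.
substrings = ["U.S.", "N/A"]

# Escape each substring so it is matched literally, then join with "|".
pattern = "|".join(re.escape(s) for s in substrings)

s = pd.Series(["U.S. BOND", "UXSX BOND", "N/A"])
result = s.str.replace(pattern, "CORP", regex=True)
print(result.tolist())
```

Without the escaping step, the dots in "U.S." would match any character, so "UXSX BOND" would also be rewritten.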

Answered by YOBEN_S

replace can accept a dict, so we just create a dict for the values that need to be replaced:

dataUS['sec_type'].str.strip().replace(dict(zip(["LOCAL", "FOREIGN", "HELLO"], ["CORP"]*3)),regex=True)

Contents of the dict:

dict(zip(["LOCAL", "FOREIGN", "HELLO"], ["CORP"]*3))
Out[585]: {'FOREIGN': 'CORP', 'HELLO': 'CORP', 'LOCAL': 'CORP'}

The reason you receive the error is that str.replace is different from replace: Series.str.replace expects a single pattern string, whereas Series.replace also accepts a list or dict of values to replace.
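
To make the difference between the two methods concrete, here is a small sketch on a made-up two-element series (the data is an assumption for illustration, not taken from the question). It compares Series.replace without regex against Series.str.replace on the same input.

```python
import pandas as pd

# Hypothetical data for illustration only.
s = pd.Series(["LOCAL", "LOCAL TEST"])

# Series.replace without regex only swaps values that match exactly.
whole_value = s.replace("LOCAL", "CORP")

# Series.str.replace works on substrings inside each value.
substring = s.str.replace("LOCAL", "CORP", regex=False)

print(whole_value.tolist())
print(substring.tolist())
```

Only the first element is an exact match, so Series.replace leaves "LOCAL TEST" untouched while Series.str.replace rewrites both elements.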

Answered by Rakesh

Try:

dataUS.replace({"sec_type": { 'LOCAL' : "CORP", 'FOREIGN' : "CORP"}})
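
As a rough illustration of how this nested-dict form behaves, the sketch below runs it on a made-up two-row frame (the data and the variable names mapping, exact and partial are assumptions for illustration). By default DataFrame.replace matches whole cell values in the named column; passing regex=True makes the keys act as patterns.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"sec_type": ["LOCAL", "Sample text LOCAL"]})
mapping = {"sec_type": {"LOCAL": "CORP", "FOREIGN": "CORP"}}

# Default behaviour: only cells equal to a key are replaced.
exact = df.replace(mapping)

# With regex=True the keys are treated as patterns, so substrings match too.
partial = df.replace(mapping, regex=True)

print(exact["sec_type"].tolist())
print(partial["sec_type"].tolist())
```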

Answered by Anthony R

Function to replace multiple values in pandas Series:

def replace_values(series, to_replace, value):
    for i in to_replace:
        series = series.str.replace(i, value)
    return series
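
A possible way to call this helper is sketched below on a made-up series. Note one change that is an assumption on my part, not part of the answer: the variant here passes regex=False so that each substring is matched literally regardless of the pandas version.

```python
import pandas as pd


def replace_values(series, to_replace, value):
    # Apply one literal replacement per substring, in order.
    for old in to_replace:
        series = series.str.replace(old, value, regex=False)
    return series


# Hypothetical data for illustration only.
s = pd.Series(["LOCAL TEST", "TEST FOREIGN", "NOTHING"])
result = replace_values(s, ["LOCAL", "FOREIGN", "HELLO"], "CORP")
print(result.tolist())
```

Because the replacements run one after another, each pass makes a full sweep over the series, which is simple but can be slower than a single combined pattern on large data.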

Hope this helps someone

Answered by Laurens Koppenol

The answer of @Rakesh is very neat but does not allow for substrings. With a small change however, it does.

  1. Use a replacement dictionary, because it makes the solution much more generic.
  2. Add the keyword argument regex=True to Series.replace() (not Series.str.replace). This actually does two things. First, it changes your replacement to a regex replacement, which is much more powerful, but you will have to escape special characters, so beware of that. Second, it makes the replacement work on substrings instead of the entire string, which is really cool!

replacement = {
    "LOCAL": "CORP",
    "FOREIGN": "CORP",
    "HELLO": "CORP"
}

dataUS['sec_type'].replace(replacement, regex=True)

Full code example

import pandas as pd

dataUS = pd.DataFrame({'sec_type': ['LOCAL', 'Sample text LOCAL', 'Sample text LOCAL sample FOREIGN']})

replacement = {
    "LOCAL": "CORP",
    "FOREIGN": "CORP",
    "HELLO": "CORP"
}

dataUS['sec_type'].replace(replacement, regex=True)

Output

0                            CORP
1                Sample text CORP
2    Sample text CORP sample CORP
Name: sec_type, dtype: object
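
Regarding the warning about special characters: one way to handle it might be to escape the dictionary keys before passing them to Series.replace with regex=True. The sketch below assumes a hypothetical mapping whose keys contain regex metacharacters (the keys, the series and the names raw and escaped are illustrative, not from the answer) and uses re.escape from the standard library.

```python
import re

import pandas as pd

# Hypothetical mapping; "A+B" and "(X)" contain regex metacharacters.
raw = {"A+B": "CORP", "(X)": "CORP"}

# Escape every key so it is matched as a literal substring.
escaped = {re.escape(key): value for key, value in raw.items()}

s = pd.Series(["A+B fund", "AAB fund", "note (X)"])
result = s.replace(escaped, regex=True)
print(result.tolist())
```

With unescaped keys, "A+B" would be read as "one or more A followed by B" and would also match "AAB".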