Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/21919877/

Time: 2020-09-13 21:43:07  Source: igfitidea

How to replace all words in a series with a few specified words in Pandas, Python?

Tags: python, string, replace, pandas

Asked by Alexis

I want to essentially find and replace using python.

However, I want to say if a cell contains something, then replace with what I want.

I know

str.replace('safsd','something else')

However, I am not sure how to get rid of EVERYTHING in that cell. Do I use *? I am not too familiar with that in Python, but I know that in the bash shell * refers to everything...

I have

df['Description'] 

that can contain 'optiplex 9010 for classes and research', which I just want to replace with 'optiplex 9010'. Or 'macbook air 11 with configurations...etc.', where I simply want 'macbook air 11'.

I am aiming for...

if  Df['Description'].str.contains('macbook air 11')
  then Df['Description'].str.replace('(not sure what I put in here)', 'macbook air 11')

Any help/ideas?

Thanks!

**Additional info that may be helpful...**

I am working with thousands of different user inputs, so the 'Descriptions' of what someone has purchased will not be the same at all in context, wording, structure, etc. I could manually go into Excel, filter by what contains 'optiplex 9010', replace everything with a simple description, and then do the same for macbooks, etc.

I figured there may be some simpler way using pandas/python .str.contains and .str.replace.

Hope that extra info helps! Let me know

Answered by Andy Hayden

str.replace takes a regular expression, for example 'macbook air 11' followed by zero or more (*) of any character (.). You could also pass a flag to make it case-insensitive. Note that pandas 2.0 and later treat the pattern as a literal string unless regex=True is passed:

Df['Description'].str.replace('macbook air 11.*', 'macbook air 11', regex=True)
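
A minimal sketch of that pattern on a small made-up frame. The sample rows are invented for illustration. `regex=True` is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default, and `case=False` is one way to get the case-insensitive behaviour mentioned above.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({
    "Description": [
        "macbook air 11 with configurations...etc.",
        "MacBook Air 11 for the lab",
        "optiplex 9010 for classes and research",
    ]
})

# 'macbook air 11' followed by zero or more of any character is
# collapsed to just 'macbook air 11'. Rows that do not match are
# returned unchanged.
cleaned = df["Description"].str.replace(
    r"macbook air 11.*", "macbook air 11", case=False, regex=True
)
print(cleaned.tolist())
```

Note that any text before the match is kept, so this only shortens the cell fully when the description starts with the product name.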

A little primer on regex can be found here.

However, you might be better off normalizing the names, especially if you already have a complete list of topics (e.g. using fuzzywuzzy, as in this question / answer):

from fuzzywuzzy.fuzz import partial_ratio
Df['Description'].apply(lambda x: max(topics, key=lambda t: partial_ratio(x, t)))
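
The snippet above assumes a `topics` list already exists and that fuzzywuzzy is installed. As a rough sketch of the same idea using only the standard library, `difflib.SequenceMatcher` stands in here for `partial_ratio`; it is a different scoring function, so borderline cases may be ranked differently. The `topics` values, the helper names `similarity` and `normalize`, and the sample descriptions are all made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical canonical names; in practice this is your own list.
topics = ["optiplex 9010", "macbook air 11"]


def similarity(text, topic):
    """Rough 0..1 similarity score between a description and a topic."""
    return SequenceMatcher(None, text.lower(), topic.lower()).ratio()


def normalize(text):
    """Return the topic that scores highest against the description."""
    return max(topics, key=lambda t: similarity(text, t))


descriptions = [
    "Optiplex 9010 for classes and research",
    "macbok air 11 w/ config",  # note the typo in 'macbook'
]
normalized = [normalize(d) for d in descriptions]
print(normalized)
```

On a Series the same helper could be applied with `Df['Description'].apply(normalize)`. Because `max` always returns some topic, descriptions unrelated to every topic are still mapped to one, so a minimum-score cutoff may be worth adding.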

Answered by dawg

You can use a regex on a Pandas series like so.

First create a dumb series of strings:

>>> import re
>>> import pandas as pd
>>> s=pd.Series(['Value {} of 3'.format(e) for e in range(1,4)])
>>> s
0     Value 1 of 3
1     Value 2 of 3
2     Value 3 of 3

Then use re.sub to replace every run of digits with 5 and lowercase the string:

>>> s.apply(lambda s: re.sub(r'\d+', '5', s).lower())
0    value 5 of 5
1    value 5 of 5
2    value 5 of 5
dtype: object
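
For comparison, the same transformation can probably be written with the Series string methods instead of `apply` and `re.sub`. This is a sketch of an alternative, not part of the original answer; `regex=True` is spelled out because newer pandas versions default to literal matching.

```python
import pandas as pd

s = pd.Series(['Value {} of 3'.format(e) for e in range(1, 4)])

# Replace each run of digits with '5', then lowercase every string,
# using the vectorized .str accessor rather than a Python lambda.
result = s.str.replace(r'\d+', '5', regex=True).str.lower()
print(result.tolist())
```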

Of course if you want to just replace all, you can use a regex or string replace:

>>> s.apply(lambda s: re.sub(r'^.*$', 'GONE!!!', s))
0    GONE!!!
1    GONE!!!
2    GONE!!!
dtype: object
>>> s.apply(lambda s: s.replace(s, 'GONE!!!'))
0    GONE!!!
1    GONE!!!
2    GONE!!!
dtype: object
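
Applied to the question's "if the cell contains X, replace the whole cell with X" goal, one possible approach is a boolean mask from `str.contains` combined with `.loc` assignment. This is a sketch under assumptions: the sample rows and the `names` list are invented, and matching is a plain case-insensitive substring test.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "Description": [
        "optiplex 9010 for classes and research",
        "New MacBook Air 11 with configurations...etc.",
        "office chair",
    ]
})

# Hypothetical list of the short names to keep.
names = ["optiplex 9010", "macbook air 11"]

for name in names:
    # Rows whose description mentions the name anywhere, ignoring case.
    # regex=False treats the name as literal text, not a pattern.
    mask = df["Description"].str.contains(name, case=False, regex=False)
    # Overwrite the whole cell for those rows only.
    df.loc[mask, "Description"] = name

print(df["Description"].tolist())
```

Rows that mention none of the names are left as they were, which makes it easy to inspect what still needs a rule.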

Answered by nagyben

This is a perfect example of a problem that can be solved using regexes. I also find that a situation like this is a great excuse to learn about them! Here is an incredibly detailed tutorial on how to use regexes: http://www.regular-expressions.info/tutorial.html
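
As one small illustration of what a regex buys you here, alternation lets a single pattern pick out any of several product names. The sketch below uses `Series.str.extract`; the sample data and the two names in the pattern are assumptions made up for the example.

```python
import re
import pandas as pd

# Hypothetical sample data for illustration.
s = pd.Series([
    "optiplex 9010 for classes and research",
    "2x MacBook Air 11 with configurations",
    "office chair",
])

# Alternation (|) matches either name; the parentheses form the
# capture group that str.extract returns.
pattern = r"(optiplex 9010|macbook air 11)"
found = s.str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Lowercase the captured text; rows with no match stay missing (NaN).
short = found.str.lower()
print(short.tolist())
```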