Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/52224142/

Date: 2020-09-14 06:00:48  Source: igfitidea

Pandas: Conditionally replace values based on other columns values

python python-3.x pandas dataframe

Asked by Martin Müsli

I have a dataframe (df) that looks like this:

                    environment     event   
time                    
2017-04-28 13:08:22     NaN         add_rd  
2017-04-28 08:58:40     NaN         add_rd  
2017-05-03 07:59:35     test        add_env
2017-05-03 08:05:14     prod        add_env
...

Now my goal is this: for each add_rd in the event column, the associated NaN value in the environment column should be replaced with the string RD.

                    environment     event   
time                    
2017-04-28 13:08:22     RD          add_rd  
2017-04-28 08:58:40     RD          add_rd  
2017-05-03 07:59:35     test        add_env
2017-05-03 08:05:14     prod        add_env
...


What I did so far

I stumbled across df['environment'] = df['environment'].fillna('RD'), which replaces every NaN (which is not what I am looking for); pd.isnull(df['environment']), which detects missing values; and np.where(df['environment'], x, y), which seems to be what I want but isn't working. Furthermore, I tried this:

import pandas as pd

for env in df['environment']:
    if pd.isnull(env) and df['event'] == 'add_rd':
        env = 'RD'

This lacks an index, or some kind of iterator, to access the corresponding value in the event column.
And I tried this:

df['environment'] = np.where(pd.isnull(df['environment']), df['environment'] = 'RD', df['environment'])

SyntaxError: keyword can't be an expression

which obviously didn't work.

I took a look at several questions but couldn't build on the suggestions in the answers: Black's question, Simon's question, szli's question, Jan Willems Tulp's question.

So, how do I replace a value in a column based on another column's values?

Accepted answer by jpp

Now my goal is for each add_rd in the event column, the associated NaN-value in the environment column should be replaced with a string RD.

As per @Zero's comment, use pd.DataFrame.loc and Boolean indexing:

df.loc[df['event'].eq('add_rd') & df['environment'].isnull(), 'environment'] = 'RD'
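A minimal runnable sketch of the same idea, assuming a small sample frame modeled on the question. The third row (a missing environment on an add_env event) is an invented edge case, added only to show that it is left alone:

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question; the third row is an invented edge case.
df = pd.DataFrame(
    {
        "environment": [np.nan, np.nan, np.nan, "prod"],
        "event": ["add_rd", "add_rd", "add_env", "add_env"],
    }
)

# True only where the event is add_rd AND the environment is missing.
mask = df["event"].eq("add_rd") & df["environment"].isnull()

# Assign to the environment column on the masked rows only.
df.loc[mask, "environment"] = "RD"

print(df)
```

Naming the mask separately makes it easy to inspect which rows will change before assigning.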

Answer by CT Zhu

You could consider using where:

# Keep values where the condition holds; replace the rest with 'RD'.
df['environment'] = df['environment'].where(
    df['environment'].notnull() | (df['event'] != 'add_rd'), 'RD')

If the condition is not met, the value is replaced by the second argument.
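The keep-or-replace behavior of where can be seen on a small Series. This is only an illustrative sketch; the sample values and the names s, keep, and result are made up for the example:

```python
import numpy as np
import pandas as pd

# Made-up sample values for illustration.
s = pd.Series([np.nan, "test", "prod"])

# where() keeps entries where the condition is True
# and substitutes the second argument everywhere else.
keep = s.notnull()
result = s.where(keep, "RD")

# mask() is the inverse: it replaces entries where the condition is True.
same_result = s.mask(s.isnull(), "RD")

print(result.tolist())
```

Because where replaces on a False condition, the condition has to describe the rows to keep, not the rows to change.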

Answer by Herc01

Here it is:

df['environment'] = df['environment'].fillna('RD')

Answer by Naga kiran

If you want to replace just 'add_rd' with 'RD', this can be useful to you:

keys_to_replace = {'add_rd':'RD','add_env':'simple'}
df['environment'] = df.groupby(['event'])['environment'].fillna(keys_to_replace['add_rd'])
df

output:

    environment event
0   RD          add_rd
1   RD          add_rd
2   test        add_env
3   prod        add_env

If you have many values to replace based on event, you may need to group by the 'event' column and look up each group's replacement:

keys_to_replace = {'add_rd':'RD','add_env':'simple'}
temp = df.groupby(['event']).apply(lambda x:  x['environment'].fillna(keys_to_replace[x['event'].values[0]]))
temp.index = temp.index.droplevel(0)
df['environment'] = temp.sort_index().values

output:

   environment  event
0   RD          add_rd
1   RD          add_rd
2   test        add_env
3   prod        add_env
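As an alternative sketch (not part of the original answer), the per-event lookup can also be written without groupby by mapping each event to its replacement and passing that Series to fillna, which aligns on the index. The sample frame below is invented and assumes a unique index; the name fallback is hypothetical:

```python
import numpy as np
import pandas as pd

# Invented sample data; the last row has a missing value on an add_env event.
df = pd.DataFrame(
    {
        "environment": [np.nan, np.nan, "test", np.nan],
        "event": ["add_rd", "add_rd", "add_env", "add_env"],
    }
)

keys_to_replace = {"add_rd": "RD", "add_env": "simple"}

# Per-row replacement value looked up from the event column.
fallback = df["event"].map(keys_to_replace)

# fillna with a Series fills each missing entry from the matching index label.
df["environment"] = df["environment"].fillna(fallback)

print(df)
```

Existing non-missing values such as 'test' are untouched, since fillna only writes where the original entry is missing.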