pandas: Use Apply on a SeriesGroupBy Object where conditions are met

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original question: http://stackoverflow.com/questions/38142129/

Time: 2020-09-14 01:30:09  Source: igfitidea

Use Apply on a SeriesGroupBy Object where conditions are met

python pandas

Asked by Miquel

I have a DataFrame df1:

 df1.head() = 

           id      ret     eff
    1469  2300 -0.010879  4480.0
    328   2300 -0.000692 -4074.0
    1376  2300 -0.009551  4350.0
    2110  2300 -0.014013  5335.0
    849   2300 -0.286490 -9460.0

I would like to create a new column that contains the normalized values of the column df1['eff'].
In other words, I would like to group df1['eff'] by df1['id'], find the max value (mx = df1['eff'].max()) and the min value (mn = df1['eff'].min()) of each group, and divide each value of df1['eff'] element-wise by mx if the value is positive (df1['eff'] > 0) or by mn if it is negative (df1['eff'] < 0).

The code that I have written is the following:

df1['normd'] = df1.groupby('id')['eff'].apply(lambda x: x/x.max() if x > 0 else x/x.min())

However, Python raises the following error:

*** ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
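
The error comes from the `if x > 0` part of the lambda: inside apply, x is the whole group (a Series), so `x > 0` is a boolean Series rather than a single True/False. A minimal sketch reproducing this outside of groupby, with made-up sample values and illustrative variable names (`mask`, `message`):

```python
import pandas as pd

# Illustrative values only; any Series with more than one element behaves the same.
x = pd.Series([4480.0, -4074.0, 4350.0])

mask = x > 0            # element-wise comparison, yields a boolean Series
message = None
try:
    if mask:            # Python calls bool(mask), which pandas refuses to guess
        pass
except ValueError as exc:
    message = str(exc)

print(mask.tolist())
print(message)
```

The condition therefore has to be applied element by element rather than with a plain Python if/else.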

Since df1.groupby('id')['eff'] is a SeriesGroupBy object, I decided to use map(). But again Python raises an error:

 *** AttributeError: Cannot access callable attribute 'map' of 'SeriesGroupBy' objects, try using the 'apply' method

Many thanks in advance.

Answered by jezrael

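You can use a custom function f, which makes it easy to add print calls for debugging. Inside f, x is a Series holding the values of one group, so the condition has to be evaluated element-wise with numpy.where. The output is a NumPy array, which you need to convert back to a Series with the group's original index: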

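import numpy as np
import pandas as pd

def f(x):
    # x is the 'eff' Series of a single group; uncomment to inspect it
    #print (x)
    #print (x/x.max())
    #print (x/x.min())
    # divide positive values by the group max, the others by the group min
    return pd.Series(np.where(x>0, x/x.max(), x/x.min()), index=x.index)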


df1['normd'] = df1.groupby('id')['eff'].apply(f)
print (df1)
        id       ret     eff     normd
1469  2300 -0.010879  4480.0  0.839738
328   2300 -0.000692 -4074.0  0.430655
1376  2300 -0.009551  4350.0  0.815370
2110  2300 -0.014013  5335.0  1.000000
849   2300 -0.286490 -9460.0  1.000000
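
To see why the result is wrapped in pd.Series, here is a minimal sketch on the single group shown above (values copied from the example; the names `arr` and `normd` are only illustrative). numpy.where returns a plain array that no longer carries the row labels, and passing index=x.index restores them:

```python
import numpy as np
import pandas as pd

# One group's 'eff' values, copied from the example above.
x = pd.Series([4480.0, -4074.0, 4350.0, 5335.0, -9460.0],
              index=[1469, 328, 1376, 2110, 849])

# Positive values are divided by the group max, the rest by the group min.
arr = np.where(x > 0, x / x.max(), x / x.min())
print(type(arr))   # a NumPy array: the row labels are gone

# Re-attach the original index so the result can align with df1's rows.
normd = pd.Series(arr, index=x.index)
print(normd)
```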

This is the same as:

# the outer parentheses let the method chain span several lines
df1['normd'] = (df1.groupby('id')['eff']
                   .apply(lambda x: pd.Series(np.where(x>0,
                                                       x/x.max(),
                                                       x/x.min()), index=x.index)))
print (df1)
        id       ret     eff     normd
1469  2300 -0.010879  4480.0  0.839738
328   2300 -0.000692 -4074.0  0.430655
1376  2300 -0.009551  4350.0  0.815370
2110  2300 -0.014013  5335.0  1.000000
849   2300 -0.286490 -9460.0  1.000000
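
As a possible alternative that is not part of the original answer, the per-group max and min can be broadcast back to the rows with transform, and numpy.where can then be applied once to the whole column. This is only a sketch under the assumption that df1 looks like the five rows shown above; it avoids depending on how apply indexes its result, which may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the rows printed in the question.
df1 = pd.DataFrame(
    {
        "id": [2300, 2300, 2300, 2300, 2300],
        "ret": [-0.010879, -0.000692, -0.009551, -0.014013, -0.286490],
        "eff": [4480.0, -4074.0, 4350.0, 5335.0, -9460.0],
    },
    index=[1469, 328, 1376, 2110, 849],
)

grouped = df1.groupby("id")["eff"]
mx = grouped.transform("max")   # group max repeated on every row of the group
mn = grouped.transform("min")   # group min repeated on every row of the group

# Element-wise choice: positive values use the max, the others use the min.
df1["normd"] = np.where(df1["eff"] > 0, df1["eff"] / mx, df1["eff"] / mn)
print(df1)
```

Because transform returns a Series with the same index as df1, no manual re-indexing is needed here.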