pandas: Use Apply on a SeriesGroupBy Object where conditions are met
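Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow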
Original URL: http://stackoverflow.com/questions/38142129/
Question by Miquel
I have a DataFrame df1:
df1.head() =
        id       ret     eff
1469  2300 -0.010879  4480.0
328   2300 -0.000692 -4074.0
1376  2300 -0.009551  4350.0
2110  2300 -0.014013  5335.0
849   2300 -0.286490 -9460.0
I would like to create a new column that contains the normalized values of the column df1['eff'].
In other words, I would like to group df1['eff'] by df1['id'], find the maximum value (mx = df1['eff'].max()) and the minimum value (mn = df1['eff'].min()) within each group, and divide each value of df1['eff'] element-wise by mx if df1['eff'] > 0, or by mn if df1['eff'] < 0.
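For reference, the normalization described above can also be written without apply, by broadcasting each group's maximum and minimum back onto the rows with SeriesGroupBy.transform and then choosing the divisor with numpy.where. This is a minimal sketch that rebuilds df1 from the five rows shown above (assuming they are the whole frame); transform is a swapped-in alternative, not the approach used in the answer below.

```python
import numpy as np
import pandas as pd

# Rebuild df1 from the head() output above (assumed to be the whole frame).
df1 = pd.DataFrame(
    {
        "id": [2300, 2300, 2300, 2300, 2300],
        "ret": [-0.010879, -0.000692, -0.009551, -0.014013, -0.286490],
        "eff": [4480.0, -4074.0, 4350.0, 5335.0, -9460.0],
    },
    index=[1469, 328, 1376, 2110, 849],
)

# transform("max") / transform("min") return one value per row:
# the max/min of that row's own id group, aligned to df1's index.
grouped = df1.groupby("id")["eff"]
mx = grouped.transform("max")
mn = grouped.transform("min")

# Positive values are divided by the group max, all others by the group min.
df1["normd"] = np.where(df1["eff"] > 0, df1["eff"] / mx, df1["eff"] / mn)
print(df1)
```

Because both divisions are computed for every row before np.where picks one, a group whose min or max is 0 would produce division warnings; the sample data does not hit that case.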
The code that I have written is the following:
df1['normd'] = df1.groupby('id')['eff'].apply(lambda x: x/x.max() if x > 0 else x/x.min())
However, Python throws the following error:
*** ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
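The error comes from `if x > 0`: inside the lambda, x is a whole group as a Series, so x > 0 is a boolean Series, while Python's if needs a single True/False. A minimal sketch reproducing this with a standalone Series (values copied from df1 above), followed by the element-wise numpy.where choice that avoids it:

```python
import numpy as np
import pandas as pd

x = pd.Series([4480.0, -4074.0, 4350.0, 5335.0, -9460.0])

mask = x > 0  # boolean Series, one entry per element
try:
    if mask:  # pandas refuses to collapse a multi-element Series into one bool
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# np.where decides per element instead, and returns a NumPy array.
result = np.where(mask, x / x.max(), x / x.min())
print(result)
```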
Since df1.groupby('id')['eff'] is a SeriesGroupBy object, I decided to use map(). But again Python throws the following error:
*** AttributeError: Cannot access callable attribute 'map' of 'SeriesGroupBy' objects, try using the 'apply' method
Many thanks in advance.
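SeriesGroupBy indeed has no map method. When the goal is a per-group, element-wise result with the same length as each group, SeriesGroupBy.transform is a common alternative: it calls the function once per group and aligns the output to the original index. This is a sketch under the assumption that the function returns one value per element (here wrapped in a Series carrying the group's index); transform and the helper name normalize_group are illustrative substitutions, not part of the question or the answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"id": [2300, 2300, 2300, 2300, 2300],
     "eff": [4480.0, -4074.0, 4350.0, 5335.0, -9460.0]},
    index=[1469, 328, 1376, 2110, 849],
)

def normalize_group(x):
    # Hypothetical helper: x is one id group as a Series.
    # Keep its index so transform can align the result back to df1.
    return pd.Series(np.where(x > 0, x / x.max(), x / x.min()), index=x.index)

df1["normd"] = df1.groupby("id")["eff"].transform(normalize_group)
print(df1)
```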
Answer by jezrael
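You can use a custom function f, which also makes it easy to add print calls for debugging. The original lambda fails because x is the whole group (a Series), so x > 0 is a boolean Series that if cannot reduce to a single True/False. Instead, compare the values of each group element-wise with numpy.where. Its output is a NumPy array, so you need to convert it back to a Series with the group's index: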
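import numpy as np
import pandas as pd
def f(x):
    # x is the 'eff' Series of one id group
    #print (x)
    #print (x/x.max())
    #print (x/x.min())
    # np.where works element-wise; wrap its array result in a Series with the group's index
    return pd.Series(np.where(x>0, x/x.max(), x/x.min()), index=x.index)
# group_keys=False stops newer pandas versions from prepending the id to the result index
df1['normd'] = df1.groupby('id', group_keys=False)['eff'].apply(f)
print (df1)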
        id       ret     eff     normd
1469  2300 -0.010879  4480.0  0.839738
328   2300 -0.000692 -4074.0  0.430655
1376  2300 -0.009551  4350.0  0.815370
2110  2300 -0.014013  5335.0  1.000000
849   2300 -0.286490 -9460.0  1.000000
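The sample df1 contains only id 2300, so the output above does not show that mx and mn are taken per group. A sketch with two hypothetical groups (the ids 1 and 2 and their eff values are made up for illustration) runs the same f; group_keys=False is passed so that, on newer pandas versions, apply does not prepend the id to the result index:

```python
import numpy as np
import pandas as pd

def f(x):
    # Same logic as above: positive values / group max, others / group min.
    return pd.Series(np.where(x > 0, x / x.max(), x / x.min()), index=x.index)

# Hypothetical data: two groups with different ranges.
df = pd.DataFrame(
    {"id": [1, 1, 1, 2, 2, 2],
     "eff": [10.0, 5.0, -4.0, 200.0, -50.0, -100.0]},
)

df["normd"] = df.groupby("id", group_keys=False)["eff"].apply(f)
print(df)
```

Each group is scaled by its own extremes: 5.0 becomes 0.5 (divided by 10.0), and -50.0 becomes 0.5 (divided by -100.0).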
This is the same as using a lambda:
# parentheses let the chained call span several lines
df1['normd'] = (df1.groupby('id', group_keys=False)['eff']
                   .apply(lambda x: pd.Series(np.where(x>0,
                                                       x/x.max(),
                                                       x/x.min()), index=x.index)))
print (df1)
        id       ret     eff     normd
1469  2300 -0.010879  4480.0  0.839738
328   2300 -0.000692 -4074.0  0.430655
1376  2300 -0.009551  4350.0  0.815370
2110  2300 -0.014013  5335.0  1.000000
849   2300 -0.286490 -9460.0  1.000000