Pandas set value in groupby
Original question: http://stackoverflow.com/questions/35046725/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Bruce Pucci
I have a DataFrame...
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'letters': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
...     'is_min': np.zeros(9),
...     'numbers': np.random.randn(9)
... })
>>> df
   is_min letters   numbers
0       0       a  0.322499
1       0       a -0.196617
2       0       a -1.194251
3       0       b  1.005323
4       0       b -0.186364
5       0       b -1.886273
6       0       c  0.014960
7       0       c -0.832713
8       0       c  0.689531
I would like to set the 'is_min' column to 1 on the row where 'numbers' is the minimum within each 'letters' group. I have tried this and feel that I am close...
>>> df.groupby('letters')['numbers'].transform('idxmin')
0    2
1    2
2    2
3    5
4    5
5    5
6    7
7    7
8    7
dtype: int64
I am having a hard time connecting the dots to set the value of 'is_min' to 1.
Answered by EdChum
Pass the row labels to loc and set the column:
In [34]:
df.loc[df.groupby('letters')['numbers'].transform('idxmin'), 'is_min']=1
df
Out[34]:
   is_min letters   numbers
0       1       a -0.374751
1       0       a  1.663334
2       0       a -0.123599
3       1       b -2.156204
4       0       b  0.201493
5       0       b  1.639512
6       0       c -0.447271
7       0       c  0.017204
8       1       c -1.261621
What happens here is that loc selects only the rows whose labels are returned by your transform call, and those rows get is_min set to 1, as desired.
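To make the label selection easier to follow, here is a minimal sketch of the same idea on fixed data. The numbers are made up for illustration (the question uses np.random.randn), and min_labels and flagged are names introduced here, not part of the original answer.

```python
import pandas as pd

# Fixed, made-up numbers so the result is reproducible
# (the question itself uses np.random.randn).
df = pd.DataFrame({
    'letters': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'is_min': 0,
    'numbers': [0.3, -0.2, -1.2, 1.0, -0.2, -1.9, 0.0, -0.8, 0.7],
})

# For every row: the index label of the smallest 'numbers' value in its group.
min_labels = df.groupby('letters')['numbers'].transform('idxmin')

# loc takes these labels as a row selection; repeated labels
# simply hit the same row more than once.
df.loc[min_labels, 'is_min'] = 1

flagged = df.index[df['is_min'] == 1].tolist()
print(flagged)  # [2, 5, 7]
```

Rows 2, 5 and 7 hold the smallest value of groups a, b and c in this made-up data, so only those rows end up with is_min equal to 1.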
Not sure if it matters much, but you could call unique so that you get just the row labels without repetition, which may be faster:
df.loc[df.groupby('letters')['numbers'].transform('idxmin').unique(), 'is_min']=1
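As a side note that is not part of the original answer: calling idxmin directly on the grouped column should already give one label per group, so transform plus unique may not be needed at all. A small sketch on made-up data (via_idxmin, via_unique and same_labels are illustrative names):

```python
import pandas as pd

# Made-up numbers for illustration only.
df = pd.DataFrame({
    'letters': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'is_min': 0,
    'numbers': [0.3, -0.2, -1.2, 1.0, -0.2, -1.9, 0.0, -0.8, 0.7],
})

# One row label per group, indexed by the group key.
via_idxmin = df.groupby('letters')['numbers'].idxmin()

# The variant from the answer: broadcast to every row, then deduplicate.
via_unique = df.groupby('letters')['numbers'].transform('idxmin').unique()

# Both routes should name the same rows.
same_labels = sorted(via_idxmin.tolist()) == sorted(via_unique.tolist())

df.loc[via_idxmin, 'is_min'] = 1
print(same_labels, df['is_min'].tolist())
```

Whether this is measurably faster than the transform version depends on the data, so treat it as an option to time rather than a guaranteed win.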
Answered by miraculixx
I would like to set the 'is_min' col to 1 if 'numbers' is the minimum value by column 'letters'.
A perhaps more intuitive method is to calculate the minima per group of letters, then use group-wise .apply to assign is_min:
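def set_is_min(m):
    # flag every row of df whose value equals this group minimum
    df.loc[df.numbers == m, 'is_min'] = 1

# compute the minimum per group, then call set_is_min once per minimum
mins = df.groupby('letters').numbers.min().apply(set_is_min)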
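One thing to be aware of: df.numbers == m compares against the whole frame, so a value that is the minimum of one group would also be flagged if it happens to appear in another group. With random floats that is unlikely, but a group-aware variant is easy to sketch. This is an alternative I am adding for illustration, not the method above; the data is made up and group_min is a name introduced here.

```python
import pandas as pd

# Made-up data: 0.5 is the minimum of group 'a',
# but it also appears as a non-minimum in group 'b'.
df = pd.DataFrame({
    'letters': ['a', 'a', 'b', 'b'],
    'numbers': [0.5, 2.0, 0.5, -1.0],
})

# Broadcast each group's minimum back onto its own rows,
# then compare row by row within the group.
group_min = df.groupby('letters')['numbers'].transform('min')
df['is_min'] = (df['numbers'] == group_min).astype(int)

print(df['is_min'].tolist())  # [1, 0, 0, 1]
```

Row 2 stays 0 here because 0.5 is not the minimum of group 'b'. Note that this variant flags every tied minimum within a group, whereas idxmin picks only the first one.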
On a large DataFrame, this method was roughly 20% faster than using transform in my test:
# timeit with 100'000 rows
# .apply on group minima
100 loops, best of 3: 16.7 ms per loop
# .transform
10 loops, best of 3: 21.9 ms per loop
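The benchmark code itself is not shown, so the following is only a guess at how such a comparison could be set up with timeit. The frame size matches the 100'000 rows mentioned above; the three-letter groups, the seed, and the names base, via_apply and via_transform are assumptions made for this sketch. Timings will vary with pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: 100,000 rows spread over three letter groups.
rng = np.random.default_rng(0)
n = 100_000
base = pd.DataFrame({
    'letters': rng.choice(list('abc'), size=n),
    'is_min': 0,
    'numbers': rng.standard_normal(n),
})

def via_apply():
    # .apply on the per-group minima
    df = base.copy()

    def set_is_min(m):
        df.loc[df.numbers == m, 'is_min'] = 1

    df.groupby('letters').numbers.min().apply(set_is_min)
    return df

def via_transform():
    # .transform('idxmin') passed to loc
    df = base.copy()
    df.loc[df.groupby('letters')['numbers'].transform('idxmin'), 'is_min'] = 1
    return df

# Sanity check first: both approaches should flag the same rows.
same_result = via_apply()['is_min'].equals(via_transform()['is_min'])

runs = 5
t_apply = timeit.timeit(via_apply, number=runs) / runs
t_transform = timeit.timeit(via_transform, number=runs) / runs
print(f"same result: {same_result}")
print(f"apply: {t_apply * 1e3:.1f} ms, transform: {t_transform * 1e3:.1f} ms")
```

Each timed call includes a copy of the frame so that both functions start from the same untouched data; that overhead is identical on both sides.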
I ran some more benchmarks of various methods using apply and transform.