pandas: setting values within a groupby

Question

Asked by Bruce Pucci

I have a DataFrame:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'letters': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
...     'is_min': np.zeros(9),
...     'numbers': np.random.randn(9)
... })

    is_min  letters numbers
0   0       a       0.322499
1   0       a      -0.196617
2   0       a      -1.194251
3   0       b       1.005323
4   0       b      -0.186364
5   0       b      -1.886273
6   0       c       0.014960
7   0       c      -0.832713
8   0       c       0.689531

I would like to set the 'is_min' column to 1 where 'numbers' is the minimum value within each 'letters' group. I have tried this and feel that I am close:

>>> df.groupby('letters')['numbers'].transform('idxmin')

0    2
1    2
2    2
3    5
4    5
5    5
6    7
7    7
8    7
dtype: int64

I am having a hard time connecting the dots to set the value of 'is_min' to 1.

Answer 1

Answered by EdChum

Pass the row labels to loc and set the column:

In [34]:
df.loc[df.groupby('letters')['numbers'].transform('idxmin'), 'is_min'] = 1
df

Out[34]:
   is_min letters   numbers
0       1       a -0.374751
1       0       a  1.663334
2       0       a -0.123599
3       1       b -2.156204
4       0       b  0.201493
5       0       b  1.639512
6       0       c -0.447271
7       0       c  0.017204
8       1       c -1.261621

What's happening here is that loc selects only the rows whose labels are returned by your transform call, and those rows get set to 1 as desired.

Not sure if it matters much, but you could call unique so that you get just the row labels without repetition, which may be faster:

df.loc[df.groupby('letters')['numbers'].transform('idxmin').unique(), 'is_min'] = 1
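
For a reproducible check, here is a minimal sketch of the same idea with fixed numbers in place of np.random.randn (the values are made up for illustration and are not from the original post). It assumes the default integer index, so every row label is unique. Calling idxmin directly on the groupby returns one label per group, which is one possible alternative to transform('idxmin').unique().

```python
import numpy as np
import pandas as pd

# Illustrative data: fixed values (an assumption, not the poster's data)
# so that the outcome is predictable.
df = pd.DataFrame({
    'letters': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'is_min': np.zeros(9, dtype=int),
    'numbers': [0.3, -0.2, -1.2, 1.0, -0.1, -1.9, 0.1, -0.8, 0.7],
})

# One row label per group: the index of the smallest 'numbers' value.
min_labels = df.groupby('letters')['numbers'].idxmin()

# Set the flag on exactly those rows.
df.loc[min_labels, 'is_min'] = 1

print(df[df['is_min'] == 1])
```

With this data the flagged rows are those labelled 2, 5 and 7, one per letter.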

Answer 2

Answered by miraculixx

"I would like to set the 'is_min' col to 1 if 'numbers' is the minimum value by column 'letters'."

A perhaps more intuitive method is to calculate the minimum per group of letters, then use .apply on those group minima to assign is_min:

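def set_is_min(m):
    # Flag every row whose 'numbers' value equals the group minimum m
    df.loc[df.numbers == m, 'is_min'] = 1

# Series.apply calls set_is_min once per group minimum; the return value is not needed
df.groupby('letters').numbers.min().apply(set_is_min)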

In large DataFrames, this method is actually about 20% faster than using transform (16.7 ms versus 21.9 ms per loop in the timing below):

# timeit with 100'000 rows
# .apply on group minima
100 loops, best of 3: 16.7 ms per loop
# .transform
10 loops, best of 3: 21.9 ms per loop
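
The setup behind these timings is not shown, so the following is only a sketch of how such a comparison might be reproduced. The number of groups (three), the random seed, the helper names via_transform and via_apply, and the repeat count are all assumptions. Each helper works on a copy, so the copy cost is included in both measurements. No timings are claimed here, since results depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: 100,000 rows spread over three letters.
rng = np.random.default_rng(0)
n = 100_000
base = pd.DataFrame({
    'letters': rng.choice(list('abc'), size=n),
    'is_min': np.zeros(n, dtype=int),
    'numbers': rng.standard_normal(n),
})


def via_transform(frame):
    # Approach from the first answer: row labels from transform('idxmin').
    out = frame.copy()
    labels = out.groupby('letters')['numbers'].transform('idxmin').unique()
    out.loc[labels, 'is_min'] = 1
    return out


def via_apply(frame):
    # Approach from this answer: .apply on the Series of group minima.
    out = frame.copy()

    def set_is_min(m):
        out.loc[out['numbers'] == m, 'is_min'] = 1

    out.groupby('letters')['numbers'].min().apply(set_is_min)
    return out


for name, fn in [('transform', via_transform), ('apply', via_apply)]:
    seconds = timeit.timeit(lambda: fn(base), number=5) / 5
    print(f'{name}: {seconds * 1000:.1f} ms per call')
```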

I ran some more benchmarks of various methods using apply and transform.
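
One thing to watch with set_is_min: df.numbers == m is evaluated against the whole frame, so a value equal to one group's minimum is also flagged wherever it occurs in other groups. With random floats this is unlikely to matter, but it can with repeated or integer values. A minimal sketch of a per-group comparison follows, using made-up data and assuming that ties within a group should all be flagged (unlike idxmin, which returns only the first occurrence):

```python
import pandas as pd

# Illustrative data: 2.0 is the minimum of group 'b' but not of group 'a'.
df = pd.DataFrame({
    'letters': ['a', 'a', 'b', 'b'],
    'numbers': [1.0, 2.0, 2.0, 3.0],
})

# Broadcast each group's minimum back onto its own rows.
group_min = df.groupby('letters')['numbers'].transform('min')

# A row is flagged only if it equals the minimum of its own group.
df['is_min'] = (df['numbers'] == group_min).astype(int)

print(df)
```

Here only rows 0 and 2 are flagged, whereas the frame-wide comparison would also flag row 1.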
