pandas computation in each group

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/23870745/

Date: 2020-09-13 22:05:29  Source: igfitidea


Tags: python, pandas

Asked by Moritz

I have a grouped data frame. Here is one group as an example:

name    pH   salt  id   
sample  7.5  50    1        0.48705
                   2        0.42875
                   3        0.38885
                   4        0.34615
                   5        0.35060
                   6        0.29280
                   7        0.28210
                   8        0.24535
                   stock    0.66090

For every group, there is a stock solution which defines my initial mass. I would like to iterate over all groups and subtract the initial mass from each item. I would like to do that without explicitly writing something like df_grouped['sample'][7.5][50]. If possible, I would like to avoid any nested loops.

Any suggestions?

I can only think of a solution like:

for na, gr in df_label_gr:
    if 'stock' in na:
        print(na)

This gives me:

('sample', 7.5, 50.0, 'stock')
('sample', 7.5, 150.0, 'stock')
('sample', 8.5, 50.0, 'stock')
('sample', 8.5, 150.0, 'stock')

So I could somehow use the first three entries of each tuple to index my groups and do some calculations.

EDIT:

To avoid cluttering the discussion, I am asking the same question again here with a small modification:

The difference is that here I do not want to subtract the same value from every group. Instead, each group should have its own stock value subtracted:

name    pH   salt  id   
sample  7.5  50    1        0.48705
                   2        0.42875
                   3        0.38885
                   4        0.34615
                   5        0.35060
                   6        0.29280
                   7        0.28210
                   8        0.24535
                   stock    0.66090
sample  8.5  50    1        0.48705
                   2        0.42875
                   3        0.38885
                   4        0.34615
                   5        0.35060
                   6        0.29280
                   7        0.28210
                   8        0.24535
                   stock    0.1

I tried the following:

df = a2_01.df.reset_index()
df.groupby(by = ['name','pH','salt','id']).aggregate(np.sum).apply(lambda x: x - x[x.index.get_level_values('id') == 'stock'].values[0])

The problem is that x[x.index.get_level_values('id') == 'stock'].values gives me an array of all stock values and not the value of the current group. So I could subtract, for example, the sample with id == 'stock' from the first group (values[0]) from all values in the dataframe.

How can I subtract each group's stock value only from the samples in the same group?

Accepted answer by Happy001

I think @filmor answered your question. You probably misunderstood it.

I made up a dataframe by repeating the data you gave and modifying the indices.

In [117]: df
Out[117]: 
                          mass
name   pH  salt id            
sample 7.5 50   1      0.48705
                2      0.42875
                3      0.38885
                4      0.34615
                5      0.35060
                6      0.29280
                7      0.28210
                8      0.24535
                stock  0.66090
           150  1      0.48705
                2      0.42875
                3      0.38885
                4      0.34615
                5      0.35060
                6      0.29280
                7      0.28210
                8      0.24535
                stock  0.66090
       8.5 50   1      0.48705
                2      0.42875
                3      0.38885
                4      0.34615
                5      0.35060
                6      0.29280
                7      0.28210
                8      0.24535
                stock  0.66090
           150  1      0.48705
                2      0.42875
                3      0.38885
                4      0.34615
                5      0.35060
                6      0.29280
                7      0.28210
                8      0.24535
                stock  0.66090

[36 rows x 1 columns]

If you are sure stock is always last in each group (after sorting if necessary), you can do the following. Otherwise, df.groupby(level=[0, 1, 2]).apply(lambda g: g - g[g.index.get_level_values('id') == 'stock'].values[0]) should work.

In [118]: df.groupby(level= [0,1,2]).apply(lambda g: g - g.iloc[-1,0])
Out[118]: 
                          mass
name   pH  salt id            
sample 7.5 50   1     -0.17385
                2     -0.23215
                3     -0.27205
                4     -0.31475
                5     -0.31030
                6     -0.36810
                7     -0.37880
                8     -0.41555
                stock  0.00000
           150  1     -0.17385
                2     -0.23215
                3     -0.27205
                4     -0.31475
                5     -0.31030
                6     -0.36810
                7     -0.37880
                8     -0.41555
                stock  0.00000
       8.5 50   1     -0.17385
                2     -0.23215
                3     -0.27205
                4     -0.31475
                5     -0.31030
                6     -0.36810
                7     -0.37880
                8     -0.41555
                stock  0.00000
           150  1     -0.17385
                2     -0.23215
                3     -0.27205
                4     -0.31475
                5     -0.31030
                6     -0.36810
                7     -0.37880
                8     -0.41555
                stock  0.00000

[36 rows x 1 columns]
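
For the case in the question's EDIT, where each group has its own stock value, the label-based variant can be sketched as follows. This is only a minimal illustration: the small dataframe, the string ids, and the helper name subtract_stock are assumptions made up for the example, and group_keys=False is used because newer pandas versions may otherwise prepend the group keys to the index a second time.

```python
import pandas as pd

# Hypothetical data: two groups with different stock values (made up for illustration).
index = pd.MultiIndex.from_tuples(
    [
        ("sample", 7.5, 50, "1"),
        ("sample", 7.5, 50, "2"),
        ("sample", 7.5, 50, "stock"),
        ("sample", 8.5, 50, "1"),
        ("sample", 8.5, 50, "2"),
        ("sample", 8.5, 50, "stock"),
    ],
    names=["name", "pH", "salt", "id"],
)
df = pd.DataFrame(
    {"mass": [0.48705, 0.42875, 0.66090, 0.48705, 0.42875, 0.1]},
    index=index,
)


def subtract_stock(group):
    """Subtract this group's own stock mass from every row of the group."""
    # Find the stock row by its label, so its position in the group does not matter.
    is_stock = group.index.get_level_values("id") == "stock"
    stock_mass = group.loc[is_stock, "mass"].iloc[0]
    return group - stock_mass


# group_keys=False keeps the original four index levels in the result.
result = df.groupby(level=["name", "pH", "salt"], group_keys=False).apply(subtract_stock)
print(result)
```

Because the stock row is looked up by label inside each sub-dataframe, the stock entry of every group ends up at zero and the other rows are shifted by that group's own stock value.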

Answer by filmor

You can use groupby for this, in particular df_grouped.groupby(level=[0, 1, 2]).apply(fancy_func) in your case, where fancy_func takes a sub-dataframe and returns a value.

The result will then be a series of values, indexed by the same levels.
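
As a rough sketch of what such a function could look like, the example below uses a hypothetical stock_value in the role of fancy_func: it returns each group's stock entry, so the result is one value per (name, pH, salt) combination. The data, the string ids, and the names stock_value, keys and relative are assumptions made up for the illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical data: two groups with different stock values (made up for illustration).
index = pd.MultiIndex.from_tuples(
    [
        ("sample", 7.5, 50, "1"),
        ("sample", 7.5, 50, "2"),
        ("sample", 7.5, 50, "stock"),
        ("sample", 8.5, 50, "1"),
        ("sample", 8.5, 50, "2"),
        ("sample", 8.5, 50, "stock"),
    ],
    names=["name", "pH", "salt", "id"],
)
s = pd.Series([0.48705, 0.42875, 0.66090, 0.48705, 0.42875, 0.1], index=index, name="mass")


def stock_value(group):
    """Play the role of fancy_func: return the stock entry of one group."""
    return group[group.index.get_level_values("id") == "stock"].iloc[0]


# One value per group, indexed by the three grouping levels.
stock = s.groupby(level=["name", "pH", "salt"]).apply(stock_value)
print(stock)

# Look up each row's group stock value by its (name, pH, salt) key and subtract it.
keys = s.index.droplevel("id")
relative = s - stock.reindex(keys).to_numpy()
print(relative)
```

Computing the per-group value first and subtracting it afterwards keeps the two steps separate, which can be easier to inspect than doing everything inside a single apply.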