Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/45052636/

Time: 2020-09-14 03:58:56  Source: igfitidea

Multi level pandas groupby

python pandas pandas-groupby

Asked by pythonRcpp

I need to maintain the position of "each scrip per team per account", so I think pandas groupby might be used. I have also tried to state my problem in the comment column of the dataframe. The position column has been added manually by me.

The initial df I have:

             time account scrip  buy_price  sell_price  qty   team  
0  06/07/17 09:36      A1  FUT1       50.0         NaN    2  team1         
1  06/07/17 09:46      A2  FUT1      100.0         NaN    2  team1         
2  06/07/17 09:56      A3  FUT1       10.0         NaN    2  team2         
3  06/07/17 09:57      A3  FUT1        NaN        10.0    2  team2         
4  06/07/17 09:58      A1  FUT1        NaN        50.0    1  team1         
5  06/07/17 09:59      A3  FUT1        NaN        50.0    1  team2 
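
To experiment with the answers, the sample frame can be rebuilt in code. This is a minimal sketch that copies the values from the table above; the use of np.nan for the empty price cells is an assumption about how the original data was loaded.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample data; values are copied from the table above.
df = pd.DataFrame({
    "time": ["06/07/17 09:36", "06/07/17 09:46", "06/07/17 09:56",
             "06/07/17 09:57", "06/07/17 09:58", "06/07/17 09:59"],
    "account": ["A1", "A2", "A3", "A3", "A1", "A3"],
    "scrip": ["FUT1"] * 6,
    "buy_price": [50.0, 100.0, 10.0, np.nan, np.nan, np.nan],
    "sell_price": [np.nan, np.nan, np.nan, 10.0, 50.0, 50.0],
    "qty": [2, 2, 2, 2, 1, 1],
    "team": ["team1", "team1", "team2", "team2", "team1", "team2"],
})
print(df)
```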

I need to add the position. The position column has been added manually by me, and in the comment column I have written additional remarks to clarify how the position is calculated.

             time account scrip  buy_price  sell_price  qty   team  position                                    comment
0  06/07/17 09:36      A1  FUT1       50.0         NaN    2  team1         2  this can only be sold by team1 account A1
1  06/07/17 09:46      A2  FUT1      100.0         NaN    2  team1         2  this can only be sold by team1 account A2
2  06/07/17 09:56      A3  FUT1       10.0         NaN    2  team2         2  this can only be sold by team2 account A3
3  06/07/17 09:57      A3  FUT1        NaN        10.0    2  team2         0                 sold by team 2 in  acc A3 
4  06/07/17 09:58      A1  FUT1        NaN        50.0    1  team1         1                  sold by team 1 in acc A1 
5  06/07/17 09:59      A3  FUT1        NaN        50.0    1  team2        -1                   sold by team 2 in acc A3

The above example covers just one scrip, FUT1; there would be many scrips. My end result would look something like this:

Team  Account Scrip Position
team1 A1      FUT1         1
      A2      FUT1         2
team2 A3      FUT1        -1

The end result can be worked on later, once the position is calculated successfully. My approach: make a unique key in each row so as to know when to add to or subtract from the position, e.g. A1_FUT1_team1 for row 1 and A2_FUT1_team1 for row 2, then add or subtract on matching keys. Is this a good approach?
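
As a rough illustration of the key idea in the question, the sketch below builds the concatenated key and compares it with passing the three columns to groupby directly. The column name key is hypothetical. On this sample both ways find the same groups, which suggests the hand-built key is not needed.

```python
import pandas as pd

df = pd.DataFrame({
    "account": ["A1", "A2", "A3", "A3", "A1", "A3"],
    "scrip": ["FUT1"] * 6,
    "team": ["team1", "team1", "team2", "team2", "team1", "team2"],
})

# The hand-built key from the question, e.g. "A1_FUT1_team1" ("key" is a hypothetical name).
df["key"] = df["account"] + "_" + df["scrip"] + "_" + df["team"]

# Group sizes when grouping by the key versus by the three columns directly.
by_key = df.groupby("key").size()
by_cols = df.groupby(["account", "scrip", "team"]).size()
print(by_key)
print(by_cols)
```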

Answered by Rayhane Mama

Your problem can be solved easily in two steps:

First Step:

import math
# Signed quantity: +qty for a buy (buy_price present), -qty for a sell (buy_price is NaN)
df['some_stuff'] = df.apply(lambda x: -x.qty if math.isnan(x.buy_price) else x.qty, axis=1)

This line creates a new column, some_stuff. I did this just to introduce some gain-and-loss logic into your data: a buy counts as a positive quantity and a sell as a negative one.

If you don't want a new column and you like the idea, just replace the qty column with it like this (and then use 'qty' instead of 'some_stuff' in the cumsum step):

df['qty'] = df.apply(lambda x: -x.qty if math.isnan(x.buy_price) else x.qty,axis=1)
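
The row-wise apply can also be written as a vectorized expression. The sketch below is an alternative, not the answer's own code: it uses Series.where to keep qty for buys and substitute -qty for sells, assuming, as the answer does, that a missing buy_price marks a sell.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "buy_price": [50.0, 100.0, 10.0, np.nan, np.nan, np.nan],
    "qty": [2, 2, 2, 2, 1, 1],
})

# Keep qty where buy_price is present (a buy); otherwise use -qty (a sell).
df["some_stuff"] = df["qty"].where(df["buy_price"].notna(), -df["qty"])
print(df["some_stuff"].tolist())
```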

Next, I use this new column to create your position column as follows:

df['position'] = df.groupby(['team','account','scrip'])['some_stuff'].cumsum()

This generates the following column:

position
       2
       2
       2
       0
       1
      -1
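
The question mentions that there will be many scrips. Because the groupby keys include scrip, each scrip should get its own running total. The sketch below repeats the two steps on the sample data plus two hypothetical FUT2 rows (a buy of 3 and a sell of 1 by team1 account A1) that are not in the original question.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "account": ["A1", "A2", "A3", "A3", "A1", "A3", "A1", "A1"],
    "scrip": ["FUT1"] * 6 + ["FUT2"] * 2,  # the FUT2 rows are hypothetical
    "buy_price": [50.0, 100.0, 10.0, np.nan, np.nan, np.nan, 20.0, np.nan],
    "qty": [2, 2, 2, 2, 1, 1, 3, 1],
    "team": ["team1", "team1", "team2", "team2", "team1", "team2", "team1", "team1"],
})

# Same two steps as the answer: sign the quantity, then take a running sum per group.
df["some_stuff"] = df.apply(lambda x: -x.qty if math.isnan(x.buy_price) else x.qty, axis=1)
df["position"] = df.groupby(["team", "account", "scrip"])["some_stuff"].cumsum()
print(df[["team", "account", "scrip", "position"]])
```

The first six positions match the column above, and the FUT2 rows accumulate independently of A1's FUT1 position.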

Bonus:

If you want to delete the extra column some_stuff, just use:

del df['some_stuff']

Second Step:

This is the step where you get your final grouped table with this line:

print(df.groupby(['team', 'account', 'scrip']).min())

Final output:

                               time  buy_price  sell_price  qty  position
team  account scrip                                                      
team1 A1      FUT1   06/07/17 09:36       50.0        50.0    1         1
      A2      FUT1   06/07/17 09:46      100.0         NaN    2         2
team2 A3      FUT1   06/07/17 09:56       10.0        10.0    1        -1
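
Note that min() returns the smallest value of each column within a group. In this sample the smallest position happens to equal the latest one, but that need not hold in general. The sketch below uses hypothetical trades (a sell of 1 followed by a buy of 2) and an assumed helper column signed_qty to show the difference; it assumes rows are already in chronological order.

```python
import pandas as pd

# Hypothetical trades for one account: a sell of 1 followed by a buy of 2.
df = pd.DataFrame({
    "team": ["team1", "team1"],
    "account": ["A1", "A1"],
    "scrip": ["FUT1", "FUT1"],
    "signed_qty": [-1, 2],  # assumed helper column: +qty for buys, -qty for sells
})
keys = ["team", "account", "scrip"]
df["position"] = df.groupby(keys)["signed_qty"].cumsum()

lowest = df.groupby(keys)["position"].min()     # lowest position ever held
latest = df.groupby(keys)["position"].last()    # position after the last row
total = df.groupby(keys)["signed_qty"].sum()    # same final position, without cumsum
print(lowest, latest, total, sep="\n\n")
```

If the current position per group is what you need, last() or a plain sum of the signed quantities may be the safer choice.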

I believe this answers your questions.

Documentation:

pandas.DataFrame.apply

pandas.Groupby

pandas.DataFrame.cumsum

pandas.DataFrame.min

Answered by Stael

Is this what you're looking for?

df.groupby(['team', 'account', 'scrip']).min()

It gives me:

                      time  buy_price  sell_price  qty  position
team  account scrip                                             
team1 A1      FUT1   09:36       50.0        50.0    1         1
      A2      FUT1   09:46      100.0         NaN    2         2
team2 A3      FUT1   09:56       10.0        10.0    1        -1

That's a few more columns than you wanted, but you can subset out what you're looking for.

(By default, groupby moves the grouped columns to a multilevel index, but if this isn't what you want you can add as_index=False as an argument in the .groupby() call.)
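
Both suggestions, subsetting the columns and passing as_index=False, can be combined. This is a minimal sketch that assumes the position column has already been computed; the values are copied from the question's table.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["team1", "team1", "team2", "team2", "team1", "team2"],
    "account": ["A1", "A2", "A3", "A3", "A1", "A3"],
    "scrip": ["FUT1"] * 6,
    "position": [2, 2, 2, 0, 1, -1],
})
keys = ["team", "account", "scrip"]

# Select only the position column and keep the keys as ordinary columns.
result = df.groupby(keys, as_index=False)["position"].min()
print(result)
```

The result is a flat frame with the columns team, account, scrip and position, close to the layout asked for in the question.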