Original question: http://stackoverflow.com/questions/45052636/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Multi level pandas groupby
Asked by pythonRcpp
I need to maintain the position of "each scrip per team per account", so I think pandas groupby might be used. I have also tried to state my problem in the comment column of the dataframe. The position column has been added manually by me.
The initial df I have:
             time account scrip  buy_price  sell_price  qty   team
0  06/07/17 09:36      A1  FUT1       50.0         NaN    2  team1
1  06/07/17 09:46      A2  FUT1      100.0         NaN    2  team1
2  06/07/17 09:56      A3  FUT1       10.0         NaN    2  team2
3  06/07/17 09:57      A3  FUT1        NaN        10.0    2  team2
4  06/07/17 09:58      A1  FUT1        NaN        50.0    1  team1
5  06/07/17 09:59      A3  FUT1        NaN        50.0    1  team2
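For anyone who wants to reproduce the question, the sample rows can be rebuilt roughly as follows. This is only a sketch: the values are copied from the table above, and keeping time as plain strings is an assumption, since the question does not say how that column is stored.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample rows; values are copied from the table above.
# Assumption: "time" is kept as a plain string column.
df = pd.DataFrame({
    "time": ["06/07/17 09:36", "06/07/17 09:46", "06/07/17 09:56",
             "06/07/17 09:57", "06/07/17 09:58", "06/07/17 09:59"],
    "account": ["A1", "A2", "A3", "A3", "A1", "A3"],
    "scrip": ["FUT1"] * 6,
    "buy_price": [50.0, 100.0, 10.0, np.nan, np.nan, np.nan],
    "sell_price": [np.nan, np.nan, np.nan, 10.0, 50.0, 50.0],
    "qty": [2, 2, 2, 2, 1, 1],
    "team": ["team1", "team1", "team2", "team2", "team1", "team2"],
})
print(df)
```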
I need to add a position column. In the example below I added the position column manually, and in the comment column I wrote an additional remark to clarify how the position is calculated.
             time account scrip  buy_price  sell_price  qty   team  position  comment
0  06/07/17 09:36      A1  FUT1       50.0         NaN    2  team1         2  this can only be sold by team1 account A1
1  06/07/17 09:46      A2  FUT1      100.0         NaN    2  team1         2  this can only be sold by team1 account A2
2  06/07/17 09:56      A3  FUT1       10.0         NaN    2  team2         2  this can only be sold by team2 account A3
3  06/07/17 09:57      A3  FUT1        NaN        10.0    2  team2         0  sold by team 2 in acc A3
4  06/07/17 09:58      A1  FUT1        NaN        50.0    1  team1         1  sold by team 1 in acc A1
5  06/07/17 09:59      A3  FUT1        NaN        50.0    1  team2        -1  sold by team 2 in acc A3
The above example is for just one scrip, FUT1; there would be many scrips. My end result would be something similar to:
Team   Account  Scrip  Position
team1  A1       FUT1          1
       A2       FUT1          2
team2  A3       FUT1         -1
The end result can be worked on later, once the position is calculated successfully. My approach: make a unique key in each row so as to know when to add to or subtract from the position, e.g. A1_FUT1_team1 for row1 and A2_FUT1_team1 for row2, then add or subtract the quantities of matching keys. Is this a good approach?
Answered by Rayhane Mama
Your problem can be solved easily in two steps:
First Step:
import math
# Signed quantity: a buy (buy_price present) counts as +qty, a sell (buy_price is NaN) as -qty
df['some_stuff'] = df.apply(lambda x: -x.qty if math.isnan(x.buy_price) else x.qty, axis=1)
This line creates a new column, some_stuff. I did this just to introduce some logic of gain and loss into your data.
If you don't want a new column and you like the idea, just replace the qty column with it like this:
df['qty'] = df.apply(lambda x: -x.qty if math.isnan(x.buy_price) else x.qty,axis=1)
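The same sign logic can probably be expressed without a row-wise apply. The sketch below uses numpy.where on the whole column; the column name signed_qty is my own choice (not from the answer), and it relies on the same assumption as the answer, namely that a missing buy_price marks a sell.

```python
import numpy as np
import pandas as pd

# Only the columns needed for the sign logic; values follow the question's table.
df = pd.DataFrame({
    "buy_price": [50.0, 100.0, 10.0, np.nan, np.nan, np.nan],
    "sell_price": [np.nan, np.nan, np.nan, 10.0, 50.0, 50.0],
    "qty": [2, 2, 2, 2, 1, 1],
})

# Assumption: a row with no buy_price is a sell, so its quantity is negated.
# "signed_qty" is a hypothetical column name used for illustration.
df["signed_qty"] = np.where(df["buy_price"].isna(), -df["qty"], df["qty"])
print(df["signed_qty"].tolist())
```

On larger frames a vectorized expression like this tends to be faster than apply with axis=1, although for six rows the difference is irrelevant.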
Next, I use this new column to create your position column as follows:
df['position'] = df.groupby(['team','account','scrip'])['some_stuff'].cumsum()
which generates this column:
position
2
2
2
0
1
-1
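Since the question mentions that there will be many scrips, here is a small sketch of how the grouped cumsum keeps each scrip's running position separate. The FUT2 rows and the signed_qty column name are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical trades: one account trading two scrips, already sign-adjusted
# (positive = buy, negative = sell). FUT2 is invented for this example.
df = pd.DataFrame({
    "team": ["team1"] * 4,
    "account": ["A1"] * 4,
    "scrip": ["FUT1", "FUT2", "FUT1", "FUT2"],
    "signed_qty": [2, 3, -1, -1],
})

# cumsum restarts for every distinct (team, account, scrip) combination,
# and the result is aligned back to the original row order.
df["position"] = df.groupby(["team", "account", "scrip"])["signed_qty"].cumsum()
print(df)
```

The FUT1 rows accumulate to 2 and then 1, while the FUT2 rows accumulate to 3 and then 2, independently of each other.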
Bonus:
If you want to delete the extra column some_stuff, just use:
del df['some_stuff']
Second Step:
This is the step where you get your final grouped table with this line:
print(df.groupby(['team', 'account', 'scrip']).min())
Final output:
                               time  buy_price  sell_price  qty  position
team  account scrip
team1 A1      FUT1   06/07/17 09:36       50.0        50.0    1         1
      A2      FUT1   06/07/17 09:46      100.0         NaN    2         2
team2 A3      FUT1   06/07/17 09:56       10.0        10.0    1        -1
I believe this answers your questions.
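One caveat on the aggregation: in the sample data .min() happens to return the final position for every group, because each group's running position never rises again after it falls. That is not guaranteed in general. The sketch below uses hypothetical trades (buy 2, sell 2, buy 1) and a hypothetical signed_qty column to show the difference; it assumes the rows are already in time order.

```python
import pandas as pd

# Hypothetical trades for a single group: buy 2, sell 2, then buy 1 again.
df = pd.DataFrame({
    "team": ["team1"] * 3,
    "account": ["A1"] * 3,
    "scrip": ["FUT1"] * 3,
    "signed_qty": [2, -2, 1],
})
keys = ["team", "account", "scrip"]
df["position"] = df.groupby(keys)["signed_qty"].cumsum()

lowest = df.groupby(keys)["position"].min()       # lowest running position seen
latest = df.groupby(keys)["position"].last()      # position after the last row
net = df.groupby(keys)["signed_qty"].sum()        # net quantity, same as latest

print(lowest.iloc[0], latest.iloc[0], net.iloc[0])
```

Here the running positions are 2, 0, 1, so .min() reports 0 while the position after the last trade is 1. If the goal is the current position per group, .last() on position or .sum() on the signed quantity looks like the safer choice.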
Answered by Stael
Is this what you're looking for?
df.groupby(['team', 'account', 'scrip']).min()
It gives me:
                      time  buy_price  sell_price  qty  position
team  account scrip
team1 A1      FUT1   09:36       50.0        50.0    1         1
      A2      FUT1   09:46      100.0         NaN    2         2
team2 A3      FUT1   09:56       10.0        10.0    1        -1
That's a few more columns than you wanted, but you can subset out what you're looking for.
(By default, groupby moves the grouped columns to a multilevel index; if this isn't what you want, you can pass as_index=False as an argument to .groupby().)
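As a rough illustration of both remarks, the sketch below selects only the position column and passes as_index=False so the grouping keys stay ordinary columns. The position values are the ones added by hand in the question, and the variable name flat is my own.

```python
import pandas as pd

# Grouping keys plus the manually added position column from the question.
df = pd.DataFrame({
    "team": ["team1", "team1", "team2", "team2", "team1", "team2"],
    "account": ["A1", "A2", "A3", "A3", "A1", "A3"],
    "scrip": ["FUT1"] * 6,
    "position": [2, 2, 2, 0, 1, -1],
})

# Select just "position" before aggregating, and keep the keys as columns
# instead of a MultiIndex. .min() mirrors the call used in this answer.
flat = df.groupby(["team", "account", "scrip"], as_index=False)[["position"]].min()
print(flat)
```

The result is a flat frame with the columns team, account, scrip and position, which is close to the layout asked for in the question.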