pandas multi-level groupby

Question

Asked by pythonRcpp

I need to maintain the position of "each scrip per team per account", so I think pandas groupby might be used. I have also tried to state my problem in the comment column of the DataFrame. The position column has been added manually by me.

The initial df I have:

             time account scrip  buy_price  sell_price  qty   team  
0  06/07/17 09:36      A1  FUT1       50.0         NaN    2  team1         
1  06/07/17 09:46      A2  FUT1      100.0         NaN    2  team1         
2  06/07/17 09:56      A3  FUT1       10.0         NaN    2  team2         
3  06/07/17 09:57      A3  FUT1        NaN        10.0    2  team2         
4  06/07/17 09:58      A1  FUT1        NaN        50.0    1  team1         
5  06/07/17 09:59      A3  FUT1        NaN        50.0    1  team2

I need to add a position column. Below, I have filled it in manually and used the comment column to explain how each position is calculated.

             time account scrip  buy_price  sell_price  qty   team  position                                    comment
0  06/07/17 09:36      A1  FUT1       50.0         NaN    2  team1         2  this can only be sold by team1 account A1
1  06/07/17 09:46      A2  FUT1      100.0         NaN    2  team1         2  this can only be sold by team1 account A2
2  06/07/17 09:56      A3  FUT1       10.0         NaN    2  team2         2  this can only be sold by team2 account A3
3  06/07/17 09:57      A3  FUT1        NaN        10.0    2  team2         0                 sold by team 2 in  acc A3 
4  06/07/17 09:58      A1  FUT1        NaN        50.0    1  team1         1                  sold by team 1 in acc A1 
5  06/07/17 09:59      A3  FUT1        NaN        50.0    1  team2        -1                   sold by team 2 in acc A3

The above example covers just one scrip, FUT1, but there will be many scrips. My end result would look something like this:

Team Account Scrip Position
team1 A1      FUT1 1 
      A2      FUT1 2
team2 A3      FUT1 -1

The end result can be worked on later, once the position is calculated successfully. My approach: make a unique key in each row so I know when to add to or subtract from the position, e.g. A1_FUT1_team1 for row 1 and A2_FUT1_team1 for row 2, then add or subtract the quantities of rows with matching keys. Is this a good approach?
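The key-based idea does work. Below is a minimal sketch of it, assuming the sample data is rebuilt by hand from the table above and that a row with no buy_price is a sell; the column names key, signed_qty and position_by_cols are hypothetical. It also shows that grouping directly on the three columns gives the same running position, so the explicit string key is optional.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table (sells have no buy_price)
df = pd.DataFrame({
    "time": ["06/07/17 09:36", "06/07/17 09:46", "06/07/17 09:56",
             "06/07/17 09:57", "06/07/17 09:58", "06/07/17 09:59"],
    "account": ["A1", "A2", "A3", "A3", "A1", "A3"],
    "scrip": ["FUT1"] * 6,
    "buy_price": [50.0, 100.0, 10.0, np.nan, np.nan, np.nan],
    "sell_price": [np.nan, np.nan, np.nan, 10.0, 50.0, 50.0],
    "qty": [2, 2, 2, 2, 1, 1],
    "team": ["team1", "team1", "team2", "team2", "team1", "team2"],
})

# The question's idea: one string key per account/scrip/team combination
df["key"] = df["account"] + "_" + df["scrip"] + "_" + df["team"]

# +qty for a buy, -qty for a sell (assumption: a missing buy_price means a sell)
df["signed_qty"] = np.where(df["buy_price"].notna(), df["qty"], -df["qty"])

# Running position per key
df["position"] = df.groupby("key")["signed_qty"].cumsum()
# Grouping on the three columns directly gives the same running position
df["position_by_cols"] = df.groupby(["team", "account", "scrip"])["signed_qty"].cumsum()

print(df[["key", "signed_qty", "position"]])
```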

Answer 1

Answered by Rayhane Mama

Your problem can be solved easily in two steps:

First Step:

import math
# Signed quantity: rows without a buy_price are sells, so their qty counts as negative
df['some_stuff'] = df.apply(lambda x: -x.qty if math.isnan(x.buy_price) else x.qty, axis=1)

This line creates a new column, some_stuff, that gives each trade a sign: buys keep a positive qty, and sells (rows with no buy_price) get a negative qty. This introduces the gain-and-loss logic into your data.
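The same sign logic can also be written without a row-wise apply. A minimal vectorized sketch using Series.where, under the same assumption that a missing buy_price marks a sell (only the two needed columns are rebuilt here):

```python
import numpy as np
import pandas as pd

# Only the two columns the sign logic needs, copied from the question's table
df = pd.DataFrame({
    "buy_price": [50.0, 100.0, 10.0, np.nan, np.nan, np.nan],
    "qty": [2, 2, 2, 2, 1, 1],
})

# Keep qty where a buy_price exists; otherwise (a sell) use the negated qty
df["some_stuff"] = df["qty"].where(df["buy_price"].notna(), -df["qty"])

print(df["some_stuff"].tolist())
```

Series.where keeps the original value where the condition is True and takes the value from the second argument elsewhere, so no Python-level loop or math import is needed.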

If you don't want a new column and you like the idea, just overwrite the qty column instead (and then use qty rather than some_stuff in the next step):

df['qty'] = df.apply(lambda x: -x.qty if math.isnan(x.buy_price) else x.qty,axis=1)

Next, use this new column to create the position column as follows:

df['position'] = df.groupby(['team','account','scrip'])['some_stuff'].cumsum()

which reproduces the position column from the question (2, 2, 2, 0, 1, -1).
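Since the question mentions many scrips, here is a minimal end-to-end sketch of the two lines above that checks each scrip gets its own running position. It uses the question's six FUT1 rows plus two hypothetical FUT2 rows invented for illustration (time is omitted because it is not used):

```python
import math

import numpy as np
import pandas as pd

# Question's six FUT1 rows plus two HYPOTHETICAL FUT2 rows (invented for illustration)
df = pd.DataFrame({
    "account": ["A1", "A2", "A3", "A3", "A1", "A3", "A1", "A1"],
    "scrip": ["FUT1"] * 6 + ["FUT2"] * 2,
    "buy_price": [50.0, 100.0, 10.0, np.nan, np.nan, np.nan, 20.0, np.nan],
    "sell_price": [np.nan, np.nan, np.nan, 10.0, 50.0, 50.0, np.nan, 25.0],
    "qty": [2, 2, 2, 2, 1, 1, 3, 1],
    "team": ["team1", "team1", "team2", "team2", "team1", "team2", "team1", "team1"],
})

# Step 1 from the answer: signed quantity (negative when there is no buy_price)
df["some_stuff"] = df.apply(lambda x: -x.qty if math.isnan(x.buy_price) else x.qty, axis=1)

# Running position, computed separately for each team/account/scrip group
df["position"] = df.groupby(["team", "account", "scrip"])["some_stuff"].cumsum()

print(df[["team", "account", "scrip", "some_stuff", "position"]])
```

The FUT2 rows start from zero again because scrip is one of the groupby keys.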

Bonus:

If you want to delete the extra column some_stuff, just use:

del df['some_stuff']

Second Step:

This step produces your final grouped table:

print(df.groupby(['team', 'account', 'scrip']).min())

Final output:

                               time  buy_price  sell_price  qty  position
team  account scrip                                                      
team1 A1      FUT1   06/07/17 09:36       50.0        50.0    1         1
      A2      FUT1   06/07/17 09:46      100.0         NaN    2         2
team2 A3      FUT1   06/07/17 09:56       10.0        10.0    1        -1
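One caveat: min() returns the smallest running position in each group, which equals the final position in this sample only because no account buys again after selling. A minimal sketch of a more general approach, assuming a missing buy_price means a sell, with one hypothetical re-buy row for A1 invented to show where the two differ (the names signed_qty, final, last_pos and min_pos are illustrative):

```python
import numpy as np
import pandas as pd

# Question's rows plus one HYPOTHETICAL re-buy by A1 (invented to show the difference)
df = pd.DataFrame({
    "account": ["A1", "A2", "A3", "A3", "A1", "A3", "A1"],
    "scrip": ["FUT1"] * 7,
    "buy_price": [50.0, 100.0, 10.0, np.nan, np.nan, np.nan, 55.0],
    "qty": [2, 2, 2, 2, 1, 1, 3],
    "team": ["team1", "team1", "team2", "team2", "team1", "team2", "team1"],
})
keys = ["team", "account", "scrip"]

# Signed quantity (assumption: a missing buy_price means a sell) and running position
df["signed_qty"] = np.where(df["buy_price"].notna(), df["qty"], -df["qty"])
df["position"] = df.groupby(keys)["signed_qty"].cumsum()

# Net position per group: total signed quantity
final = df.groupby(keys)["signed_qty"].sum().rename("position").reset_index()
# Same values: the last running position in each group
last_pos = df.groupby(keys)["position"].last()
# min() picks the lowest running position, which is not the net position for A1 here
min_pos = df.groupby(keys)["position"].min()

print(final)
```

Taking the sum directly avoids computing the running column at all if only the net position is needed.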

I believe this answers your question.

Documentation:

pandas.DataFrame.apply

pandas.DataFrame.groupby

pandas.DataFrame.cumsum

pandas.DataFrame.min

Answer 2

Answered by Stael

Is this what you're looking for?

df.groupby(['team', 'account', 'scrip']).min()

It gives me:

                      time  buy_price  sell_price  qty  position
team  account scrip                                             
team1 A1      FUT1   09:36       50.0        50.0    1         1
      A2      FUT1   09:46      100.0         NaN    2         2
team2 A3      FUT1   09:56       10.0        10.0    1        -1

That's a few more columns than you wanted, but you can select just the ones you need.

(By default, groupby moves the grouped columns into a multi-level index; if this isn't what you want, pass as_index=False as an argument to .groupby().)
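A minimal sketch of the as_index=False variant, assuming a frame that already has the position column (values copied from the question's table), alongside the equivalent reset_index() form:

```python
import pandas as pd

# Frame that already has the position column (values copied from the question's table)
df = pd.DataFrame({
    "account": ["A1", "A2", "A3", "A3", "A1", "A3"],
    "scrip": ["FUT1"] * 6,
    "team": ["team1", "team1", "team2", "team2", "team1", "team2"],
    "position": [2, 2, 2, 0, 1, -1],
})
keys = ["team", "account", "scrip"]

# as_index=False keeps the group keys as ordinary columns
flat = df.groupby(keys, as_index=False)[["position"]].min()

# Equivalent: build the multi-level index first, then move it back into columns
same = df.groupby(keys)[["position"]].min().reset_index()

print(flat)
```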