Summing across rows of a Pandas DataFrame

Question

Asked by Joop

I have a DataFrame of records that looks something like this:

import pandas as pd

stocks = pd.Series(['A', 'A', 'B', 'C', 'C'], name='stock')
positions = pd.Series([100, 200, 300, 400, 500], name='positions')
same1 = pd.Series(['AA', 'AA', 'BB', 'CC', 'CC'], name='same1')
same2 = pd.Series(['AAA', 'AAA', 'BBB', 'CCC', 'CCC'], name='same2')
diff = pd.Series(['A1', 'A2', 'B3', 'C1', 'C2'], name='different')
# Each Series becomes a row, so transpose to turn them into columns
df = pd.DataFrame([stocks, same1, positions, same2, diff]).T
df

This gives a pandas DataFrame that looks like this:

  stock same1 positions same2 different
0     A    AA       100   AAA        A1
1     A    AA       200   AAA        A2
2     B    BB       300   BBB        B3
3     C    CC       400   CCC        C1
4     C    CC       500   CCC        C2

I'm not interested in the data in the 'different' column and want to sum the positions for each unique combination of the other columns. I am currently doing it with:

df.groupby(['stock','same1','same2'])['positions'].sum()

which gives:

stock  same1  same2
A      AA     AAA      300
B      BB     BBB      300
C      CC     CCC      900
Name: positions

The problem is that this returns a pd.Series (with a MultiIndex). Currently I iterate over it to build a DataFrame again. I am sure that I am missing a method. Basically I want to drop one column from a DataFrame and then "rebuild it" so that one column is summed and the rest of the fields (which are the same within each group) stay in place.

This groupby approach also breaks if there are empty positions, so I currently use an elaborate iteration over the DataFrame to build a new one. Is there a better approach?

Answer 1

Accepted answer by waitingkuo

Step 1. Use [['positions']] instead of ['positions'], so that the selection (and therefore the result) is a DataFrame rather than a Series:

In [30]: df2 = df.groupby(['stock','same1','same2'])[['positions']].sum()

In [31]: df2 
Out[31]: 

                   positions
stock same1 same2               
A     AA    AAA          300 
B     BB    BBB          300 
C     CC    CCC          900

Step 2. Then use reset_index to move the index levels back into columns:

In [34]: df2.reset_index()
Out[34]: 
  stock same1 same2  positions
0     A    AA   AAA        300 
1     B    BB   BBB        300 
2     C    CC   CCC        900
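
The two steps above can be run end to end as a small script. This is only a sketch: it builds the frame from a dict rather than from transposed Series as in the question, because the transposed construction typically leaves every column with object dtype, while the dict keeps 'positions' as integers. The names keys, summed and flat are illustrative and not part of the original answer.

```python
import pandas as pd

# Same data as in the question, built from a dict so that
# 'positions' keeps an integer dtype (an assumption of this sketch).
df = pd.DataFrame({
    "stock": ["A", "A", "B", "C", "C"],
    "same1": ["AA", "AA", "BB", "CC", "CC"],
    "positions": [100, 200, 300, 400, 500],
    "same2": ["AAA", "AAA", "BBB", "CCC", "CCC"],
    "different": ["A1", "A2", "B3", "C1", "C2"],
})

keys = ["stock", "same1", "same2"]  # illustrative name

# Step 1: double brackets select a one-column DataFrame, so the
# aggregated result is a DataFrame indexed by the group keys.
summed = df.groupby(keys)[["positions"]].sum()

# Step 2: move the index levels back into ordinary columns.
flat = summed.reset_index()
print(flat)
```

The 'different' column disappears simply because it is neither a group key nor the selected column, so no explicit drop is needed.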

EDIT

It seems my method is not so good.

Thanks to @Andy and @unutbu, you can achieve your goal in more elegant ways:

Method 1:

df.groupby(['stock', 'same1', 'same2'])['positions'].sum().reset_index()

Method 2:

df.groupby(['stock', 'same1', 'same2'], as_index=False)['positions'].sum()
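
As a rough check, the two one-liners can be compared side by side on the sample data. The sketch below also touches on the "empty positions" concern from the question, under the assumption that an empty position means a NaN in the 'positions' column; the question does not say exactly what breaks, so treat that part as one possible reading. The names m1, m2, gappy and gappy_sum are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "stock": ["A", "A", "B", "C", "C"],
    "same1": ["AA", "AA", "BB", "CC", "CC"],
    "positions": [100, 200, 300, 400, 500],
    "same2": ["AAA", "AAA", "BBB", "CCC", "CCC"],
})
keys = ["stock", "same1", "same2"]

# Method 1: aggregate to a Series, then flatten the MultiIndex.
m1 = df.groupby(keys)["positions"].sum().reset_index()

# Method 2: ask groupby not to use the keys as the index at all.
m2 = df.groupby(keys, as_index=False)["positions"].sum()

print(m1)
print(m2)

# Assumed reading of "empty positions": the last row has no value.
gappy = df.assign(positions=[100, 200, 300, 400, np.nan])

# sum() skips NaN by default, so group C is summed from one row.
gappy_sum = gappy.groupby(keys, as_index=False)["positions"].sum()
print(gappy_sum)
```

If the gaps are in the key columns instead, note that groupby drops rows with NaN keys by default; in pandas 1.1 and later, passing dropna=False to groupby keeps them as their own group.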
