Summing across rows of Pandas Dataframe
Source: StackOverflow, http://stackoverflow.com/questions/17166601/
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on StackOverflow.
Asked by Joop
I have a DataFrame of records that looks something like this:
import pandas as pd

stocks = pd.Series(['A', 'A', 'B', 'C', 'C'], name='stock')
positions = pd.Series([100, 200, 300, 400, 500], name='positions')
same1 = pd.Series(['AA', 'AA', 'BB', 'CC', 'CC'], name='same1')
same2 = pd.Series(['AAA', 'AAA', 'BBB', 'CCC', 'CCC'], name='same2')
diff = pd.Series(['A1', 'A2', 'B3', 'C1', 'C2'], name='different')
df = pd.DataFrame([stocks, same1, positions, same2, diff]).T
df
This gives a pandas DataFrame that looks like this:
  stock same1 positions same2 different
0     A    AA       100   AAA        A1
1     A    AA       200   AAA        A2
2     B    BB       300   BBB        B3
3     C    CC       400   CCC        C1
4     C    CC       500   CCC        C2
I'm not interested in the data in the 'different' column and want to sum the positions over the unique combinations of the other columns. I am currently doing it with:
df.groupby(['stock','same1','same2'])['positions'].sum()
which gives:
stock  same1  same2
A      AA     AAA      300
B      BB     BBB      300
C      CC     CCC      900
Name: positions
The problem is that this is a pd.Series (with a MultiIndex). Currently I iterate over it to build a DataFrame again. I am sure that I am missing a method. Basically, I want to drop one column from a DataFrame and then "rebuild" it so that one column is summed and the rest of the fields (which are the same within each group) stay in place.
This groupby method breaks if there are empty positions. So I currently use an elaborate iteration over the DataFrame to build a new one. Is there a better approach?
Accepted answer by waitingkuo
Step 1. Use [['positions']] instead of ['positions']:
In [30]: df2 = df.groupby(['stock','same1','same2'])[['positions']].sum()
In [31]: df2
Out[31]:
                   positions
stock same1 same2
A     AA    AAA          300
B     BB    BBB          300
C     CC    CCC          900
Step 2. Then use reset_index to move the index levels back into columns:
In [34]: df2.reset_index()
Out[34]:
  stock same1 same2  positions
0     A    AA   AAA        300
1     B    BB   BBB        300
2     C    CC   CCC        900
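The two steps above can be put together as a minimal, self-contained sketch. It assumes the frame is built from a dict (rather than from transposed Series as in the question) so that positions is stored as integers; the variable names keys, as_series, as_frame, and result are illustrative only.

```python
import pandas as pd

# Same data as in the question, built column-wise so 'positions' is numeric.
df = pd.DataFrame({
    "stock": ["A", "A", "B", "C", "C"],
    "same1": ["AA", "AA", "BB", "CC", "CC"],
    "positions": [100, 200, 300, 400, 500],
    "same2": ["AAA", "AAA", "BBB", "CCC", "CCC"],
    "different": ["A1", "A2", "B3", "C1", "C2"],
})

keys = ["stock", "same1", "same2"]

# Selecting with a single label yields a Series after aggregation ...
as_series = df.groupby(keys)["positions"].sum()
# ... while selecting with a list of labels yields a DataFrame (Step 1).
as_frame = df.groupby(keys)[["positions"]].sum()

# Step 2: move the MultiIndex levels back into ordinary columns.
result = as_frame.reset_index()

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(result)
```

The 'different' column never needs to be dropped explicitly: it is simply not among the grouping keys or the selected columns, so it does not appear in the result.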
EDIT
Seems my method is not so good.
Thanks to @Andy and @unutbu, you can achieve your goal in more elegant ways:
Method 1:
df.groupby(['stock', 'same1', 'same2'])['positions'].sum().reset_index()
Method 2:
df.groupby(['stock', 'same1', 'same2'], as_index=False)['positions'].sum()
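As a rough check, the two methods can be run side by side on the example data. This sketch assumes a dict-built frame with integer positions; m1 and m2 are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "stock": ["A", "A", "B", "C", "C"],
    "same1": ["AA", "AA", "BB", "CC", "CC"],
    "positions": [100, 200, 300, 400, 500],
    "same2": ["AAA", "AAA", "BBB", "CCC", "CCC"],
    "different": ["A1", "A2", "B3", "C1", "C2"],
})

keys = ["stock", "same1", "same2"]

# Method 1: aggregate to a Series, then turn the index levels into columns.
m1 = df.groupby(keys)["positions"].sum().reset_index()

# Method 2: ask groupby not to use the keys as the index in the first place.
m2 = df.groupby(keys, as_index=False)["positions"].sum()

print(m1)
print(m2)
```

Both calls return a flat DataFrame with the columns stock, same1, same2, and positions, so the choice between them is mostly a matter of taste.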

