Original question: http://stackoverflow.com/questions/35403752/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas sum over duplicated indices with sum
Asked by Pat
I have a data frame indexed by date
transactions_ind
Out[25]:
                   Ticker     Transaction  Number_of_units      Price
Date
2012-10-11  ROG VX Equity             Buy            12000  182.00000
2012-10-16  ROG VX Equity            Sell            -5000  184.70000
2012-11-16  ROG VX Equity            Sell            -5000  175.51580
2012-12-07  ROG VX Equity             Buy             5000  184.90000
2012-12-11  ROG VX Equity            Sell            -3000  188.50000
2012-12-11  ROG VX Equity  Reversal: Sell             3000  188.50000
2012-12-11  ROG VX Equity            Sell            -3000  188.50000
2012-12-11  ROG VX Equity  Reversal: Sell             3000  188.50000
2012-12-11  ROG VX Equity            Sell            -3000  188.50000
2012-12-20  ROG VX Equity            Sell            -5000  185.80000
I want to sum over the duplicated index values (2012-12-11) but only over the column "Number_of_units".
transactions_ind
Out[25]:
                   Ticker  Transaction  Number_of_units      Price
Date
2012-10-11  ROG VX Equity          Buy            12000  182.00000
2012-10-16  ROG VX Equity         Sell            -5000  184.70000
2012-11-16  ROG VX Equity         Sell            -5000  175.51580
2012-12-07  ROG VX Equity          Buy             5000  184.90000
2012-12-11  ROG VX Equity         Sell            -3000  188.50000
2012-12-20  ROG VX Equity         Sell            -5000  185.80000
Using
transactions_ind.groupby(transactions_ind.index).sum()
drops the columns "Ticker" and "Transaction" since those are filled with non-numeric values. Also, I would like to know how to deal with the different strings in the "Transaction" column when I sum over the "Number_of_units" column. I hope there is a one-liner in pandas. Thanks for your help!
Answered by jezrael
You can use agg with first and sum:
# group by the index, keep the first value of each text column, and sum the units
df = df.groupby(df.index).agg({'Ticker': 'first',
                               'Transaction': 'first',
                               'Number_of_units': 'sum',
                               'Price': 'first'})
# reorder columns
df = df[['Ticker', 'Transaction', 'Number_of_units', 'Price']]
print(df)
                   Ticker  Transaction  Number_of_units     Price
Date
2012-10-11  ROG VX Equity          Buy            12000  182.0000
2012-10-16  ROG VX Equity         Sell            -5000  184.7000
2012-11-16  ROG VX Equity         Sell            -5000  175.5158
2012-12-07  ROG VX Equity          Buy             5000  184.9000
2012-12-11  ROG VX Equity         Sell            -3000  188.5000
2012-12-20  ROG VX Equity         Sell            -5000  185.8000
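For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the same agg idea. The row values are copied from the post. The helper join_unique is a hypothetical addition (it is not part of the original answer) and shows one possible way to handle the differing strings in "Transaction": instead of keeping only the first value, it joins the distinct values seen on each date.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("2012-10-11", "ROG VX Equity", "Buy", 12000, 182.0),
    ("2012-10-16", "ROG VX Equity", "Sell", -5000, 184.7),
    ("2012-11-16", "ROG VX Equity", "Sell", -5000, 175.5158),
    ("2012-12-07", "ROG VX Equity", "Buy", 5000, 184.9),
    ("2012-12-11", "ROG VX Equity", "Sell", -3000, 188.5),
    ("2012-12-11", "ROG VX Equity", "Reversal: Sell", 3000, 188.5),
    ("2012-12-11", "ROG VX Equity", "Sell", -3000, 188.5),
    ("2012-12-11", "ROG VX Equity", "Reversal: Sell", 3000, 188.5),
    ("2012-12-11", "ROG VX Equity", "Sell", -3000, 188.5),
    ("2012-12-20", "ROG VX Equity", "Sell", -5000, 185.8),
]
columns = ["Date", "Ticker", "Transaction", "Number_of_units", "Price"]
frame = pd.DataFrame(rows, columns=columns)
frame["Date"] = pd.to_datetime(frame["Date"])
transactions_ind = frame.set_index("Date")


def join_unique(values):
    """Hypothetical helper: join the distinct strings in order of appearance."""
    return " / ".join(values.unique())


# One aggregation rule per column: sum the units, collapse the text columns.
result = transactions_ind.groupby(level=0).agg(
    {
        "Ticker": "first",
        "Transaction": join_unique,
        "Number_of_units": "sum",
        "Price": "first",
    }
)
print(result)
```

With this variant the 2012-12-11 row keeps a trace of both transaction types while Number_of_units still nets out to -3000. If only the first label matters, replace join_unique with 'first' as in the answer above.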