Pandas: Combine TimeGrouper with another Groupby argument
Original question: http://stackoverflow.com/questions/16982370/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow and cite the original URL.
Asked by Andy
I have the following DataFrame:
import datetime as DT
import pandas as pd
from pandas.tseries.resample import TimeGrouper  # import path as given in the question (older pandas)

df = pd.DataFrame({
    'Branch': 'A A A A A B'.split(),
    'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
    'Quantity': [1, 3, 5, 8, 9, 3],
    'Date': [
        DT.datetime(2013, 1, 1, 13, 0),
        DT.datetime(2013, 1, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 2, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0),
        DT.datetime(2013, 12, 2, 14, 0),
    ]})
How can I group this data by Branch and by 20-day periods using TimeGrouper?
All my previous attempts failed because I could not combine TimeGrouper with another argument in the groupby function.
I would deeply appreciate your help.
Thank you,
Andy
Accepted answer by Jeff
From the discussion here: https://github.com/pydata/pandas/issues/3791
In [38]: df.set_index('Date').groupby(pd.TimeGrouper('6M')).apply(lambda x: x.groupby('Branch').sum())
Out[38]:
                   Quantity
           Branch
2013-01-31 A              4
2014-01-31 A             22
           B              3
And for a slightly more complicated case:
In [55]: def testf(df):
   ....:     if (df['Buyer'] == 'Mark').sum() > 0:
   ....:         return pd.Series(dict(quantity = df['Quantity'].sum(), buyer = 'mark'))
   ....:     return pd.Series(dict(quantity = df['Quantity'].sum()*100, buyer = 'other'))
   ....:

In [56]: df.set_index('Date').groupby(pd.TimeGrouper('6M')).apply(lambda x: x.groupby('Branch').apply(testf))
Out[56]:
                   buyer quantity
           Branch
2013-01-31 A        mark        4
2014-01-31 A       other     2200
           B       other      300
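As a rough sketch (not part of the original answer), the same per-group rule can be applied by iterating over a single groupby on both keys instead of nesting two groupby calls. This assumes a pandas version that provides pd.Grouper, and it uses the 20-day period from the question rather than the 6-month period above. The helper name summarize is hypothetical.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({
    'Branch': 'A A A A A B'.split(),
    'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
    'Quantity': [1, 3, 5, 8, 9, 3],
    'Date': [
        dt.datetime(2013, 1, 1, 13, 0),
        dt.datetime(2013, 1, 1, 13, 5),
        dt.datetime(2013, 10, 1, 20, 0),
        dt.datetime(2013, 10, 2, 10, 0),
        dt.datetime(2013, 12, 2, 12, 0),
        dt.datetime(2013, 12, 2, 14, 0),
    ]})

def summarize(group):
    # Hypothetical helper mirroring testf: keep the plain sum if Mark
    # appears among the buyers, otherwise multiply the sum by 100.
    total = int(group['Quantity'].sum())
    if (group['Buyer'] == 'Mark').any():
        return {'quantity': total, 'buyer': 'mark'}
    return {'quantity': total * 100, 'buyer': 'other'}

# Group on 20-day bins of the Date column and on Branch in one call.
keys = [pd.Grouper(key='Date', freq='20D'), 'Branch']

rows = []
for (period, branch), group in df.groupby(keys):
    if group.empty:  # defensive: skip any bin/branch pair without rows
        continue
    rows.append({'Date': period, 'Branch': branch, **summarize(group)})

summary = pd.DataFrame(rows).set_index(['Date', 'Branch'])
print(summary)
```

Building the rows by hand keeps the grouping keys explicit and avoids depending on how apply assembles nested results, at the cost of a Python-level loop over the groups.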
Answer by Andy Hayden
You can now use a TimeGrouper together with another column (as of pandas version 0.14, IIRC):
In [11]: df1 = df.set_index('Date')

In [12]: g = df1.groupby([pd.TimeGrouper('20D'), 'Branch'])

In [13]: g.sum()
Out[13]:
                            Quantity
Date                Branch
2013-01-01 13:00:00 A              4
2013-09-18 13:00:00 A             13
2013-11-17 13:00:00 A              9
                    B              3
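For readers on newer pandas releases, where pd.TimeGrouper was deprecated in favor of pd.Grouper, the following is a minimal sketch of the same grouping. It is not taken from the answers above. It assumes pd.Grouper with its key and freq arguments, which lets the Date column stay a regular column instead of becoming the index. The bin labels may be anchored differently than in the output shown above, depending on the pandas version.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({
    'Branch': 'A A A A A B'.split(),
    'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
    'Quantity': [1, 3, 5, 8, 9, 3],
    'Date': [
        dt.datetime(2013, 1, 1, 13, 0),
        dt.datetime(2013, 1, 1, 13, 5),
        dt.datetime(2013, 10, 1, 20, 0),
        dt.datetime(2013, 10, 2, 10, 0),
        dt.datetime(2013, 12, 2, 12, 0),
        dt.datetime(2013, 12, 2, 14, 0),
    ]})

# pd.Grouper(freq='20D') takes the place of TimeGrouper('20D');
# key='Date' points it at the datetime column, so no set_index is needed.
result = df.groupby([pd.Grouper(key='Date', freq='20D'), 'Branch'])['Quantity'].sum()
print(result)
```

Selecting the Quantity column before summing restricts the aggregation to the numeric data, so the string-valued Buyer column is left out of the result.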

