TimeSeries with a groupby in Pandas

Question

Asked by mikebmassey

I would like to look at TimeSeries data for every client over various time periods in Pandas.

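import pandas as pd
import numpy as np
import random

# 100 random orders spread over 10 clients and the days of 2013
clients = np.random.randint(1, 11, size=100)
dates = pd.date_range('20130101', periods=365)
# random.sample needs a plain sequence, so convert the DatetimeIndex to a list
OrderDates = random.sample(list(dates), 100)
Values = np.random.randint(10, 250, size=100)

df = pd.DataFrame({'Client': clients, 'OrderDate': OrderDates, 'Value': Values})

# DataFrame.sort was replaced by sort_values; ascending takes booleans, not strings
df = df.sort_values(['OrderDate', 'Client'], ascending=[True, True])

df.head()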

What I am trying to accomplish is to get the count and the sum of the 'Value' column, grouped by 'Client' for various time periods (Monthly, Quarterly, Yearly - I will likely build 3 different dataframes for this data, then make the dataframes 'wide').

For Quarterly, I would expect something like this:

Client      OrderDate       NumberofEntries SumofValues
1           2013-03-31      7               28
1           2013-06-30      2               7
1           2013-09-30      6               20
1           2013-12-31      1               3
2           2013-03-31      1               4
2           2013-06-30      2               8
2           2013-09-30      3               17
2           2013-12-31      4               24

I could extend the data frame by computing the quarter (or month, or year) for each entry and then use the Pandas groupby function, but that seems like extra work when I should be using TimeSeries.
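
For reference, the manual route described above might look roughly like the following. This is only a sketch: the small hand-made frame stands in for the random data, and the `Quarter` column name and the `manual` variable are illustrative choices, not part of the original question.

```python
import pandas as pd

# Small hand-made frame standing in for the question's random data
# (illustrative values only).
df = pd.DataFrame({
    'Client': [1, 1, 1, 2, 2],
    'OrderDate': pd.to_datetime(['2013-01-15', '2013-02-20', '2013-05-03',
                                 '2013-01-07', '2013-11-30']),
    'Value': [10, 20, 30, 40, 50],
})

# Step 1: label every order with the quarter it falls in.
df['Quarter'] = df['OrderDate'].dt.to_period('Q')

# Step 2: group on client and quarter, then count and sum the values.
manual = (df.groupby(['Client', 'Quarter'])['Value']
            .agg(NumberofEntries='count', SumofValues='sum')
            .reset_index())
print(manual)
```

This works, but it needs one helper column per period type, which is the extra work the question hopes to avoid.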

I've read the documentation and reviewed a TimeSeries demonstration by Wes, but I don't see a way to do a groupby for the Client and then apply the TimeSeries functionality over the time periods I am trying to build. (Alternatively, I could run a for loop and build the dataframe that way, but again, that seems like more work than there should be.)

Is there a way to combine a groupby process with TimeSeries?

Answer 1

Answered by Andy Hayden

A slight alternative is to set_index before doing the groupby:

In [11]: df.set_index('OrderDate', inplace=True)

In [12]: g = df.groupby('Client')

In [13]: g['Value'].resample('Q', how=[np.sum, len])
Out[13]: 
                   sum  len
Client OrderDate           
1      2013-03-31  239    1
       2013-06-30   83    1
       2013-09-30  249    2
       2013-12-31  506    3
2      2013-03-31  581    4
       2013-06-30  569    4
       2013-09-30  316    4
       2013-12-31  465    5
...

Note: you don't need to do the sort before doing this.
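
The how= argument of resample shown above was deprecated in pandas 0.18 and later removed, so the call as written will fail on recent releases. A rough equivalent, under a few assumptions, is sketched below: the toy data is hand-made, the quarter is given as an offset object rather than the 'Q' string (newer pandas releases changed the string aliases), and count stands in for len.

```python
import pandas as pd

# Hand-made stand-in for the question's random data (illustrative values only).
df = pd.DataFrame({
    'Client': [1, 1, 1, 2, 2],
    'OrderDate': pd.to_datetime(['2013-01-15', '2013-02-20', '2013-05-03',
                                 '2013-01-07', '2013-11-30']),
    'Value': [10, 20, 30, 40, 50],
})

# Calendar quarters ending in March, June, September and December.
quarter_end = pd.offsets.QuarterEnd(startingMonth=12)

# Same idea as the answer: index by date, group by client, resample per group.
resampled = (df.set_index('OrderDate')
               .groupby('Client')['Value']
               .resample(quarter_end))

# Aggregate by calling methods on the resampler instead of passing how=.
result = pd.DataFrame({'sum': resampled.sum(), 'count': resampled.count()})
print(result)
```

Quarters with no orders that fall between a client's first and last order still appear, with a sum and count of 0.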

Answer 2

Answered by joris

Something like this? I'm first doing a groupby, and then applying a resample on each group.

In [11]: grouped = df.groupby('Client')

In [12]: result = grouped.apply(lambda x: x.set_index('OrderDate').resample('Q', how=[np.sum, len]))

In [13]: result['Value']
Out[13]: 
                    sum  len
Client OrderDate            
1      2013-03-31   227    4
       2013-06-30   344    2
       2013-09-30   234    1
2      2013-03-31   299    2
       2013-06-30   538    4
       2013-09-30   236    2
       2013-12-31  1124    7
3      2013-03-31   496    4
       2013-06-30   NaN    0
       2013-09-30   167    2
       2013-12-31   218    1

Update: with the suggestion of @AndyHayden in his answer, this becomes much cleaner code:

df = df.set_index('OrderDate')
grouped = df.groupby('Client')
grouped['Value'].resample('Q', how=[np.sum, len])
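
The same pattern could be repeated for the monthly, quarterly and yearly summaries mentioned in the question and then reshaped into a 'wide' layout. The following is a sketch under assumptions: summarize is a hypothetical helper, the toy data is hand-made, the frequencies are passed as offset objects, and the aggregation uses resampler methods because how= is no longer available in current pandas.

```python
import pandas as pd

# Hand-made stand-in for the question's random data (illustrative values only).
df = pd.DataFrame({
    'Client': [1, 1, 1, 2, 2],
    'OrderDate': pd.to_datetime(['2013-01-15', '2013-02-20', '2013-05-03',
                                 '2013-01-07', '2013-11-30']),
    'Value': [10, 20, 30, 40, 50],
})

# One offset object per reporting period.
offsets = {
    'monthly': pd.offsets.MonthEnd(),
    'quarterly': pd.offsets.QuarterEnd(startingMonth=12),
    'yearly': pd.offsets.YearEnd(),
}

def summarize(frame, offset):
    """Hypothetical helper: count and sum 'Value' per client and period."""
    resampled = (frame.set_index('OrderDate')
                      .groupby('Client')['Value']
                      .resample(offset))
    return pd.DataFrame({'NumberofEntries': resampled.count(),
                         'SumofValues': resampled.sum()})

# One long-format frame per period type.
summaries = {name: summarize(df, offset) for name, offset in offsets.items()}

# 'Wide' layout: one row per period end, one column per client.
wide_quarterly = summaries['quarterly']['SumofValues'].unstack('Client')
print(wide_quarterly)
```

In the wide frame, a client with no rows for a given period shows NaN, since resampling only spans each client's own first to last order.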
