Average of daily count of records per month in a Pandas DataFrame
Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original, and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/33350381/
Asked by marillion
I have a pandas DataFrame with a TIMESTAMP column, which is of the datetime64 data type. Please keep in mind that initially this column is not set as the index; the index is just regular integers, and the first few rows look like this:
TIMESTAMP TYPE
0 2014-07-25 11:50:30.640 2
1 2014-07-25 11:50:46.160 3
2 2014-07-25 11:50:57.370 2
There is an arbitrary number of records for each day, and there may be days with no data. What I am trying to get is the average number of daily records per month, and then plot it as a bar chart with months on the x-axis (April 2014, May 2014, etc.). I managed to calculate these values using the code below:
dfWIM.index = dfWIM.TIMESTAMP
for i in range(dfWIM.TIMESTAMP.dt.year.min(),dfWIM.TIMESTAMP.dt.year.max()+1):
    for j in range(1,13):
        print dfWIM[(dfWIM.TIMESTAMP.dt.year == i) & (dfWIM.TIMESTAMP.dt.month == j)].resample('D', how='count').TIMESTAMP.mean()
which gives the following output:
nan
nan
3100.14285714
6746.7037037
9716.42857143
10318.5806452
9395.56666667
9883.64516129
8766.03225806
9297.78571429
10039.6774194
nan
nan
nan
This is OK as it is, and with some more work I can map the results to the correct month names, then plot the bar chart. However, I am not sure if this is the correct or best way, and I suspect there might be an easier way to get the results using Pandas.
I would be glad to hear what you think. Thanks!
NOTE: If I do not set the TIMESTAMP column as the index, I get a "reduction operation 'mean' not allowed for this dtype" error.
Answered by jakevdp
I think you'll want to do two rounds of groupby: first group by day and count the instances, then group by month and compute the mean of the daily counts. You could do something like this.
First I'll generate some fake data that looks like yours:
import numpy as np
import pandas as pd

# make 1000 random times throughout the year
N = 1000
times = pd.date_range('2014', '2015', freq='min')
ind = np.random.permutation(np.arange(len(times)))[:N]
data = pd.DataFrame({'TIMESTAMP': times[ind],
                     'TYPE': np.random.randint(0, 10, N)})
data.head()
Now I'll do the two groupbys using pd.TimeGrouper and plot the monthly average counts:
import seaborn as sns # for nice plot styles (optional)
daily = data.set_index('TIMESTAMP').groupby(pd.TimeGrouper(freq='D'))['TYPE'].count()
monthly = daily.groupby(pd.TimeGrouper(freq='M')).mean()
ax = monthly.plot(kind='bar')
The formatting along the x axis leaves something to be desired, but you can tweak that if necessary.
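On recent pandas releases `pd.TimeGrouper` is no longer available, and `pd.Grouper(freq=...)` is the usual replacement. The sketch below is not the answerer's code: it repeats the same two-step idea with `pd.Grouper`, groups the daily counts by calendar month via `to_period('M')`, and relabels the months as text for the x axis. The six-row `data` frame is hypothetical, chosen small enough to check by hand.

```python
import pandas as pd

# Hypothetical sample data: 3 records on Jan 1, 1 on Jan 3, 2 on Feb 1
data = pd.DataFrame({
    'TIMESTAMP': pd.to_datetime([
        '2014-01-01 08:00', '2014-01-01 09:30', '2014-01-01 17:45',
        '2014-01-03 12:00',
        '2014-02-01 10:00', '2014-02-01 11:00',
    ]),
    'TYPE': [2, 3, 2, 1, 4, 2],
})

# Step 1: count records per day. Empty days between the first and last
# record appear with a count of 0.
daily = data.set_index('TIMESTAMP').groupby(pd.Grouper(freq='D'))['TYPE'].count()

# Step 2: average the daily counts within each calendar month.
monthly = daily.groupby(daily.index.to_period('M')).mean()

# Readable month labels for the bar chart's x axis.
labels = monthly.index.strftime('%B %Y')
monthly.index = labels
print(monthly)

# Plotting requires matplotlib:
# ax = monthly.plot(kind='bar')
```

Note that zero-count days are included in the mean here, and only days between the first and last record are covered, so the first and last months may be averaged over fewer days than the calendar month has. If days without any records should be ignored instead, filtering with `daily[daily > 0]` before the second step is one option.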