Counting daily events on Pandas Time series
Original URL: http://stackoverflow.com/questions/18922760/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by fccoelho
Hi, I have a time series and would like to count how many events I have per day (i.e., rows in the table within a day). The command I'd like to use is:
ts.resample('D', how='count')
But I suppose "count" is not a valid aggregation function for time series.
Just to clarify, here is a sample of the dataframe:
0 2008-02-22 03:43:00
1 2008-02-22 03:43:00
2 2010-08-05 06:48:00
3 2006-02-07 06:40:00
4 2005-06-06 05:04:00
5 2008-04-17 02:11:00
6 2012-05-12 06:46:00
7 2004-05-17 08:42:00
8 2004-08-02 05:02:00
9 2008-03-26 03:53:00
Name: Data_Hora, dtype: datetime64[ns]
And this is the error I am getting:
ts.resample('D').count()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-42-86643e21ce18> in <module>()
----> 1 ts.resample('D').count()
/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
255 def resample(self, rule, how=None, axis=0, fill_method=None,
256 closed=None, label=None, convention='start',
--> 257 kind=None, loffset=None, limit=None, base=0):
258 """
259 Convenience method for frequency conversion and resampling of regular
/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in resample(self, obj)
98 return obj
99 else: # pragma: no cover
--> 100 raise TypeError('Only valid with DatetimeIndex or PeriodIndex')
101
102 rs_axis = rs._get_axis(self.axis)
TypeError: Only valid with DatetimeIndex or PeriodIndex
That can be fixed by turning the datetime column into an index with set_index. However, after I do that, I still get the following error:
DataError: No numeric types to aggregate
because my DataFrame does not have a numeric column.
But I just want to count rows! The simple "select count(*) group by ..." from SQL.
Answered by fccoelho
To get this to work, I first removed the rows in which the index was NaT (a comparison with != pd.NaT is True for every row, so the null check is used instead):
df2 = df[df.index.notnull()].copy()
Then I had to add a column of ones:
df2['n'] = 1
and then sum only that column per day:
df2.n.resample('D').sum()
Then I could visualize the data with:
plot(df2.n.resample('D').sum())
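A minimal sketch of the column-of-ones approach above, assuming a recent pandas version in which the aggregation is chained as a method rather than passed through `how`. The sample timestamps and the `event` column are hypothetical, not taken from the question's data.

```python
import pandas as pd

# Hypothetical sample: three timestamps plus one missing value (NaT).
stamps = pd.to_datetime([
    "2008-02-22 03:43:00",
    "2008-02-22 03:43:00",
    "2008-02-23 10:00:00",
    None,
])
df = pd.DataFrame(
    {"event": ["a", "b", "c", "d"]},  # non-numeric column, as in the question
    index=pd.DatetimeIndex(stamps, name="Data_Hora"),
)

# Keep only rows whose index is a real timestamp.
df2 = df[df.index.notna()].copy()

# Add a column of ones and sum it per calendar day.
df2["n"] = 1
daily = df2["n"].resample("D").sum()
print(daily)
```

The `.copy()` is there so that adding the `n` column modifies an independent frame rather than a slice of `df`.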
Answered by Jeff
In [104]: df = DataFrame(1,index=date_range('20130101 9:01',freq='h',periods=1000),columns=['A'])
In [105]: df
Out[105]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1000 entries, 2013-01-01 09:01:00 to 2013-02-12 00:01:00
Freq: H
Data columns (total 1 columns):
A 1000 non-null values
dtypes: int64(1)
In [106]: df.resample('D').count()
Out[106]:
A 43
dtype: int64
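For comparison, here is a sketch of the same example written as a script, assuming a recent pandas version (the session above appears to come from an older release, so its printed output may differ). `count()` counts non-null values per column for each day, while `size()` counts rows and so does not depend on any data column.

```python
import pandas as pd

# Same construction as the session above: 1000 hourly rows, all holding 1.
df = pd.DataFrame(
    1,
    index=pd.date_range("20130101 9:01", freq="h", periods=1000),
    columns=["A"],
)

# Non-null values in each column, per calendar day.
per_day = df.resample("D").count()

# Number of rows per calendar day, independent of the columns.
rows_per_day = df.resample("D").size()

print(per_day.head())
print(rows_per_day.head())
```

Since the question only asks for row counts, `size()` is the closer analogue of SQL's `count(*)`.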
Answered by vndrewlee
You can do this with a one-liner, using value counts and resampling.
Assuming your DataFrame is named df:
df.index.value_counts().resample('D').sum()
This method also works if the datetime column is not your index:
df.any_datetime_series.value_counts().resample('D').sum()
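A short sketch of the value-counts approach on a datetime column that is not the index, assuming a recent pandas version. The frame below is hypothetical; the column name `Data_Hora` is borrowed from the question.

```python
import pandas as pd

# Hypothetical frame with a datetime column and a default integer index.
df = pd.DataFrame({
    "Data_Hora": pd.to_datetime([
        "2008-02-22 03:43:00",
        "2008-02-22 03:43:00",
        "2008-02-22 18:00:00",
        "2008-02-24 06:48:00",
    ])
})

# value_counts() returns a Series indexed by timestamp, so it can be
# resampled; summing the counts per day gives events per day.
daily = df["Data_Hora"].value_counts().resample("D").sum()
print(daily)
```

Days with no events inside the covered range appear with a count of 0, which is usually convenient for plotting.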

