Counting daily events on a Pandas time series

Question

Asked by fccoelho

Hi, I have a time series and would like to count how many events I have per day (i.e. rows in the table within a day). The command I'd like to use is:

ts.resample('D', how='count')

but "count" is not a valid aggregation function for time series, I suppose.

Just to clarify, here is a sample of the data:

0   2008-02-22 03:43:00
1   2008-02-22 03:43:00
2   2010-08-05 06:48:00
3   2006-02-07 06:40:00
4   2005-06-06 05:04:00
5   2008-04-17 02:11:00
6   2012-05-12 06:46:00
7   2004-05-17 08:42:00
8   2004-08-02 05:02:00
9   2008-03-26 03:53:00
Name: Data_Hora, dtype: datetime64[ns]

and this is the error I am getting:

ts.resample('D').count()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-86643e21ce18> in <module>()
----> 1 ts.resample('D').count()

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
    255     def resample(self, rule, how=None, axis=0, fill_method=None,
    256                  closed=None, label=None, convention='start',
--> 257                  kind=None, loffset=None, limit=None, base=0):
    258         """
    259         Convenience method for frequency conversion and resampling of regular

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in resample(self, obj)
     98             return obj
     99         else:  # pragma: no cover
--> 100             raise TypeError('Only valid with DatetimeIndex or PeriodIndex')
    101 
    102         rs_axis = rs._get_axis(self.axis)

TypeError: Only valid with DatetimeIndex or PeriodIndex

That can be fixed by turning the datetime column into an index with set_index. However, after I do that, I still get the following error:

DataError: No numeric types to aggregate

because my DataFrame does not have a numeric column.

But I just want to count rows!! The simple "select count(*) group by ... " from SQL.

Answer 1

Answered by fccoelho

In order to get this to work, after removing the rows in which the index was NaT:

df2 = df[df.index.notna()].copy()  # `!= pd.NaT` is always True, so filter with notna() instead

I had to add a column of ones:

df2['n'] = 1

and then count only that column:

df2.n.resample('D').sum()  # the how= keyword was removed from resample in later pandas releases

then I could visualize the data with:

plot(df2.n.resample('D').sum())
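
The steps in this answer can be put together as a minimal sketch. The sample timestamps and the `event` column are hypothetical stand-ins for the asker's data, and the method-style `resample('D').sum()` is used in place of the older `how=` keyword:

```python
import pandas as pd

# Hypothetical sample data: three timestamps plus one missing value (NaT),
# with a non-numeric column, as in the question.
stamps = pd.to_datetime([
    "2008-02-22 03:43:00",
    "2008-02-22 03:43:00",
    "2008-02-24 06:48:00",
    None,
])
df = pd.DataFrame({"event": ["a", "b", "c", "d"]}, index=stamps)

# Drop rows whose index is NaT. notna() is used because a comparison
# such as `index != pd.NaT` evaluates to True for every row.
df2 = df[df.index.notna()].copy()

# Add a column of ones and sum it per calendar day.
df2["n"] = 1
daily = df2["n"].resample("D").sum()
print(daily)
```

Days without any rows (2008-02-23 here) appear in the result with a count of 0, which is usually what a daily plot needs.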

Answer 2

Answered by Jeff

In [104]: df = DataFrame(1,index=date_range('20130101 9:01',freq='h',periods=1000),columns=['A'])

In [105]: df
Out[105]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1000 entries, 2013-01-01 09:01:00 to 2013-02-12 00:01:00
Freq: H
Data columns (total 1 columns):
A    1000  non-null values
dtypes: int64(1)

In [106]: df.resample('D').count()
Out[106]: 
A    43
dtype: int64
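
As a rough, self-contained version of the session above (with explicit `pd.` prefixes, which the original session does not use), the following sketch builds the same 1000-row hourly frame and counts rows per day. The variable names `index` and `daily` are illustrative only:

```python
import pandas as pd

# Same construction as the session above: 1000 hourly rows starting at 09:01.
index = pd.date_range("2013-01-01 09:01", freq="h", periods=1000)
df = pd.DataFrame(1, index=index, columns=["A"])

# One row per calendar day, holding the number of non-null values in column A.
daily = df.resample("D").count()

print(daily.head())
print(len(daily))  # number of daily bins
```

The frame spans 43 calendar days, which matches the 43 shown in the output above; the first day holds 15 rows because the data starts at 09:01.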

Answer 3

Answered by vndrewlee

You can do this with a one-liner, using value_counts and resampling.

Assuming your DataFrame is named df:

df.index.value_counts().resample('D').sum()

This method also works if datetime is not your index:

df.any_datetime_series.value_counts().resample('D').sum()
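
A small sketch of this approach on a datetime Series that is not the index. The data is a hypothetical stand-in for the `Data_Hora` series from the question, and the `sort_index()` call and the `size()` variant are additions for illustration, not part of the original answer:

```python
import pandas as pd

# Hypothetical stand-in for the Data_Hora series from the question.
ts = pd.Series(pd.to_datetime([
    "2008-02-22 03:43:00",
    "2008-02-22 03:43:00",
    "2008-02-22 09:10:00",
    "2008-02-24 06:48:00",
]), name="Data_Hora")

# value_counts() returns a Series indexed by timestamp; summing those
# counts per daily bin gives the number of events per day.
per_day = ts.value_counts().sort_index().resample("D").sum()

# A variant without value_counts: use the timestamps as the index of a
# placeholder Series and count rows per daily bin with size().
events = pd.Series(1, index=pd.DatetimeIndex(ts))
per_day_size = events.resample("D").size()

print(per_day)
print(per_day_size)
```

Both results cover every calendar day between the first and last timestamp, so days without events show up with a count of 0. The `size()` variant is probably the closest match to the SQL `count(*) ... group by` the question asks for.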
