Counting frequency of values by date using pandas

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/27823273/



Tags: pandas, datetime, dataframe, count, time-series

Asked by jcborges

Let's suppose I have the following time series:


Timestamp              Category
2014-10-16 15:05:17    Facebook
2014-10-16 14:56:37    Vimeo
2014-10-16 14:25:16    Facebook
2014-10-16 14:15:32    Facebook
2014-10-16 13:41:01    Facebook
2014-10-16 12:50:30    Orkut
2014-10-16 12:28:54    Facebook
2014-10-16 12:26:56    Facebook
2014-10-16 12:25:12    Facebook
...
2014-10-08 15:52:49    Youtube
2014-10-08 15:04:50    Youtube
2014-10-08 15:03:48    Vimeo
2014-10-08 15:02:27    Youtube
2014-10-08 15:01:56    DailyMotion
2014-10-08 13:27:28    Facebook
2014-10-08 13:01:08    Vimeo
2014-10-08 12:52:06    Facebook
2014-10-08 12:43:27    Facebook
Name: summary, Length: 600

I would like to make a count of each category (Unique Value/Factor in the Time Series) per week and year.


Example:

    Week/Year      Category      Count
    1/2014         Facebook      12
    1/2014         Google        5
    1/2014         Youtube       2
    ...
    2/2014         Facebook      2
    2/2014         Google        5
    2/2014         Youtube       20
...

How can this be achieved using Python pandas?


Accepted answer by Alex Riley

It might be easiest to turn your Series into a DataFrame and use pandas' groupby functionality (if you already have a DataFrame, skip straight to adding another column below).


If your Series is called s, then turn it into a DataFrame like so:


>>> import pandas as pd
>>> df = pd.DataFrame({'Timestamp': s.index, 'Category': s.values})
>>> df
       Category           Timestamp
0      Facebook 2014-10-16 15:05:17
1         Vimeo 2014-10-16 14:56:37
2      Facebook 2014-10-16 14:25:16
...

Now add another column for the week and year (one way is to use apply and generate a string of the week/year numbers):


>>> df['Week/Year'] = df['Timestamp'].apply(lambda x: "%d/%d" % (x.week, x.year))
>>> df
             Timestamp     Category Week/Year
0  2014-10-16 15:05:17     Facebook   42/2014
1  2014-10-16 14:56:37        Vimeo   42/2014
2  2014-10-16 14:25:16     Facebook   42/2014
...

Finally, group by 'Week/Year' and 'Category' and aggregate with size() to get the counts. For the data in your question this produces the following:


>>> df.groupby(['Week/Year', 'Category']).size()
Week/Year  Category   
41/2014    DailyMotion    1
           Facebook       3
           Vimeo          2
           Youtube        3
42/2014    Facebook       7
           Orkut          1
           Vimeo          1
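On more recent pandas versions (1.1+), the same result can also be computed without a row-wise apply by using the dt.isocalendar() accessor. A minimal end-to-end sketch, where the small sample data and the reset_index step are my own additions for illustration rather than part of the original answer:

import pandas as pd

# Illustrative sample data standing in for the poster's 600-row series.
s = pd.Series(
    ['Facebook', 'Vimeo', 'Facebook', 'Youtube', 'Youtube'],
    index=pd.to_datetime([
        '2014-10-16 15:05:17', '2014-10-16 14:56:37', '2014-10-16 14:25:16',
        '2014-10-08 15:52:49', '2014-10-08 15:04:50',
    ]),
    name='summary',
)

df = pd.DataFrame({'Timestamp': s.index, 'Category': s.values})

# dt.isocalendar() returns ISO year/week/day as columns (pandas >= 1.1).
iso = df['Timestamp'].dt.isocalendar()
df['Week/Year'] = iso['week'].astype(str) + '/' + iso['year'].astype(str)

counts = df.groupby(['Week/Year', 'Category']).size().reset_index(name='Count')
print(counts)

Here reset_index(name='Count') just reshapes the grouped counts into the three-column Week/Year, Category, Count layout shown in the question.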

Answer by Leon

To be a little clearer: you do not need to create a new column called 'week_num' first.


df.groupby(by=lambda x: "%d/%d" % (x.week, x.year)).Category.value_counts()

The callable passed to by is applied to each timestamp in the index, converting it to a week/year string, and the rows are then grouped by those strings.

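One caveat worth hedging: because the callable only receives index labels, this works when the timestamps are the index (as in the original Series s). With the DataFrame built in the accepted answer, which has a default integer index, you would set the index first; a sketch under that assumption:

# Assumes df is the DataFrame from the accepted answer; set_index makes the
# timestamps the index labels, so the lambda receives Timestamp objects.
counts = (
    df.set_index('Timestamp')
      .groupby(by=lambda ts: "%d/%d" % (ts.week, ts.year))
      .Category.value_counts()
)
print(counts)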

Answer by cwharland

Convert your Timestamp column to a week number, then group by that week number and take value_counts of the categorical variable, like so:


df.groupby('week_num').Category.value_counts()

Here I have assumed that a new column week_num was already created from the Timestamp column.

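For completeness, a sketch of one way that week_num column might be created (my assumption, reusing the DataFrame from the accepted answer and the dt.isocalendar() accessor available in pandas 1.1+):

# Derive the ISO week number from the Timestamp column, then count
# categories within each week.
df['week_num'] = df['Timestamp'].dt.isocalendar().week
print(df.groupby('week_num').Category.value_counts())

Note that grouping on the week number alone will merge the same week number from different years; including the year as well, as in the accepted answer, avoids that.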