Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, link the original source, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/27823273/
Counting frequency of values by date using pandas
Asked by jcborges
Let's suppose I have following Time Series:
Timestamp Category
2014-10-16 15:05:17 Facebook
2014-10-16 14:56:37 Vimeo
2014-10-16 14:25:16 Facebook
2014-10-16 14:15:32 Facebook
2014-10-16 13:41:01 Facebook
2014-10-16 12:50:30 Orkut
2014-10-16 12:28:54 Facebook
2014-10-16 12:26:56 Facebook
2014-10-16 12:25:12 Facebook
...
2014-10-08 15:52:49 Youtube
2014-10-08 15:04:50 Youtube
2014-10-08 15:03:48 Vimeo
2014-10-08 15:02:27 Youtube
2014-10-08 15:01:56 DailyMotion
2014-10-08 13:27:28 Facebook
2014-10-08 13:01:08 Vimeo
2014-10-08 12:52:06 Facebook
2014-10-08 12:43:27 Facebook
Name: summary, Length: 600
I would like to make a count of each category (Unique Value/Factor in the Time Series) per week and year.
Example:
Week/Year Category Count
1/2014 Facebook 12
1/2014 Google 5
1/2014 Youtube 2
...
2/2014 Facebook 2
2/2014 Google 5
2/2014 Youtube 20
...
How can this be achieved using Python pandas?
Accepted answer by Alex Riley
It might be easiest to turn your Series into a DataFrame and use pandas' groupby functionality (if you already have a DataFrame, skip straight to adding another column below).
If your Series is called s, then turn it into a DataFrame like so:
>>> df = pd.DataFrame({'Timestamp': s.index, 'Category': s.values})
>>> df
Category Timestamp
0 Facebook 2014-10-16 15:05:17
1 Vimeo 2014-10-16 14:56:37
2 Facebook 2014-10-16 14:25:16
...
Now add another column for the week and year (one way is to use apply and generate a string of the week/year numbers):
>>> df['Week/Year'] = df['Timestamp'].apply(lambda x: "%d/%d" % (x.week, x.year))
>>> df
Timestamp Category Week/Year
0 2014-10-16 15:05:17 Facebook 42/2014
1 2014-10-16 14:56:37 Vimeo 42/2014
2 2014-10-16 14:25:16 Facebook 42/2014
...
Finally, group by 'Week/Year' and 'Category' and aggregate with size() to get the counts. For the data in your question this produces the following:
>>> df.groupby(['Week/Year', 'Category']).size()
Week/Year Category
41/2014 DailyMotion 1
Facebook 3
Vimeo 2
Youtube 3
42/2014 Facebook 7
Orkut 1
Vimeo 1
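To get the exact three-column Week/Year, Category, Count layout shown in the question, the MultiIndexed result of size() can be flattened with reset_index. A minimal self-contained sketch (the sample timestamps and categories below are invented for illustration, and isocalendar()[1] is used for the week number, which also works on newer pandas versions):

```python
import pandas as pd

# Invented sample data in the same shape as the question's Series
s = pd.Series(
    ["Facebook", "Vimeo", "Facebook", "Youtube"],
    index=pd.to_datetime([
        "2014-10-16 15:05:17",
        "2014-10-16 14:56:37",
        "2014-10-08 14:25:16",
        "2014-10-08 15:52:49",
    ]),
    name="summary",
)

df = pd.DataFrame({"Timestamp": s.index, "Category": s.values})

# isocalendar()[1] is the ISO week number of the timestamp
df["Week/Year"] = df["Timestamp"].apply(
    lambda x: "%d/%d" % (x.isocalendar()[1], x.year)
)

# size() returns a Series with a (Week/Year, Category) MultiIndex;
# reset_index turns it into a flat three-column table
counts = df.groupby(["Week/Year", "Category"]).size().reset_index(name="Count")
print(counts)
```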
Answered by Leon
To be a little bit more clear, you do not need to create a new column called 'week_num' first.
df.groupby(by=lambda x: "%d/%d" % (x.week, x.year)).Category.value_counts()
The function passed as by is automatically called on each timestamp in the index, converting it to a week/year string, and the rows are then grouped by that string.
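A self-contained sketch of this approach (the sample data is invented for illustration; isocalendar()[1] is used for the week number since week and year are attributes of a Timestamp, not methods):

```python
import pandas as pd

# Invented sample: a DatetimeIndex mapping to category labels
s = pd.Series(
    ["Facebook", "Facebook", "Vimeo"],
    index=pd.to_datetime(
        ["2014-10-16 15:05:17", "2014-10-16 14:25:16", "2014-10-08 13:01:08"]
    ),
    name="summary",
)
df = s.to_frame(name="Category")

# When `by` is a callable, pandas applies it to each index label (here a
# Timestamp), so no intermediate week/year column is needed
out = df.groupby(
    by=lambda x: "%d/%d" % (x.isocalendar()[1], x.year)
).Category.value_counts()
print(out)
```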
Answered by cwharland
Convert your Timestamp column to a week number, then group by that week number and run value_counts on the categorical variable, like so:
df.groupby('week_num').Category.value_counts()
where I have assumed that a new column week_num was created from the Timestamp column.
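For completeness, a runnable sketch under that assumption (the week_num derivation and the sample data below are illustrative):

```python
import pandas as pd

# Illustrative sample in the question's shape
s = pd.Series(
    ["Youtube", "Youtube", "Facebook"],
    index=pd.to_datetime(
        ["2014-10-08 15:52:49", "2014-10-08 15:04:50", "2014-10-16 12:52:06"]
    ),
    name="summary",
)
df = s.to_frame(name="Category")

# Derive the assumed week_num column from the timestamps in the index
df["week_num"] = [ts.isocalendar()[1] for ts in df.index]

res = df.groupby("week_num").Category.value_counts()
print(res)
```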