Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, link the original source, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/27823273/
Counting frequency of values by date using pandas
Asked by jcborges
Let's suppose I have following Time Series:
Timestamp Category
2014-10-16 15:05:17 Facebook
2014-10-16 14:56:37 Vimeo
2014-10-16 14:25:16 Facebook
2014-10-16 14:15:32 Facebook
2014-10-16 13:41:01 Facebook
2014-10-16 12:50:30 Orkut
2014-10-16 12:28:54 Facebook
2014-10-16 12:26:56 Facebook
2014-10-16 12:25:12 Facebook
...
2014-10-08 15:52:49 Youtube
2014-10-08 15:04:50 Youtube
2014-10-08 15:03:48 Vimeo
2014-10-08 15:02:27 Youtube
2014-10-08 15:01:56 DailyMotion
2014-10-08 13:27:28 Facebook
2014-10-08 13:01:08 Vimeo
2014-10-08 12:52:06 Facebook
2014-10-08 12:43:27 Facebook
Name: summary, Length: 600
I would like to make a count of each category (Unique Value/Factor in the Time Series) per week and year.
Example:
Week/Year Category Count
1/2014 Facebook 12
1/2014 Google 5
1/2014 Youtube 2
...
2/2014 Facebook 2
2/2014 Google 5
2/2014 Youtube 20
...
How can this be achieved using Python pandas?
Accepted answer by Alex Riley
It might be easiest to turn your Series into a DataFrame and use pandas' groupby functionality (if you already have a DataFrame, skip straight to adding another column below).
If your Series is called s, then turn it into a DataFrame like so:
>>> df = pd.DataFrame({'Timestamp': s.index, 'Category': s.values})
>>> df
Category Timestamp
0 Facebook 2014-10-16 15:05:17
1 Vimeo 2014-10-16 14:56:37
2 Facebook 2014-10-16 14:25:16
...
Now add another column for the week and year (one way is to use apply and generate a string of the week/year numbers):
>>> df['Week/Year'] = df['Timestamp'].apply(lambda x: "%d/%d" % (x.week, x.year))
>>> df
Timestamp Category Week/Year
0 2014-10-16 15:05:17 Facebook 42/2014
1 2014-10-16 14:56:37 Vimeo 42/2014
2 2014-10-16 14:25:16 Facebook 42/2014
...
Finally, group by 'Week/Year' and 'Category' and aggregate with size() to get the counts. For the data in your question this produces the following:
>>> df.groupby(['Week/Year', 'Category']).size()
Week/Year Category
41/2014 DailyMotion 1
Facebook 3
Vimeo 2
Youtube 3
42/2014 Facebook 7
Orkut 1
Vimeo 1
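To get the exact three-column Week/Year, Category, Count layout shown in the question, the MultiIndexed result of size() can be flattened with reset_index. A minimal self-contained sketch (the sample timestamps and categories below are invented for illustration, and isocalendar()[1] is used for the week number, which also works on newer pandas versions):

```python
import pandas as pd

# Invented sample data in the same shape as the question's Series
s = pd.Series(
    ["Facebook", "Vimeo", "Facebook", "Youtube"],
    index=pd.to_datetime([
        "2014-10-16 15:05:17",
        "2014-10-16 14:56:37",
        "2014-10-08 14:25:16",
        "2014-10-08 15:52:49",
    ]),
    name="summary",
)

df = pd.DataFrame({"Timestamp": s.index, "Category": s.values})

# isocalendar()[1] is the ISO week number of the timestamp
df["Week/Year"] = df["Timestamp"].apply(
    lambda x: "%d/%d" % (x.isocalendar()[1], x.year)
)

# size() returns a Series with a (Week/Year, Category) MultiIndex;
# reset_index turns it into a flat three-column table
counts = df.groupby(["Week/Year", "Category"]).size().reset_index(name="Count")
print(counts)
```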
Answered by Leon
To be a little bit more clear, you do not need to create a new column called 'week_num' first.
df.groupby(by=lambda x: "%d/%d" % (x.week, x.year)).Category.value_counts()
The function passed as by is automatically called on each timestamp in the index, converting it to a week/year string, and the rows are then grouped by that string.
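A self-contained sketch of this approach (the sample data is invented for illustration; isocalendar()[1] is used for the week number since week and year are attributes of a Timestamp, not methods):

```python
import pandas as pd

# Invented sample: a DatetimeIndex mapping to category labels
s = pd.Series(
    ["Facebook", "Facebook", "Vimeo"],
    index=pd.to_datetime(
        ["2014-10-16 15:05:17", "2014-10-16 14:25:16", "2014-10-08 13:01:08"]
    ),
    name="summary",
)
df = s.to_frame(name="Category")

# When `by` is a callable, pandas applies it to each index label (here a
# Timestamp), so no intermediate week/year column is needed
out = df.groupby(
    by=lambda x: "%d/%d" % (x.isocalendar()[1], x.year)
).Category.value_counts()
print(out)
```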
Answered by cwharland
Convert your Timestamp column to a week number, then group by that week number and run value_counts on the categorical variable, like so:
df.groupby('week_num').Category.value_counts()
where I have assumed that a new column week_num was created from the Timestamp column.
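For completeness, a runnable sketch under that assumption (the week_num derivation and the sample data below are illustrative):

```python
import pandas as pd

# Illustrative sample in the question's shape
s = pd.Series(
    ["Youtube", "Youtube", "Facebook"],
    index=pd.to_datetime(
        ["2014-10-08 15:52:49", "2014-10-08 15:04:50", "2014-10-16 12:52:06"]
    ),
    name="summary",
)
df = s.to_frame(name="Category")

# Derive the assumed week_num column from the timestamps in the index
df["week_num"] = [ts.isocalendar()[1] for ts in df.index]

res = df.groupby("week_num").Category.value_counts()
print(res)
```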