Python: plot the number of occurrences from a Pandas DataFrame

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/21331722/

Time: 2020-08-18 22:37:06  Source: igfitidea

Plot number of occurrences from Pandas DataFrame

Tags: python, matplotlib, pandas

Asked by Timofey

I have a DataFrame with two columns. One contains timestamps and the other contains the ID of some action. Something like this:

2000-12-29 00:10:00     action1
2000-12-29 00:20:00     action2
2000-12-29 00:30:00     action2
2000-12-29 00:40:00     action1
2000-12-29 00:50:00     action1
...
2000-12-31 00:10:00     action1
2000-12-31 00:20:00     action2
2000-12-31 00:30:00     action2

I would like to know how many actions of a certain type were performed on a given day. That is, for every day I need to count the occurrences of actionX and plot the result, with the date on the X axis and the number of occurrences of actionX on the Y axis.

Of course, I can count the actions for each day naively by iterating through my dataset. But what's the "right way" to do it with pandas/matplotlib?

Accepted answer by mkln

Starting from

                mydate col_name
0  2000-12-29 00:10:00  action1
1  2000-12-29 00:20:00  action2
2  2000-12-29 00:30:00  action2
3  2000-12-29 00:40:00  action1
4  2000-12-29 00:50:00  action1
5  2000-12-31 00:10:00  action1
6  2000-12-31 00:20:00  action2
7  2000-12-31 00:30:00  action2

You can do

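import pandas as pd

# Parse the timestamps and use them as the index
df['mydate'] = pd.to_datetime(df['mydate'])
df = df.set_index('mydate')
df['day'] = df.index.date
# size() counts the rows in each (day, action) group and returns a Series
counts = df.groupby(['day', 'col_name']).size()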

There may be an even more straightforward way, but the above should work anyway.
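For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data and counts rows per day and action. The column names `mydate` and `col_name` follow this answer; using `size()` followed by `unstack()` is my own choice for illustration, not necessarily what the answer's author had in mind.

```python
import pandas as pd

# Sample data mirroring the question (column names follow the answer above).
df = pd.DataFrame({
    "mydate": [
        "2000-12-29 00:10:00", "2000-12-29 00:20:00", "2000-12-29 00:30:00",
        "2000-12-29 00:40:00", "2000-12-29 00:50:00", "2000-12-31 00:10:00",
        "2000-12-31 00:20:00", "2000-12-31 00:30:00",
    ],
    "col_name": ["action1", "action2", "action2", "action1",
                 "action1", "action1", "action2", "action2"],
})
df["mydate"] = pd.to_datetime(df["mydate"])

# Count rows per (calendar day, action). size() returns a Series
# indexed by a (day, action) MultiIndex.
counts = df.groupby([df["mydate"].dt.date, "col_name"]).size()

# Pivot the actions into columns: one row per day, one column per action.
table = counts.unstack(fill_value=0)
print(table)
```

The wide `table` is convenient for plotting, since each action becomes its own column.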

If you want to use counts as a DataFrame, I'd then transform it back:

counts = pd.DataFrame(counts, columns=['count'])
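As an alternative sketch, assuming the counts are held in a Series with a (day, action) MultiIndex, `reset_index(name=...)` gives a flat DataFrame with the group keys as ordinary columns. The variable name `flat` and the `day` helper column are only illustrative.

```python
import pandas as pd

# Sample data mirroring the question.
df = pd.DataFrame({
    "mydate": pd.to_datetime([
        "2000-12-29 00:10:00", "2000-12-29 00:20:00", "2000-12-29 00:30:00",
        "2000-12-29 00:40:00", "2000-12-29 00:50:00", "2000-12-31 00:10:00",
        "2000-12-31 00:20:00", "2000-12-31 00:30:00",
    ]),
    "col_name": ["action1", "action2", "action2", "action1",
                 "action1", "action1", "action2", "action2"],
})
df["day"] = df["mydate"].dt.date

# Series of counts indexed by (day, col_name).
counts = df.groupby(["day", "col_name"]).size()

# Turn the index levels into columns and name the value column "count".
flat = counts.reset_index(name="count")
print(flat)
```

This keeps `day` and `col_name` available for later filtering or merging, at the cost of losing the MultiIndex.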

Answer by David Hagan

You can get the counts by using

df.groupby([df.index.date, 'action']).size()

Or you can plot directly using this method:

df.groupby([df.index.date, 'action']).size().plot(kind='bar')

You could also store the result in a variable such as count and then plot it separately. This assumes that your index is already a DatetimeIndex; otherwise, follow the directions of @mkln above.
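To get the date on the X axis with one bar per action, which is what the question asks for, one possible sketch is to unstack the counts before plotting. It assumes a DatetimeIndex and a column named `action`, as in this answer; the `Agg` backend line and the axis labels are my own additions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Sample data mirroring the question, with the timestamps as the index.
index = pd.to_datetime([
    "2000-12-29 00:10:00", "2000-12-29 00:20:00", "2000-12-29 00:30:00",
    "2000-12-29 00:40:00", "2000-12-29 00:50:00", "2000-12-31 00:10:00",
    "2000-12-31 00:20:00", "2000-12-31 00:30:00",
])
df = pd.DataFrame(
    {"action": ["action1", "action2", "action2", "action1",
                "action1", "action1", "action2", "action2"]},
    index=index,
)

# Rows per (day, action), then pivot the actions into columns.
table = df.groupby([df.index.date, "action"]).size().unstack(fill_value=0)

# One group of bars per date, one bar per action.
ax = table.plot(kind="bar")
ax.set_xlabel("Date")
ax.set_ylabel("Number of occurrences")
plt.tight_layout()
```

In an interactive session you would finish with `plt.show()`, or save the figure with `ax.figure.savefig(...)`.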