Can Pandas plot a histogram of dates?
Source: http://stackoverflow.com/questions/27365467/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by lollercoaster
I've taken my Series and coerced it to a datetime column of dtype=datetime64[ns] (though I only need day resolution, and I'm not sure how to change that).
import pandas as pd
df = pd.read_csv('somefile.csv')
column = df['date']
column = pd.to_datetime(column, errors='coerce')  # older pandas versions spelled this coerce=True
but plotting doesn't work:
ipdb> column.plot(kind='hist')
*** TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('float64')
I'd like to plot a histogram that just shows the count of dates by week, month, or year.
Surely there is a way to do this in pandas?
Accepted answer by jrjc
Given this df:
date
0 2001-08-10
1 2002-08-31
2 2003-08-29
3 2006-06-21
4 2002-03-27
5 2003-07-14
6 2004-06-15
7 2003-08-14
8 2003-07-29
and, if the column is not already a datetime type:
df["date"] = df["date"].astype("datetime64[ns]")
To show the count of dates by month:
df.groupby(df["date"].dt.month).count().plot(kind="bar")
.dt allows you to access the datetime properties.
This gives a bar chart with one bar per month.
You can replace month with year, day, etc.
If you want to distinguish year and month for instance, just do:
df.groupby([df["date"].dt.year, df["date"].dt.month]).count().plot(kind="bar")
This gives a bar chart with one bar per (year, month) pair.
Is this what you wanted? Is this clear?
Hope this helps!
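For reference, here is a minimal sketch of the grouping step on its own, using the sample dates from this answer. It uses size() rather than count() so the result is a plain Series of counts, and it renames the group keys to "year" and "month" for readability. Both of those choices are mine, not part of the original answer.

```python
import pandas as pd

# Sample dates copied from the answer above.
df = pd.DataFrame({"date": [
    "2001-08-10", "2002-08-31", "2003-08-29", "2006-06-21", "2002-03-27",
    "2003-07-14", "2004-06-15", "2003-08-14", "2003-07-29",
]})
df["date"] = pd.to_datetime(df["date"])

# Count rows per calendar month (1-12), ignoring the year.
per_month = df.groupby(df["date"].dt.month).size()

# Count rows per (year, month) pair; the key names are illustrative.
year = df["date"].dt.year.rename("year")
month = df["date"].dt.month.rename("month")
per_year_month = df.groupby([year, month]).size()

print(per_month)
print(per_year_month)
```

Calling .plot(kind="bar") on either result should draw the corresponding bar chart.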
Answer by lollercoaster
I think you can solve the problem with this code, which converts the date column to integers and then parses them as timestamps:
df['date'] = df['date'].astype(int)
df['date'] = pd.to_datetime(df['date'], unit='s')
To keep only the date, you can add this code:
pd.DatetimeIndex(df.date).normalize()
df['date'] = pd.DatetimeIndex(df.date).normalize()
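As a small sketch of what normalize() does, assuming a Series that already has a datetime64 dtype (the sample timestamps below are made up for illustration): it resets the time of day to midnight while keeping the datetime dtype, so the .dt accessor still works afterwards.

```python
import pandas as pd

# Hypothetical timestamps with a time-of-day component.
s = pd.Series(pd.to_datetime([
    "2003-08-29 13:45:00",
    "2003-08-29 23:59:59",
    "2004-06-15 00:30:00",
]))

# normalize() sets every time to 00:00:00 but keeps the datetime64 dtype.
days = s.dt.normalize()

print(days)
print(days.dtype)
```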
Answer by EngineeredE
I was just having trouble with this as well. I imagine that since you're working with dates you want to preserve chronological ordering (like I did).
The workaround then is:
import matplotlib.pyplot as plt
counts = df['date'].value_counts(sort=False)
plt.bar(counts.index,counts)
plt.show()
If anyone knows of a better way, please speak up.
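One possible variation, sketched here on made-up dates: as far as I know, value_counts(sort=False) does not promise chronological order, so sorting the result by its index makes the ordering explicit. This is my suggestion rather than part of the original answer.

```python
import pandas as pd

# Hypothetical, deliberately unordered dates.
dates = pd.to_datetime(pd.Series([
    "2003-08-29", "2001-08-10", "2003-08-29", "2002-03-27",
]))

# Count occurrences of each date, then order the counts chronologically.
counts = dates.value_counts().sort_index()

print(counts)
```

The sorted counts can then be passed to plt.bar(counts.index, counts) as in the workaround above.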
EDIT: for jean above, here's a sample of the data (I randomly sampled from the full dataset, hence the trivial histogram data):
print dates
type(dates),type(dates[0])
dates.hist()
plt.show()
Output:
0 2001-07-10
1 2002-05-31
2 2003-08-29
3 2006-06-21
4 2002-03-27
5 2003-07-14
6 2004-06-15
7 2002-01-17
Name: Date, dtype: object
<class 'pandas.core.series.Series'> <type 'datetime.date'>
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-f39e334eece0> in <module>()
2 print dates
3 print type(dates),type(dates[0])
----> 4 dates.hist()
5 plt.show()
/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.pyc in hist_series(self, by, ax, grid, xlabelsize, xrot, ylabelsize, yrot, figsize, bins, **kwds)
2570 values = self.dropna().values
2571
-> 2572 ax.hist(values, bins=bins, **kwds)
2573 ax.grid(grid)
2574 axes = np.array([ax])
/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
5620 for xi in x:
5621 if len(xi) > 0:
-> 5622 xmin = min(xmin, xi.min())
5623 xmax = max(xmax, xi.max())
5624 bin_range = (xmin, xmax)
TypeError: can't compare datetime.date to float
Answer by Ethan
I think resample might be what you are looking for. In your case, do:
df.set_index('date', inplace=True)
# '1M' for 1 month; '1W' for 1 week; see the documentation on offset aliases
# (older pandas versions wrote this as df.resample('1M', how='count'))
df.resample('1M').count()
It is only doing the counting and not the plot, so you then have to make your own plots.
See the pandas resample documentation for more details.
I ran into similar problems. Hope this helps.
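A minimal sketch of the counting step, assuming a frame whose only column is a datetime column named date. The sample dates are invented, and I use the weekly alias 'W' and size() here, which counts the rows that fall into each bin.

```python
import pandas as pd

# Hypothetical dates spread over three calendar weeks.
df = pd.DataFrame({"date": pd.to_datetime([
    "2020-03-02", "2020-03-03", "2020-03-08", "2020-03-09", "2020-03-20",
])})

# resample() needs a DatetimeIndex, so move the date column into the index.
weekly = df.set_index("date").resample("W").size()

print(weekly)
```

The result is a Series indexed by the end of each week, so weekly.plot(kind="bar") should give the histogram-like chart.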
Answer by abeboparebop
I was able to work around this by (1) plotting with matplotlib instead of using the dataframe directly and (2) using the values attribute. See example:
import matplotlib.pyplot as plt
ax = plt.gca()
ax.hist(column.values)
This doesn't work if I don't use values, but I don't know why it does work.
Answer by Martin Thoma
Example code
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Create random datetime object."""

# core modules
from datetime import datetime
import random

# 3rd party modules
import pandas as pd
import matplotlib.pyplot as plt


def visualize(df, column_name='start_date', color='#494949', title=''):
    """
    Visualize a dataframe with a date column.

    Parameters
    ----------
    df : Pandas dataframe
    column_name : str
        Column to visualize
    color : str
    title : str
    """
    plt.figure(figsize=(20, 10))
    ax = (df[column_name].groupby(df[column_name].dt.hour)
                         .count()).plot(kind="bar", color=color)
    ax.set_facecolor('#eeeeee')
    ax.set_xlabel("hour of the day")
    ax.set_ylabel("count")
    ax.set_title(title)
    plt.show()


def create_random_datetime(from_date, to_date, rand_type='uniform'):
    """
    Create random date within timeframe.

    Parameters
    ----------
    from_date : datetime object
    to_date : datetime object
    rand_type : {'uniform'}

    Examples
    --------
    >>> random.seed(28041990)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(1998, 12, 13, 23, 38, 0, 121628)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(2000, 3, 19, 19, 24, 31, 193940)
    """
    delta = to_date - from_date
    if rand_type == 'uniform':
        rand = random.random()
    else:
        raise NotImplementedError('Unknown random mode \'{}\''
                                  .format(rand_type))
    return from_date + rand * delta


def create_df(n=1000):
    """Create a Pandas dataframe with datetime objects."""
    from_date = datetime(1990, 4, 28)
    to_date = datetime(2000, 12, 31)
    sales = [create_random_datetime(from_date, to_date) for _ in range(n)]
    df = pd.DataFrame({'start_date': sales})
    return df


if __name__ == '__main__':
    import doctest
    doctest.testmod()
    df = create_df()
    visualize(df)
Answer by JulianWgs
Here is a solution for when you just want a histogram that looks the way you expect. It doesn't use groupby; instead it converts the datetime values to integers and changes the labels on the plot. It could be improved by moving the tick labels to evenly spaced locations. With this approach, a kernel density estimation plot (or any other plot) is also possible.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({"datetime": pd.to_datetime(np.random.randint(1582800000000000000, 1583500000000000000, 100, dtype=np.int64))})
fig, ax = plt.subplots()
df["datetime"].astype(np.int64).plot.hist(ax=ax)
labels = ax.get_xticks().tolist()
labels = pd.to_datetime(labels)
ax.set_xticklabels(labels, rotation=90)
plt.show()
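The same integer trick can be sketched without any plotting, which makes it easier to see what the bins are. This is an illustration under my own assumptions, not the author's code: it uses np.histogram on the nanosecond integers and converts the bin edges back to timestamps. The seed and the bin count of 10 are arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 random timestamps in the same range as above.
rng = np.random.default_rng(0)
ns = rng.integers(1582800000000000000, 1583500000000000000,
                  size=100, dtype=np.int64)
s = pd.Series(pd.to_datetime(ns))

# View the timestamps as integer nanoseconds since the epoch.
values = s.to_numpy(dtype="datetime64[ns]").view("int64")

# Bin the integers, then turn the bin edges back into timestamps.
counts, edges = np.histogram(values, bins=10)
edge_dates = pd.to_datetime(edges.astype("int64"))

for left, right, n in zip(edge_dates[:-1], edge_dates[1:], counts):
    print(left, right, n)
```

The edge_dates values could also serve as tick labels if you plot the counts yourself.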
Answer by Briford Wylie
All of these answers seem overly complex; at least with 'modern' pandas, it's two lines.
df.set_index('date', inplace=True)
df.resample('M').size().plot.bar()