Can Python Pandas plot a histogram of dates?

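Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/27365467/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow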

Time: 2020-08-19 01:43:31  Source: igfitidea

Can Pandas plot a histogram of dates?

python pandas matplotlib time-series

Asked by lollercoaster

I've taken my Series and coerced it to a datetime column of dtype=datetime64[ns] (though I only need day resolution; I'm not sure how to change that).

import pandas as pd
df = pd.read_csv('somefile.csv')
column = df['date']
column = pd.to_datetime(column, errors='coerce')  # unparseable values become NaT

but plotting doesn't work:

ipdb> column.plot(kind='hist')
*** TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('float64')

I'd like to plot a histogram that just shows the count of dates by week, month, or year.

Surely there is a way to do this in pandas?

Accepted answer by jrjc

Given this df:

        date
0 2001-08-10
1 2002-08-31
2 2003-08-29
3 2006-06-21
4 2002-03-27
5 2003-07-14
6 2004-06-15
7 2003-08-14
8 2003-07-29

and, if it's not already the case:

df["date"] = df["date"].astype("datetime64[ns]")

To show the count of dates by month:

df.groupby(df["date"].dt.month).count().plot(kind="bar")

.dt allows you to access the datetime properties.

Which will give you:

[Image: groupby date month]

You can replace month with year, day, etc.
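
As a minimal sketch of swapping the grouping key, the snippet below rebuilds the sample frame from this answer and counts rows by year and by month through `.dt`. The names `per_year` and `per_month` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample frame matching the dates shown in this answer
df = pd.DataFrame({"date": pd.to_datetime([
    "2001-08-10", "2002-08-31", "2003-08-29", "2006-06-21", "2002-03-27",
    "2003-07-14", "2004-06-15", "2003-08-14", "2003-07-29",
])})

# Group on a datetime property exposed by the .dt accessor
per_year = df.groupby(df["date"].dt.year)["date"].count()
per_month = df.groupby(df["date"].dt.month)["date"].count()

print(per_year)
print(per_month)

# per_year.plot(kind="bar") would draw the bar chart (needs matplotlib)
```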

If you want to distinguish year and month for instance, just do:

df.groupby([df["date"].dt.year, df["date"].dt.month]).count().plot(kind="bar")

Which gives:

[Image: groupby date month year]
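
A possible alternative, which is not part of the original answer: `dt.to_period("M")` collapses each date to its year-month, so the group key is a single chronological period rather than a (year, month) pair. This is a sketch on the same sample dates; the name `per_period` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime([
    "2001-08-10", "2002-08-31", "2003-08-29", "2006-06-21", "2002-03-27",
    "2003-07-14", "2004-06-15", "2003-08-14", "2003-07-29",
])})

# One monthly Period per row, e.g. 2003-08; groupby sorts the keys chronologically
per_period = df.groupby(df["date"].dt.to_period("M")).size()
print(per_period)

# per_period.plot(kind="bar") would draw one bar per year-month that has data
```

Months with no rows do not appear in the result, just as with the two-key groupby.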

Is this what you wanted? Is it clear?

Hope this helps!

Answer by lollercoaster

I think you can solve that problem with this code, which converts the date column to an integer type:

df['date'] = df['date'].astype(int)
df['date'] = pd.to_datetime(df['date'], unit='s')

To keep only the date, you can add this code:

pd.DatetimeIndex(df.date).normalize()
df['date'] = pd.DatetimeIndex(df.date).normalize()
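
To illustrate what normalizing does, here is a small sketch on made-up timestamps (the values and the names `stamps`, `days`, and `counts` are assumptions for the example). It uses `Series.dt.normalize()`, which sets the time part to midnight in the same way as the `DatetimeIndex` call above, and then counts rows per day.

```python
import pandas as pd

# Hypothetical timestamps with a time-of-day component
stamps = pd.Series(pd.to_datetime([
    "2020-03-01 13:45:10",
    "2020-03-01 08:00:00",
    "2020-03-02 23:59:59",
]))

# Reset every timestamp to midnight; the dtype stays datetime64
days = stamps.dt.normalize()

# Count rows per day, in chronological order
counts = days.value_counts().sort_index()
print(counts)
```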

Answer by EngineeredE

I was just having trouble with this as well. I imagine that since you're working with dates, you want to preserve chronological ordering (like I did).

The workaround then is

import matplotlib.pyplot as plt
counts = df['date'].value_counts(sort=False)
plt.bar(counts.index, counts)
plt.show()

If anyone knows of a better way, please speak up.
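
One hedged variation on the workaround: `value_counts(sort=False)` only skips sorting by count, so sorting the result by its index makes the chronological order explicit. The sketch below uses a few made-up `datetime.date` values; the plotting call is left as a comment.

```python
import datetime

import pandas as pd

# Hypothetical sample, deliberately out of order and with one repeated date
dates = pd.Series([
    datetime.date(2003, 8, 29),
    datetime.date(2001, 7, 10),
    datetime.date(2003, 8, 29),
    datetime.date(2002, 5, 31),
])

# Count each distinct date, then order the counts by date
counts = dates.value_counts().sort_index()
print(counts)

# plt.bar(counts.index, counts) would then draw the bars in date order
```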

EDIT: for jean above, here's a sample of the data [I randomly sampled from the full dataset, hence the trivial histogram data.]

print dates
type(dates),type(dates[0])
dates.hist()
plt.show()

Output:

0    2001-07-10
1    2002-05-31
2    2003-08-29
3    2006-06-21
4    2002-03-27
5    2003-07-14
6    2004-06-15
7    2002-01-17
Name: Date, dtype: object
<class 'pandas.core.series.Series'> <type 'datetime.date'>

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-f39e334eece0> in <module>()
      2 print dates
      3 print type(dates),type(dates[0])
----> 4 dates.hist()
      5 plt.show()

/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.pyc in hist_series(self, by, ax, grid, xlabelsize, xrot, ylabelsize, yrot, figsize, bins, **kwds)
   2570         values = self.dropna().values
   2571 
-> 2572         ax.hist(values, bins=bins, **kwds)
   2573         ax.grid(grid)
   2574         axes = np.array([ax])

/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   5620             for xi in x:
   5621                 if len(xi) > 0:
-> 5622                     xmin = min(xmin, xi.min())
   5623                     xmax = max(xmax, xi.max())
   5624             bin_range = (xmin, xmax)

TypeError: can't compare datetime.date to float

Answer by Ethan

I think resample might be what you are looking for. In your case, do:

df.set_index('date', inplace=True)
# '1M' for 1 month; '1W' for 1 week; see the documentation on offset aliases
df.resample('1M').count()

This only does the counting, not the plotting, so you then have to make your own plots.
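
As a rough sketch of the counting step on made-up dates (the frame and the name `weekly` are assumptions for the example), weekly bins can be counted with `resample("W").size()`. Unlike a groupby on `.dt` properties, resampling also returns the empty bins in between.

```python
import pandas as pd

# Hypothetical dates: two in the first week of 2020, one two weeks later
df = pd.DataFrame({"date": pd.to_datetime(
    ["2020-01-01", "2020-01-02", "2020-01-15"]
)})

# Weekly bins (each ending on Sunday); size() counts the rows in each bin
weekly = df.set_index("date").resample("W").size()
print(weekly)

# weekly.plot(kind="bar") would draw the counts (needs matplotlib)
```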

See the pandas resample documentation for more details.

I have run into similar problems. Hope this helps.

Answer by abeboparebop

I was able to work around this by (1) plotting with matplotlib instead of using the dataframe directly and (2) using the values attribute. See example:

import matplotlib.pyplot as plt

ax = plt.gca()
ax.hist(column.values)

This doesn't work if I don't use values, but I don't know why it does work.

Answer by Martin Thoma

Rendered example

[Image: bar chart of counts by hour of the day]

Example Code

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Create random datetime object."""

# core modules
from datetime import datetime
import random

# 3rd party modules
import pandas as pd
import matplotlib.pyplot as plt


def visualize(df, column_name='start_date', color='#494949', title=''):
    """
    Visualize a dataframe with a date column.

    Parameters
    ----------
    df : Pandas dataframe
    column_name : str
        Column to visualize
    color : str
    title : str
    """
    plt.figure(figsize=(20, 10))
    ax = (df[column_name].groupby(df[column_name].dt.hour)
                         .count()).plot(kind="bar", color=color)
    ax.set_facecolor('#eeeeee')
    ax.set_xlabel("hour of the day")
    ax.set_ylabel("count")
    ax.set_title(title)
    plt.show()


def create_random_datetime(from_date, to_date, rand_type='uniform'):
    """
    Create random date within timeframe.

    Parameters
    ----------
    from_date : datetime object
    to_date : datetime object
    rand_type : {'uniform'}

    Examples
    --------
    >>> random.seed(28041990)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(1998, 12, 13, 23, 38, 0, 121628)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(2000, 3, 19, 19, 24, 31, 193940)
    """
    delta = to_date - from_date
    if rand_type == 'uniform':
        rand = random.random()
    else:
        raise NotImplementedError('Unknown random mode \'{}\''
                                  .format(rand_type))
    return from_date + rand * delta


def create_df(n=1000):
    """Create a Pandas dataframe with datetime objects."""
    from_date = datetime(1990, 4, 28)
    to_date = datetime(2000, 12, 31)
    sales = [create_random_datetime(from_date, to_date) for _ in range(n)]
    df = pd.DataFrame({'start_date': sales})
    return df


if __name__ == '__main__':
    import doctest
    doctest.testmod()
    df = create_df()
    visualize(df)

Answer by JulianWgs

Here is a solution for when you just want a histogram that looks the way you expect. It doesn't use groupby; instead it converts the datetime values to integers and changes the labels on the plot. It could be improved by moving the tick labels to even locations. With this approach, a kernel density estimation plot (or any other plot) is also possible.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame({"datetime": pd.to_datetime(np.random.randint(1582800000000000000, 1583500000000000000, 100, dtype=np.int64))})
fig, ax = plt.subplots()
df["datetime"].astype(np.int64).plot.hist(ax=ax)
labels = ax.get_xticks().tolist()
labels = pd.to_datetime(labels)
ax.set_xticklabels(labels, rotation=90)
plt.show()

[Image: Datetime histogram]
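
The integer conversion can also be used without a plot. The following sketch, which is an illustration and not part of the original answer, bins the integer timestamps with `numpy.histogram` and converts the bin edges back to timestamps. The seeded generator and the names `stamps`, `counts`, `edges`, and `edge_dates` are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Reproducible random nanosecond timestamps in the same range as above
rng = np.random.default_rng(0)
stamps = rng.integers(1582800000000000000, 1583500000000000000, 100,
                      dtype=np.int64)
df = pd.DataFrame({"datetime": pd.to_datetime(stamps)})

# Bin the integer representation into 10 equal-width bins
counts, edges = np.histogram(df["datetime"].astype(np.int64), bins=10)

# Convert the bin edges (nanoseconds since the epoch) back to timestamps
edge_dates = pd.to_datetime(edges.astype(np.int64))

for left, n in zip(edge_dates[:-1], counts):
    print(left, n)
```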

Answer by Briford Wylie

All of these answers seem overly complex; at least with 'modern' pandas, it's two lines.

df.set_index('date', inplace=True)
df.resample('M').size().plot.bar()