Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/29329725/

Date: 2020-09-13 23:07:38. Source: igfitidea

Pandas and Matplotlib - fill_between() vs datetime64

Tags: python, pandas, matplotlib

Asked by chilliq

There is a Pandas DataFrame:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 300 entries, 5220 to 5519
Data columns (total 3 columns):
Date             300 non-null datetime64[ns]
A                300 non-null float64
B                300 non-null float64
dtypes: datetime64[ns](1), float64(2)
memory usage: 30.5 KB

I want to plot the A and B series against Date.

plt.plot_date(data['Date'], data['A'], '-')
plt.plot_date(data['Date'], data['B'], '-')

Then I want to apply fill_between() to the area between the A and B series:

plt.fill_between(data['Date'], data['A'], data['B'],
                where=data['A'] >= data['B'],
                facecolor='green', alpha=0.2, interpolate=True)

This raises:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs
could not be safely coerced to any supported types according to the casting 
rule ''safe''

Does matplotlib accept a pandas datetime64 object in the fill_between() function? Should I convert it to a different date type?

Accepted answer by unutbu

Pandas registers a converter in matplotlib.units.registry which converts a number of datetime types (such as a pandas DatetimeIndex, and NumPy arrays of dtype datetime64) to matplotlib datenums, but it does not handle a Pandas Series with dtype datetime64.

In [67]: import pandas.tseries.converter as converter

In [68]: c = converter.DatetimeConverter()

In [69]: type(c.convert(df['Date'].values, None, None))
Out[69]: numpy.ndarray              # converted (good)

In [70]: type(c.convert(df['Date'], None, None))
Out[70]: pandas.core.series.Series  # left unchanged

fill_between checks whether a converter exists for the data and, if so, uses it.
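
To make "matplotlib datenums" concrete, here is a minimal sketch, assuming a reasonably recent matplotlib in which matplotlib.dates.date2num accepts NumPy datetime64 arrays. The sample dates are made up for illustration, and this shows the target numeric form rather than the exact code path fill_between takes internally.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as mdates

# Hypothetical sample data: five consecutive days.
dates = pd.Series(pd.date_range('2000-01-01', periods=5, freq='D'))

# .values yields a plain NumPy datetime64 array, the form the answer
# says the converter handles.
as_numpy = dates.values

# date2num maps datetimes to floats counted in days from matplotlib's epoch.
datenums = mdates.date2num(as_numpy)

print(as_numpy.dtype)
print(datenums.dtype)
print(np.diff(datenums))  # spacing between consecutive days, in datenum units
```

The absolute datenum values depend on matplotlib's configured epoch, so only the spacing between them should be relied on here.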

So as a workaround, you could convert the dates to a NumPy array of datetime64's:

d = data['Date'].values
plt.fill_between(d, data['A'], data['B'],
                where=data['A'] >= data['B'],
                facecolor='green', alpha=0.2, interpolate=True)


For example,

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='D')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
               'Date': dates})
plt.plot_date(data['Date'], data['A'], '-')
plt.plot_date(data['Date'], data['B'], '-')

d = data['Date'].values
plt.fill_between(d, data['A'], data['B'],
                where=data['A'] >= data['B'],
                facecolor='green', alpha=0.2, interpolate=True)
plt.xticks(rotation=25)
plt.show()
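
The same workaround can also be written against the Axes API and run without a display, which makes it easy to check that the filled region was actually created. This is a sketch under stated assumptions: the 'Agg' backend and the use of ax.plot in place of plt.plot_date are my choices, not part of the original answer.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

N = 300
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
                     'Date': pd.date_range('2000-1-1', periods=N, freq='D')})

# The workaround from the answer: pass a NumPy datetime64 array, not the Series.
d = data['Date'].values

fig, ax = plt.subplots()
ax.plot(d, data['A'], '-')
ax.plot(d, data['B'], '-')
fill = ax.fill_between(d, data['A'].values, data['B'].values,
                       where=(data['A'] >= data['B']).values,
                       facecolor='green', alpha=0.2, interpolate=True)
fig.autofmt_xdate(rotation=25)
fig.canvas.draw()  # force a render so any conversion error would surface here

print(len(ax.lines), len(ax.collections))
```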


Answer by TurnipEntropy

As WillZ pointed out, Pandas 0.21 broke unutbu's workaround. Converting datetimes to dates, however, can significantly harm data analysis, because the time-of-day information is lost. This solution currently works and keeps the datetimes:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='ms')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
           'Date': dates})
d = data['Date'].dt.to_pydatetime()
plt.plot_date(d, data['A'], '-')
plt.plot_date(d, data['B'], '-')


plt.fill_between(d, data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
plt.xticks(rotation=25)
plt.show()

[Figure: fill_between with datetime64 constraint]
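
As far as I know, newer pandas releases have been changing what Series.dt.to_pydatetime() returns, so the line above may emit a warning or behave differently depending on the installed version. A minimal sketch of one alternative, assuming that going through pd.DatetimeIndex is acceptable: DatetimeIndex.to_pydatetime() returns a NumPy object array of datetime.datetime values that can be passed as x in the same way.

```python
import datetime

import numpy as np
import pandas as pd

N = 300
data = pd.DataFrame({'Date': pd.date_range('2000-1-1', periods=N, freq='ms')})

# Wrap the column in a DatetimeIndex, then convert to Python datetime objects.
# Sub-day resolution (here, milliseconds) is preserved.
d = pd.DatetimeIndex(data['Date']).to_pydatetime()

print(type(d), d.dtype)
print(d[0], d[1])
```

The resulting d can stand in for data['Date'].dt.to_pydatetime() in the plotting calls shown earlier.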

EDIT: Following jedi's comment, I set out to determine the fastest of the three options below:

  • method1 = original answer
  • method2 = jedi's comment + original answer
  • method3 = jedi's comment

method2 was only slightly faster but much more consistent, so I have edited the answer above to use it.

import time

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='ms')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
           'Date': dates})
method1 = []
method2 = []
method3 = []

# time.perf_counter() is used because time.clock() was removed in Python 3.8.

# method1: convert each timestamp individually (original answer)
for run in range(0, 10):
    start = time.perf_counter()
    for _ in range(0, 500):
        d = [pd.Timestamp(ts).to_pydatetime() for ts in data['Date']]
        plt.plot_date(d, data['A'], '-')
        plt.plot_date(d, data['B'], '-')

        plt.fill_between(d, data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
        plt.xticks(rotation=25)
        plt.gcf().clear()
    method1.append(time.perf_counter() - start)

# method2: convert the whole column once and reuse the result
for run in range(0, 10):
    start = time.perf_counter()
    for _ in range(0, 500):
        d = data['Date'].dt.to_pydatetime()
        plt.plot_date(d, data['A'], '-')
        plt.plot_date(d, data['B'], '-')

        plt.fill_between(d, data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
        plt.xticks(rotation=25)
        plt.gcf().clear()
    method2.append(time.perf_counter() - start)

# method3: convert the whole column again inside every plotting call
for run in range(0, 10):
    start = time.perf_counter()
    for _ in range(0, 500):
        plt.plot_date(data['Date'].dt.to_pydatetime(), data['A'], '-')
        plt.plot_date(data['Date'].dt.to_pydatetime(), data['B'], '-')

        plt.fill_between(data['Date'].dt.to_pydatetime(), data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
        plt.xticks(rotation=25)
        plt.gcf().clear()
    method3.append(time.perf_counter() - start)

# One row per method, one column per timed run
time_data = pd.DataFrame([method1, method2, method3],
                         index=['method1', 'method2', 'method3'],
                         columns=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'])
print(time_data)
plt.errorbar(time_data.index, time_data.mean(axis=1), yerr=time_data.std(axis=1))
plt.show()

[Figure: time test of 3 methods of converting time data for plotting a DataFrame]
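
The benchmark above times conversion and plotting together, so the plotting cost may dominate. A rough sketch that isolates the conversion step with timeit follows. The function names are hypothetical, and the vectorized variant goes through pd.DatetimeIndex as a stand-in for data['Date'].dt.to_pydatetime(), which is an assumption rather than the exact call timed above.

```python
import timeit

import numpy as np
import pandas as pd

N = 300
data = pd.DataFrame({'Date': pd.date_range('2000-1-1', periods=N, freq='ms')})

def per_element():
    # Convert each timestamp individually (the method1 style).
    return [pd.Timestamp(ts).to_pydatetime() for ts in data['Date']]

def vectorized():
    # Convert the whole column in one call (stand-in for the method2 style).
    return pd.DatetimeIndex(data['Date']).to_pydatetime()

REPEATS, CALLS = 5, 200
timings = {
    name: timeit.repeat(fn, number=CALLS, repeat=REPEATS)
    for name, fn in [('per_element', per_element), ('vectorized', vectorized)]
}

for name, runs in timings.items():
    print(f'{name}: best {min(runs):.4f}s, mean {np.mean(runs):.4f}s per {CALLS} calls')
```

Actual numbers vary by machine and library version, so none are quoted here.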

Answer by WillZ

I encountered this issue after upgrading to Pandas 0.21. My code previously ran fine with fill_between() but broke after the upgrade.

It turns out that the fix mentioned in @unutbu's answer, which is what I already had, only works if the DatetimeIndex contains date objects rather than datetime objects that carry time info.

Looking at the example above, I fixed it by adding the following line before calling fill_between():

d['Date'] = [z.date() for z in d['Date']]
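
A small illustration of what this conversion does to the data, using made-up timestamps: calling .date() on each value drops the time of day, so distinct timestamps on the same day collapse into one date. That is harmless for daily data but loses information for intraday data.

```python
import pandas as pd

# Hypothetical intraday timestamps: two on the same day, one on the next day.
stamps = pd.Series(pd.to_datetime(['2019-01-08 09:30',
                                   '2019-01-08 16:00',
                                   '2019-01-09 09:30']))

# Same conversion as in the answer: keep only the calendar date.
as_dates = pd.Series([z.date() for z in stamps])

print(stamps.nunique())    # 3 distinct timestamps
print(as_dates.nunique())  # 2 distinct dates: the time of day is gone
```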

Answer by Keith Landry

I had a similar problem. I have a DataFrame that looks something like this:

date        upper     lower 
2018-10-10  0.999614  0.146746
2018-10-26  0.999783  0.333178
2019-01-02  0.961252  0.176736
2019-01-08  0.977487  0.371374
2019-01-09  0.923230  0.286423
2019-01-10  0.880961  0.294823
2019-01-11  0.846933  0.303679
2019-01-14  0.846933  0.303679
2019-01-15  0.800336  0.269864
2019-01-16  0.706114  0.238787

with dtypes:

date     datetime64[ns]
upper           float64
lower           float64

The following raises the error from the original post:

plt.fill_between(dplot.date, dplot.lower, dplot.upper, alpha=.2)

Interestingly,

plt.fill_between(dplot.date.values, dplot.lower, dplot.upper, alpha=.2)

works perfectly fine.
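
For reference, a self-contained sketch of the working call, rebuilt from the first four rows of the table above. The 'Agg' backend, the Axes-style call, and the use of .to_numpy() (which returns the same array as .values here) are my assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# First four rows of the DataFrame shown in the answer.
dplot = pd.DataFrame({
    'date': pd.to_datetime(['2018-10-10', '2018-10-26', '2019-01-02', '2019-01-08']),
    'upper': [0.999614, 0.999783, 0.961252, 0.977487],
    'lower': [0.146746, 0.333178, 0.176736, 0.371374],
})

fig, ax = plt.subplots()
# Pass the dates as a NumPy datetime64 array rather than as a Series.
band = ax.fill_between(dplot['date'].to_numpy(), dplot['lower'], dplot['upper'], alpha=.2)
fig.canvas.draw()  # force a render so any conversion error would surface here

print(dplot['date'].to_numpy().dtype)
```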
