Pandas and Matplotlib - fill_between() with datetime64

Question

Asked by chilliq

There is a Pandas DataFrame:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 300 entries, 5220 to 5519
Data columns (total 3 columns):
Date             300 non-null datetime64[ns]
A                300 non-null float64
B                300 non-null float64
dtypes: datetime64[ns](1), float64(2)
memory usage: 30.5 KB

I want to plot the A and B series against Date.

plt.plot_date(data['Date'], data['A'], '-')
plt.plot_date(data['Date'], data['B'], '-')

Then I want to apply fill_between() to the area between the A and B series:

plt.fill_between(data['Date'], data['A'], data['B'],
                where=data['A'] >= data['B'],
                facecolor='green', alpha=0.2, interpolate=True)

This raises:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs
could not be safely coerced to any supported types according to the casting 
rule ''safe''

Does matplotlib accept pandas datetime64 objects in the fill_between() function? Should I convert them to a different date type?

Answer 1

Accepted answer by unutbu

Pandas registers a converter in matplotlib.units.registry which converts a number of datetime types (such as pandas DatetimeIndex, and numpy arrays of dtype datetime64) to matplotlib datenums, but it does not handle a Pandas Series with dtype datetime64.

In [67]: import pandas.tseries.converter as converter

In [68]: c = converter.DatetimeConverter()

In [69]: type(c.convert(df['Date'].values, None, None))
Out[69]: numpy.ndarray              # converted (good)

In [70]: type(c.convert(df['Date'], None, None))
Out[70]: pandas.core.series.Series  # left unchanged

fill_between checks for a converter and, if one exists, uses it to handle the data.

So as a workaround, you could convert the dates to a NumPy array of datetime64 values:

d = data['Date'].values
plt.fill_between(d, data['A'], data['B'],
                where=data['A'] >= data['B'],
                facecolor='green', alpha=0.2, interpolate=True)
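
The same workaround can be sketched with the object-oriented API. This is a minimal sketch, not part of the original answer: it assumes a reasonably recent pandas and matplotlib, uses Series.to_numpy() as the modern spelling of .values, selects the Agg backend so it runs without a display, and reuses the sin/cos sample data from the example in this answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

N = 300
x = np.linspace(0, 2 * np.pi, N)
data = pd.DataFrame({
    "Date": pd.date_range("2000-01-01", periods=N, freq="D"),
    "A": np.sin(x),
    "B": np.cos(x),
})

# Hand matplotlib plain NumPy arrays instead of pandas Series.
d = data["Date"].to_numpy()   # array of datetime64 values
a = data["A"].to_numpy()
b = data["B"].to_numpy()

fig, ax = plt.subplots()
ax.plot(d, a, "-")
ax.plot(d, b, "-")
band = ax.fill_between(d, a, b, where=a >= b,
                       facecolor="green", alpha=0.2, interpolate=True)
fig.autofmt_xdate(rotation=25)
fig.canvas.draw()  # force a render so conversion errors would surface here
```

Converting A and B as well is not required by the workaround; it just keeps every argument the same kind of object.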

For example,

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='D')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
               'Date': dates})
plt.plot_date(data['Date'], data['A'], '-')
plt.plot_date(data['Date'], data['B'], '-')

d = data['Date'].values
plt.fill_between(d, data['A'], data['B'],
                where=data['A'] >= data['B'],
                facecolor='green', alpha=0.2, interpolate=True)
plt.xticks(rotation=25)
plt.show()

Answer 2

Answer by TurnipEntropy

As WillZ pointed out, Pandas 0.21 broke unutbu's workaround. Converting datetimes to dates, however, discards the time of day, which can badly distort an analysis of sub-daily data. The following currently works and keeps the full datetime:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='ms')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
           'Date': dates})
d = data['Date'].dt.to_pydatetime()
plt.plot_date(d, data['A'], '-')
plt.plot_date(d, data['B'], '-')


plt.fill_between(d, data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
plt.xticks(rotation=25)
plt.show()
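
To illustrate why converting to dates is lossy for data like this, here is a small sketch (an illustration added for this answer's sample data, not a benchmark): with millisecond-spaced timestamps, every value falls on the same calendar day, so .date() collapses them all into one value while to_pydatetime() keeps them distinct.

```python
import pandas as pd

# Same millisecond-spaced sample dates as in the example above.
dates = pd.date_range("2000-01-01", periods=300, freq="ms")

as_datetimes = dates.to_pydatetime()        # datetime.datetime objects
as_dates = [ts.date() for ts in dates]      # datetime.date objects, time dropped

n_unique_datetimes = len(set(as_datetimes))
n_unique_dates = len(set(as_dates))
print(n_unique_datetimes, n_unique_dates)   # 300 1
```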

EDIT: Following jedi's comment, I timed the three options below to find the fastest:

method1 = original answer
method2 = jedi's comment + original answer
method3 = jedi's comment

method2 was only slightly faster but much more consistent, so I have edited the answer above to use it.

import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='ms')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
                     'Date': dates})


def draw(d_a, d_b, d_fill):
    # One plot/fill/clear cycle; each argument is the x data for one call.
    plt.plot_date(d_a, data['A'], '-')
    plt.plot_date(d_b, data['B'], '-')
    plt.fill_between(d_fill, data['A'], data['B'],
                     where=data['A'] >= data['B'],
                     facecolor='green', alpha=0.2, interpolate=True)
    plt.xticks(rotation=25)
    plt.gcf().clear()


def method1():
    # Original answer: convert element by element.
    d = [pd.Timestamp(ts).to_pydatetime() for ts in data['Date']]
    draw(d, d, d)


def method2():
    # jedi's comment + original answer: convert once and reuse the result.
    d = data['Date'].dt.to_pydatetime()
    draw(d, d, d)


def method3():
    # jedi's comment: convert again for every plotting call.
    draw(data['Date'].dt.to_pydatetime(),
         data['Date'].dt.to_pydatetime(),
         data['Date'].dt.to_pydatetime())


def benchmark(method, runs=10, iterations=500):
    # time.clock() was removed in Python 3.8; time.perf_counter() replaces it.
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(iterations):
            method()
        timings.append(time.perf_counter() - start)
    return timings


# One row per method, one column per timed run ('1' to '10').
time_data = pd.DataFrame(
    {name: benchmark(fn) for name, fn in
     [('method1', method1), ('method2', method2), ('method3', method3)]},
    index=[str(i) for i in range(1, 11)]).T
print(time_data)
plt.errorbar(time_data.index, time_data.mean(axis=1), yerr=time_data.std(axis=1))
plt.show()
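
The benchmark above times conversion and plotting together. As a rough sketch of how the conversion step alone could be timed, the snippet below uses the standard-library timeit module; the function names convert_loop and convert_vectorized and the repeat counts are illustrative choices, not part of the original answer, and the numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

N = 300
data = pd.DataFrame({"Date": pd.date_range("2000-01-01", periods=N, freq="ms")})


def convert_loop():
    # Element-by-element conversion, as in method1.
    return [pd.Timestamp(ts).to_pydatetime() for ts in data["Date"]]


def convert_vectorized():
    # Single accessor call, as in method2.
    return data["Date"].dt.to_pydatetime()


# Each entry holds 5 measurements of 50 conversions each.
timings = {
    name: timeit.repeat(fn, number=50, repeat=5)
    for name, fn in [("loop", convert_loop), ("vectorized", convert_vectorized)]
}
for name, runs in timings.items():
    print(f"{name}: mean {np.mean(runs):.4f}s, std {np.std(runs):.4f}s")
```

Separating the conversion from the drawing makes it easier to see how much of the total time the conversion actually accounts for.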

Answer 3

Answer by WillZ

I encountered this issue after upgrading to Pandas 0.21. My code previously ran fine with fill_between() but broke after the upgrade.

It turns out that the fix mentioned in @unutbu's answer, which is what I already had, only works if the DatetimeIndex contains date objects rather than datetime objects that carry time information.

Looking at the example above, what I did to fix it was to add the following line before calling fill_between() (here d has to be the DataFrame holding the Date column, called data in that example, not the array of dates):

d['Date'] = [z.date() for z in d['Date']]

Answer 4

Answer by Keith Landry

I had a similar problem. I have a DataFrame that looks something like this:

date        upper     lower 
2018-10-10  0.999614  0.146746
2018-10-26  0.999783  0.333178
2019-01-02  0.961252  0.176736
2019-01-08  0.977487  0.371374
2019-01-09  0.923230  0.286423
2019-01-10  0.880961  0.294823
2019-01-11  0.846933  0.303679
2019-01-14  0.846933  0.303679
2019-01-15  0.800336  0.269864
2019-01-16  0.706114  0.238787

with dtypes:

date     datetime64[ns]
upper           float64
lower           float64

The following raises the error from the initial post:

plt.fill_between(dplot.date, dplot.lower, dplot.upper, alpha=.2)

Interestingly,

plt.fill_between(dplot.date.values, dplot.lower, dplot.upper, alpha=.2)

works perfectly fine.
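
The difference between the two calls is only in what gets handed to matplotlib. A small sketch, built from the first three rows of the table above, shows the two types involved; the variable names as_series and as_array are illustrative.

```python
import numpy as np
import pandas as pd

# First three rows of the DataFrame shown above.
dplot = pd.DataFrame({
    "date": pd.to_datetime(["2018-10-10", "2018-10-26", "2019-01-02"]),
    "upper": [0.999614, 0.999783, 0.961252],
    "lower": [0.146746, 0.333178, 0.176736],
})

as_series = dplot.date          # pandas Series: the form that raised the error
as_array = dplot.date.values    # NumPy datetime64 array: the form that worked

print(type(as_series).__name__, type(as_array).__name__)
print(as_array.dtype)
```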
