pandas 0.21.0 Timestamp compatibility issue with matplotlib
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/47404653/
Asked by Kevin S.
I just updated pandas from 0.17.1 to 0.21.0 to take advantage of some new functionality, and ran into a compatibility issue with matplotlib (which I also updated to the latest version, 2.1.0). In particular, the Timestamp object seems to have changed significantly.
I happen to have another machine still running the older versions of pandas (0.17.1) and matplotlib (1.5.1), which I used to compare the differences:
Both versions show my DataFrame index with dtype='datetime64[ns]':
DatetimeIndex(['2017-03-13', '2017-03-14', ... '2017-11-17'], dtype='datetime64[ns]', name='dates', length=170, freq=None)
But when calling type(df.index[0]), 0.17.1 gives pandas.tslib.Timestamp and 0.21.0 gives pandas._libs.tslib.Timestamp.
When plotting with df.index as the x-axis:
plt.plot(df.index, df['data'])
matplotlib by default formats the x-axis labels as dates with pandas 0.17.1, but fails to recognize them with pandas 0.21.0 and simply shows the raw number 1.5e18 (epoch time in nanoseconds).
I also have a customized cursor that reports the clicked location on the graph by applying matplotlib.dates.DateFormatter to the x-value, which fails with 0.21.0 with:
OverflowError: signed integer is greater than maximum
In the debugger I can see that the reported x-value is around 736500 (i.e. day count since year 0) for 0.17.1 but around 1.5e18 (i.e. nanosecond epoch time) for 0.21.0.
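To make the two magnitudes concrete, here is a small sketch using only the standard library. It takes the last date in the index shown above and computes both representations: nanoseconds since 1970 (what a datetime64[ns] value holds) and the day ordinal counted from year 1 (the scale the older matplotlib versions discussed here put on date axes). The variable names are mine, for illustration only.

```python
from datetime import datetime, timezone

# Last date of the index shown in the question.
ts = datetime(2017, 11, 17, tzinfo=timezone.utc)

# Nanoseconds since 1970-01-01: the integer behind a datetime64[ns] value.
epoch_ns = int(ts.timestamp()) * 1_000_000_000

# Proleptic Gregorian ordinal (days counted from 0001-01-01), the scale
# matplotlib 1.x/2.x used for date axes.
day_ordinal = ts.toordinal()

print(epoch_ns)     # 1510876800000000000, i.e. about 1.5e18
print(day_ordinal)  # 736650
```

So a date formatter that expects a value near 7e5 and is handed 1.5e18 is being asked for a date absurdly far in the future, which is consistent with the overflow reported above.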
I am surprised at this break in compatibility between matplotlib and pandas, as they are obviously used together by most people. Am I missing something in the way I call the plot function above with the newer versions?
Update: as I mentioned above, I prefer calling plot directly on a given axes object, but just for the heck of it I tried calling the plot method of the DataFrame itself, df.plot(). As soon as this is done, all subsequent plots within the same Python session correctly recognize the Timestamp. It's as if an environment variable is set, because I can reload another DataFrame or create another axes with subplots, and nowhere does 1.5e18 show up. This really smells like a bug, as the latest pandas doc says:
The plot method on Series and DataFrame is just a simple wrapper around plt.plot()
But clearly it does something to the python session such that subsequent plots deal with the Timestamp index properly.
In fact, simply running the example at the above pandas link:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
Depending on whether ts.plot() has been called or not, the following plot either formats the x-axis as dates correctly or does not:
plt.plot(ts.index,ts)
plt.show()
Once a member plot method has been called, subsequent calls to plt.plot on a new Series or DataFrame will autoformat correctly without needing to call the member plot method again.
Answered by ImportanceOfBeingErnest
There is an issue with pandas datetimes and matplotlib coming from the recent release of pandas 0.21, which no longer registers its converters at import. Once you have used those converters once (within pandas), they'll be registered and automatically used by matplotlib as well.
A workaround would be to register them manually,
import pandas.plotting._converter as pandacnv
pandacnv.register()
In any case, the issue is well known on both the pandas and matplotlib sides, so there will be some kind of fix in the next releases. Pandas is thinking about re-adding the registration in an upcoming release, so this issue may be only temporary. Another option is to revert to pandas 0.20.x, where this should not occur.
Update: this is no longer an issue with current versions of matplotlib (2.2.2)/pandas (0.23.1), and likely with many versions released since roughly December 2017, when this was fixed.
Update 2: With pandas 0.24 or higher, the recommended way to register the converters is
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
or, if pandas is already imported as pd,
pd.plotting.register_matplotlib_converters()
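For reference, a minimal end-to-end sketch of this fix, assuming a pandas version that provides register_matplotlib_converters and a working matplotlib install. The Agg backend, the fixed start date and the variable names are my own choices for illustration and are not part of the answer. It registers the converters, then plots against a DatetimeIndex with a plain axes call and checks that the x-axis sits on a date scale rather than near 1e18:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import matplotlib.units as munits
import numpy as np
import pandas as pd
from pandas.plotting import register_matplotlib_converters

# Explicitly hand pandas' date converters to matplotlib.
register_matplotlib_converters()

# After registration, matplotlib's unit registry knows about pandas Timestamps.
timestamp_registered = pd.Timestamp in munits.registry

ts = pd.Series(np.random.randn(1000),
               index=pd.date_range("2000-01-01", periods=1000))

fig, ax = plt.subplots()
ax.plot(ts.index, ts.values)  # plain matplotlib call, not ts.plot()
fig.autofmt_xdate()

# With a date converter in effect the axis limits are day counts,
# many orders of magnitude below the ~1e18 nanosecond values.
x_max = ax.get_xlim()[1]
print(timestamp_registered, x_max)
```

Calling the registration once, right after the imports, should be enough for the rest of the session.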
Answered by Kevin S.
After opening an issue on the pandas GitHub, I learned that this was indeed a known issue between pandas and matplotlib regarding auto-registration of unit converters. In fact, it was listed on the what's new page, which I had failed to see before, along with the proper way to register the converters:
from pandas.tseries import converter
converter.register()
This is also done the first time a member plot method is called on a Series or DataFrame, which explains what I observed above.
It appears to have been done with the intention that matplotlib would implement some basic support for pandas datetimes, but a deprecation warning of some sort would indeed have been useful for such a break. However, until matplotlib actually implements such support (or some sort of lazy registration mechanism), in practice I always put those two lines right after the pandas import. So I'm not sure why pandas would want to disable the automatic registration on import before things are ready on the matplotlib side.