Original URL: http://stackoverflow.com/questions/22258162/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
python pandas plot with uneven timeseries index (with count evenly distributed)
Question by bbc
My DataFrame has an uneven time index.
How can I plot the data and have the index located automatically? I searched here, and I know I can plot with something like
e.plot()


but then the time index (x axis) is spaced at even intervals, for example every 5 minutes. If I have 100 data points in the first 5 minutes and 6 in the second 5 minutes, how do I plot the points evenly by count while still placing the right timestamps on the x axis?
This plots the points evenly by count, but I don't know how to add the time index:
import matplotlib.pyplot as plt
plt.plot(e['Bid'].values)  # x is just the row number 0, 1, 2, ...


Here is an example of the data format, as requested:
Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272
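The timestamps in this sample put a colon, not a dot, before the microseconds, so they need an explicit format string when parsed. Below is a minimal sketch that parses the distinct sample rows from an in-memory string instead of the original file (the in-memory `SAMPLE` string is an assumption for illustration) and makes the parsed times the index:

```python
import io

import pandas as pd

# The distinct sample rows from the question, kept in memory
# instead of being read from the original CSV file.
SAMPLE = """Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272
"""

# Read Time as plain text, then parse it with an explicit format string:
# the fractional seconds follow a colon (":%f"), not the usual dot.
df = pd.read_csv(io.StringIO(SAMPLE), dtype={"Time": str})
df["Time"] = pd.to_datetime(df["Time"], format="%Y-%m-%d %H:%M:%S:%f")

# Move the timestamps into the index so pandas treats this as a time series.
df = df.set_index("Time")
print(df)
```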
And here's the link to the full file: http://code.google.com/p/eu-ats/source/browse/trunk/data/new/eur-fix.csv
Here's the code I used to plot:
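import numpy as np
import pandas as pd
import datetime as dt
# Read Time as text, then parse it; the fractional seconds follow a colon
e = pd.read_csv("data/ecb/eur.csv", dtype={'Time':object})
e.Time = pd.to_datetime(e.Time, format='%Y-%m-%d %H:%M:%S:%f')
e.plot()  # Time is still an ordinary column here, not the x axis
f = e.copy()
f.index = f.Time
# Brute force: turn the timestamps back into strings, dropping the microseconds
x = [str(s)[:-7] for s in f.index]
ff = f.set_index(pd.Series(x))
ff.index.name = 'Time'
ff.plot()  # string index: points are spaced evenly by count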
Update:
I added two new plots for comparison to clarify the issue. I then tried, by brute force, converting the timestamp index back to strings and plotting the strings as the x axis. The format easily got messed up, and it seems hard to customize the location of the x labels.
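The effect the question describes (points spaced by count, with real timestamps as labels) can be sketched with plain matplotlib without converting the index to strings. The idea is to plot against the row position and then label a few evenly spaced positions with the timestamps at those rows. This approach is assumed here and is not code from the thread. The data is made up to mirror the example in the question (100 points in the first 5 minutes, then 6), and `n_ticks` is an arbitrary choice.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up uneven data: 100 points in the first 5 minutes, 6 in the next 5.
rng = np.random.default_rng(0)
offsets = np.concatenate([np.sort(rng.uniform(0, 300, 100)),
                          np.sort(rng.uniform(300, 600, 6))])
times = pd.to_timedelta(offsets, unit="s") + pd.Timestamp("2014-03-05 21:55:00")
bid = pd.Series(1.3727 + rng.normal(0, 1e-5, len(times)).cumsum(), index=times)

fig, ax = plt.subplots(figsize=(9, 5))
positions = np.arange(len(bid))
# Plot against the row number, so every point gets the same horizontal space.
ax.plot(positions, bid.to_numpy())

# Place a few ticks at evenly spaced row numbers and label each one
# with the real timestamp of that row.
n_ticks = 6
tick_pos = np.linspace(0, len(bid) - 1, n_ticks).astype(int)
ax.set_xticks(tick_pos)
ax.set_xticklabels(bid.index[tick_pos].strftime("%H:%M:%S"), rotation=30, ha="right")
ax.grid(True)
fig.tight_layout()
```

Because the x axis is now the row position, neighbouring labels no longer mark equal time intervals. Where the data is sparse, two adjacent labels can be minutes apart.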




Accepted answer by 8one6
Ok, it seems like what you're after is that you want to move around the x-tick locations so that there are an equal number of points between each tick. And you'd like to have the grid drawn on these appropriately-located ticks. Do I have that right?
If so:
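import pandas as pd
from urllib.request import urlopen  # Python 3 form of Python 2's urllib.urlopen
import matplotlib.pyplot as plt
import seaborn as sbn  # older seaborn versions restyle plots (grid lines) on import
# Note: Google Code has since shut down, so this URL may no longer resolve.
content = urlopen('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv')
df = pd.read_csv(content, header=0)
df['Time'] = pd.to_datetime(df['Time'], format='%Y-%m-%d %H:%M:%S:%f')
# Timestamps of every 30th row: the same number of points lies between neighbouring ticks
every30 = df.loc[df.index % 30 == 0, 'Time'].values
fig, ax = plt.subplots(1, 1, figsize=(9, 5))
df.plot(x='Time', y='Bid', ax=ax)
ax.set_xticks(every30)
ax.grid(True)  # draw grid lines at the relocated ticks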
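Here is a self-contained sketch of the same every-30th-point tick idea that runs without the original CSV. The irregular timestamps are randomly generated stand-ins, not the real eur-fix.csv data. It also plots with matplotlib directly rather than `df.plot`, which makes the tick positions easy to check. The points stay at their true times, and only the ticks move.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; no display needed
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in for the real quote data: 300 irregular timestamps.
rng = np.random.default_rng(1)
gaps_ms = rng.exponential(200, size=300)  # uneven gaps, mean 200 ms
df = pd.DataFrame({
    "Time": pd.to_timedelta(np.cumsum(gaps_ms), unit="ms") + pd.Timestamp("2014-03-05 21:56:00"),
    "Bid": 1.3727 + rng.normal(0, 1e-5, 300).cumsum(),
})

# Timestamps of rows 0, 30, 60, ...: the same number of points lies between
# each pair of neighbouring ticks, even though the time gaps differ.
every30 = df.loc[df.index % 30 == 0, "Time"].to_numpy()

fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(df["Time"].to_numpy(), df["Bid"].to_numpy())  # x = true times
ax.set_xticks(every30)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
ax.grid(True)  # grid lines follow the relocated ticks
fig.autofmt_xdate()
```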


Answer by 8one6
I have tried to reproduce your issue, but I can't seem to. Can you have a look at this example and see how your situation differs?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
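np.random.seed(0)
# One point per minute between 11:00 and 21:30, as a random walk
idx = pd.date_range('11:00', '21:30', freq='1min')
ser = pd.Series(data=np.random.randn(len(idx)), index=idx)
ser = ser.cumsum()
# Blank out 8 of every 10 points in the first 200 minutes to make the data uneven
for i in range(20):
    for j in range(8):
        ser.iloc[10*i + j] = np.nan
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
ser.plot(ax=axes[0])           # NaNs leave gaps in the line
ser.dropna().plot(ax=axes[1])  # missing points dropped: the line joins across the gaps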
gives the following two plots:


There are a couple of differences between the graphs. The one on the left doesn't connect the non-contiguous stretches of data, and it lacks vertical gridlines. But both seem to respect the actual index of the data. Can you show an example of your e series? What is the exact format of its index? Is it a DatetimeIndex, or is it just text?
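One quick way to answer those last questions is to inspect the index directly. Below is a minimal sketch with two made-up values. The names `as_text` and `as_dates` are hypothetical.

```python
import pandas as pd

times = ["2014-03-05 21:56:05.924300", "2014-03-05 21:56:06.421906"]
bids = [1.37275, 1.37275]

# Hypothetical series: one indexed by plain strings, one by parsed timestamps.
as_text = pd.Series(bids, index=times)
as_dates = pd.Series(bids, index=pd.to_datetime(times))

for name, s in [("as_text", as_text), ("as_dates", as_dates)]:
    # isinstance tells you whether pandas sees real timestamps;
    # inferred_type reports "string" for an index of text labels.
    print(name, isinstance(s.index, pd.DatetimeIndex), s.index.inferred_type)
```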
Edit:
Playing with this, my guess is that your index is actually just text. If I continue from above with:
idx_str = [str(x) for x in idx]
newser = ser.copy()  # copy, so relabelling the index below leaves ser untouched
newser.index = idx_str
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
newser.plot(ax=axes[0])
newser.dropna().plot(ax=axes[1])
then I get something like your problem:


More edit:
If this is in fact your issue (the index is a bunch of strings, not really a bunch of timestamps), then you can convert them and all will be well:
idx_fixed = pd.to_datetime(idx_str)
fixedser = newser.copy()  # copy, so newser keeps its string index
fixedser.index = idx_fixed
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0])
fixedser.dropna().plot(ax=axes[1])
produces output identical to the first code sample above.
Editing again:
To see the uneven spacing of the data, you can do this:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0], marker='.', linewidth=0)
fixedser.dropna().plot(ax=axes[1], marker='.', linewidth=0)
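The markers show the uneven spacing visually. You can also measure it by differencing the timestamps. Below is a minimal sketch that uses the distinct timestamps from the question's sample data, parsed with the question's colon-separated format:

```python
import pandas as pd

# The distinct timestamps from the question's sample data.
raw = [
    "2014-03-05 21:56:05:924300",
    "2014-03-05 21:56:05:924351",
    "2014-03-05 21:56:06:421906",
    "2014-03-05 21:56:06:421950",
    "2014-03-05 21:56:06:920539",
    "2014-03-05 21:56:06:920580",
    "2014-03-05 21:56:09:071981",
    "2014-03-05 21:56:09:072019",
]
idx = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S:%f")

# Gap between each timestamp and the previous one.
gaps = idx.to_series().diff().dropna()
print(gaps.dt.total_seconds().round(6).tolist())

# Evenly spaced data would have a single gap value and an inferable frequency.
print("evenly spaced:", gaps.nunique() == 1)
print("inferred frequency:", pd.infer_freq(idx))
```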


Answer by 8one6
Let me try this one from scratch. Does this solve your issue?
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
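from urllib.request import urlopen  # Python 3 form of Python 2's urllib.urlopen
# Note: Google Code has since shut down, so this URL may no longer resolve.
content = urlopen('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv')
# Make Time the index; pandas then uses it as the x axis
df = pd.read_csv(content, header=0, index_col='Time')
df.index = pd.to_datetime(df.index, format='%Y-%m-%d %H:%M:%S:%f')
df.plot()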


The thing is, you want to plot Bid vs Time. If you've put the times into your index, then they become your x-axis for "free". If the time data is just another column, then you need to specify that you want to plot Bid as the y-axis variable and Time as the x-axis variable. So in your code above, even when you converted the Time data to datetime type, you never instructed pandas/matplotlib to use those datetimes as the x-axis.
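Here are both routes side by side. This minimal sketch uses a made-up four-row DataFrame in the question's Time/Bid layout, with one 28-second pause. The last lines read back the plotted x positions to check that they follow the timestamps rather than the row numbers.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data in the question's layout: Time is an ordinary column.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2014-03-05 21:56:00", "2014-03-05 21:56:01",
        "2014-03-05 21:56:02", "2014-03-05 21:56:30",
    ]),
    "Bid": [1.37275, 1.37272, 1.37275, 1.37272],
})

fig, (ax_col, ax_idx) = plt.subplots(1, 2, figsize=(10, 4))
# Option 1: keep Time as a column and name it explicitly as the x variable.
df.plot(x="Time", y="Bid", ax=ax_col, marker=".")
# Option 2: move Time into the index; the index is then the x axis "for free".
df.set_index("Time")["Bid"].plot(ax=ax_idx, marker=".")

# In both panels the x positions follow the timestamps, so the 28 s pause
# is a much wider gap than the 1 s steps.
x_col = ax_col.lines[0].get_xydata()[:, 0]
x_idx = ax_idx.lines[0].get_xydata()[:, 0]
print(np.diff(x_col) / np.diff(x_col)[0])
```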

