Can I plot a linear regression with datetimes on the x-axis with seaborn?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/29308729/
Asked by theQman
My DataFrame object looks like
            amount
date
2014-01-06       1
2014-01-07       1
2014-01-08       4
2014-01-09       1
2014-01-14       1
I would like a sort of scatter plot with time along the x-axis and amount on the y-axis, with a line through the data to guide the viewer's eye. If I use the pandas plot df.plot(style="o"), it's not quite right, because the line is not there. I would like something like the examples here.
Answered by waterproof
Note: this has a lot in common with Ian Thompson's answer, but the approach is different enough to warrant a separate answer. I use the DataFrame format provided in the question and avoid changing the index.
Seaborn and other libraries don't deal as well with datetime axes as you might like them to. Here's how I'd work around it:
Start by adding a column of date ordinals
Seaborn will deal better with these than with dates. This is a handy trick for doing all kinds of mathy things with dates and libraries that don't love dates.
# Assumes 'date' is a column; if the dates are the index, convert df.index instead
df['date_ordinal'] = pd.to_datetime(df['date']).apply(lambda date: date.toordinal())
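As a minimal illustration of what the ordinal column holds, here is a standard-library-only sketch (no pandas needed). The variable names `dates`, `ordinals`, `gaps`, and `round_trip` are made up for this example. `date.toordinal()` returns the proleptic Gregorian ordinal, so consecutive days differ by exactly 1, and `date.fromordinal()` reverses the mapping.

```python
from datetime import date

# The five dates from the question (an illustrative copy of the data).
dates = [date(2014, 1, 6), date(2014, 1, 7), date(2014, 1, 8),
         date(2014, 1, 9), date(2014, 1, 14)]

# Each date becomes a plain integer (0001-01-01 is day 1).
ordinals = [d.toordinal() for d in dates]

# Gaps between ordinals are gaps in days, so x-axis spacing is preserved.
gaps = [b - a for a, b in zip(ordinals, ordinals[1:])]

# The mapping is reversible, which is what allows relabelling the axis later.
round_trip = [date.fromordinal(n) for n in ordinals]
```

Because the ordinals are ordinary integers, any library that can regress on numbers can regress on them.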
Make a plot with the ordinals on the date axis
ax = seaborn.regplot(
    data=df,
    x='date_ordinal',
    y='amount',
)
# Tighten up the axes for prettiness
ax.set_xlim(df['date_ordinal'].min() - 1, df['date_ordinal'].max() + 1)
ax.set_ylim(0, df['amount'].max() + 1)
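By default `regplot` draws a straight regression line through the points. The sketch below is not seaborn's implementation, only a hand-rolled ordinary least-squares fit on the same ordinals, to show that once the dates are integers the regression is plain arithmetic. `fit_line` is a hypothetical helper name, and the data is copied from the question.

```python
from datetime import date

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centering on the means keeps the sums small even though ordinals are large.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

dates = [date(2014, 1, 6), date(2014, 1, 7), date(2014, 1, 8),
         date(2014, 1, 9), date(2014, 1, 14)]
amounts = [1, 1, 4, 1, 1]
ordinals = [d.toordinal() for d in dates]

# One ordinal step is one day, so the slope is in "amount per day".
slope, intercept = fit_line(ordinals, amounts)
fitted_first_day = slope * ordinals[0] + intercept
```

The intercept is large in magnitude because it refers to ordinal 0, far from the data; only the slope and the fitted values within the date range are meaningful here.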
Replace the ordinal X-axis labels with nice, readable dates
from datetime import date  # needed for date.fromordinal

ax.set_xlabel('date')
new_labels = [date.fromordinal(int(item)) for item in ax.get_xticks()]
ax.set_xticklabels(new_labels)
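Tick positions typically come back as floating-point numbers, which is why each tick is wrapped in `int()` before calling `date.fromordinal`. A small standard-library sketch of that conversion follows; `tick_to_label` and the sample tick values are made up for illustration, and the `%Y-%m-%d` format is just one possible choice.

```python
from datetime import date

def tick_to_label(tick, fmt="%Y-%m-%d"):
    """Convert a numeric tick position (a date ordinal) to a date string."""
    # int() drops any fractional part, so a tick halfway through a day
    # is labelled with that day's date.
    return date.fromordinal(int(tick)).strftime(fmt)

start = date(2014, 1, 6).toordinal()

# Hypothetical tick positions, expressed as floats like a plotting library would.
sample_ticks = [start + 0.0, start + 2.0, start + 4.5, start + 8.0]
labels = [tick_to_label(t) for t in sample_ticks]
```

Formatting to strings gives control over how the labels look, whereas passing `date` objects relies on their default string form.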
ta-daa!
Answered by Ian Thompson
Since Seaborn has trouble with dates, I'm going to create a work-around. First, I'll make the Date column my index:
# Make dataframe
df = pd.DataFrame({'amount': [1, 1, 4, 1, 1]},
                  index=['2014-01-06',
                         '2014-01-07',
                         '2014-01-08',
                         '2014-01-09',
                         '2014-01-14'])
Second, convert the index to pd.DatetimeIndex:
# Make index pd.DatetimeIndex
df.index = pd.DatetimeIndex(df.index)
Then build a new daily index that spans the first to the last date, which will replace the original:
# Make new index
idx = pd.date_range(df.index.min(), df.index.max())
Third, reindex with the new index (idx):
# Replace original index with idx
df = df.reindex(index = idx)
This will produce a new dataframe with NaN values for the dates you don't have data for.
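To make the effect of the reindex concrete without depending on pandas, the following standard-library sketch builds the same daily range and looks up each day in the original data. `None` stands in for pandas' NaN here, and the names `observed`, `all_days`, `filled`, and `missing` are illustrative only.

```python
from datetime import date, timedelta

# The observed data from the question, keyed by date.
observed = {
    date(2014, 1, 6): 1,
    date(2014, 1, 7): 1,
    date(2014, 1, 8): 4,
    date(2014, 1, 9): 1,
    date(2014, 1, 14): 1,
}

# Every calendar day from the first to the last observation, inclusive.
first, last = min(observed), max(observed)
all_days = [first + timedelta(days=i) for i in range((last - first).days + 1)]

# Days without an observation get None (the analogue of NaN after reindexing).
filled = [(day, observed.get(day)) for day in all_days]
missing = [day for day, amount in filled if amount is None]
```

With this data the range covers nine days, four of which (10 to 13 January) have no observation.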
Fourth, since Seaborn doesn't play nice with dates and regression lines, I'll create a row count column that we can use as our x-axis:
# Insert row count
df.insert(df.shape[1],
          'row_count',
          df.index.value_counts().sort_index().cumsum())
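Because the reindexed frame has one row per day, the row count is simply the day offset from the first date plus one. The sketch below (standard library only, with illustrative names) checks that this differs from the date ordinal by a constant, which suggests that a regression on either x variable over the same points should give the same slope.

```python
from datetime import date, timedelta

first = date(2014, 1, 6)

# One entry per day, matching the nine rows of the reindexed frame.
days = [first + timedelta(days=i) for i in range(9)]

# Running count of rows: 1 for the first day, 2 for the second, and so on.
row_count = [(d - first).days + 1 for d in days]

# Difference between each date's ordinal and its row count.
# A single value in this set means the two x variables differ only by a shift.
offsets = {d.toordinal() - rc for d, rc in zip(days, row_count)}
```

A constant shift in x changes the intercept of a fitted line but not its slope.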
Fifth, we should now be able to plot a regression line using 'row_count' as our x variable and 'amount' as our y variable:
# Plot regression using Seaborn
fig = sns.regplot(data = df, x = 'row_count', y = 'amount')
Sixth, if you would like the dates to be along the x-axis instead of the row_count you can set the x-tick labels to the index:
# Change x-ticks to dates
labels = [item.get_text() for item in fig.get_xticklabels()]
# Set labels for 1:10 because labels has 11 elements (index 0 is the left edge,
# index 10 is the right edge) but our data only has 9 elements
labels[1:10] = df.index.date
# Set x-tick labels
fig.set_xticklabels(labels)
# Rotate the labels so you can read them
plt.xticks(rotation=45)
# Change x-axis title
plt.xlabel('date')
plt.show()
Hope this helps!


