Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not this site). Original: http://stackoverflow.com/questions/29308729/

Can I plot a linear regression with datetimes on the x-axis with seaborn?

Tags: python, pandas, dataframe, matplotlib, seaborn

Asked by theQman

My DataFrame object looks like this:

            amount
date    
2014-01-06  1
2014-01-07  1
2014-01-08  4
2014-01-09  1
2014-01-14  1

I would like a scatter plot with time along the x-axis and amount on the y-axis, with a line through the data to guide the viewer's eye. If I use the pandas plot df.plot(style="o"), it's not quite right, because the line is not there. I would like something like the examples here.

Answered by waterproof

Note: this has a lot in common with Ian Thompson's answer, but the approach is different enough to warrant a separate answer. I use the DataFrame format provided in the question and avoid changing the index.

Seaborn and other libraries don't deal as well with datetime axes as you might like them to. Here's how I'd work around it:

Start by adding a column of date ordinals

Seaborn will deal better with these than with dates. This is a handy trick for doing all kinds of math with dates in libraries that don't handle dates well.

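# Assumes 'date' is a column; if it is the index (as in the question), call df = df.reset_index() first.
df['date_ordinal'] = pd.to_datetime(df['date']).apply(lambda date: date.toordinal())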
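If `date` is the index, as in the question's printout, the ordinals can also be taken straight from the index without resetting it. The following is a minimal sketch under that assumption; the frame is rebuilt here from the question's five rows so that the example is self-contained.

```python
import pandas as pd

# Rebuild the question's frame: 'date' is the index, not a column.
df = pd.DataFrame(
    {"amount": [1, 1, 4, 1, 1]},
    index=pd.DatetimeIndex(
        ["2014-01-06", "2014-01-07", "2014-01-08", "2014-01-09", "2014-01-14"],
        name="date",
    ),
)

# Timestamp.toordinal() counts days from 0001-01-01, like datetime.date.toordinal().
# The resulting Index is assigned positionally, so row order is preserved.
df["date_ordinal"] = df.index.map(pd.Timestamp.toordinal)

print(df)
```

The ordinals are plain integers, one per day, so consecutive dates differ by exactly 1 and the gap between 2014-01-09 and 2014-01-14 shows up as a difference of 5.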

[Image: dataframe with ordinals]

Make a plot with the ordinals on the date axis

ax = seaborn.regplot(
    data=df,
    x='date_ordinal',
    y='amount',
)
# Tighten up the axes for prettiness
ax.set_xlim(df['date_ordinal'].min() - 1, df['date_ordinal'].max() + 1)
ax.set_ylim(0, df['amount'].max() + 1)

Replace the ordinal X-axis labels with nice, readable dates

from datetime import date  # needed for date.fromordinal

ax.set_xlabel('date')
new_labels = [date.fromordinal(int(item)) for item in ax.get_xticks()]
ax.set_xticklabels(new_labels)

[Image: plot with regression line]

ta-daa!
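Relabelling the ticks with a fixed list works for a static figure, but the labels can go stale if the ticks later move (for example after zooming or changing the limits). One alternative, sketched here as an assumption rather than as part of the original answer, is to attach a matplotlib `FuncFormatter` that converts each tick position to a date on demand. The helper name `ordinal_to_label` is made up for this example, and `date` is assumed to be a column.

```python
from datetime import date

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2014-01-06", "2014-01-07", "2014-01-08", "2014-01-09", "2014-01-14"]
    ),
    "amount": [1, 1, 4, 1, 1],
})
df["date_ordinal"] = df["date"].apply(lambda d: d.toordinal())


def ordinal_to_label(value, pos=None):
    """Convert a tick position (a date ordinal) to an ISO date string."""
    # Ticks may fall between whole days, so round to the nearest day.
    return date.fromordinal(int(round(value))).isoformat()


ax = sns.regplot(data=df, x="date_ordinal", y="amount")
ax.xaxis.set_major_formatter(FuncFormatter(ordinal_to_label))
ax.set_xlabel("date")

# Rotate the labels so neighbouring dates do not overlap.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
plt.tight_layout()
```

Because the formatter is called for whatever ticks matplotlib chooses, it does not depend on how many ticks there are or where they fall.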

Answered by Ian Thompson

Since Seaborn has trouble with dates, I'm going to create a work-around. First, I'll make the Date column my index:

# Make dataframe
df = pd.DataFrame({'amount' : [1,
                               1,
                               4,
                               1,
                               1]},
                  index = ['2014-01-06',
                           '2014-01-07',
                           '2014-01-08',
                           '2014-01-09',
                           '2014-01-14'])

Second, convert the index to pd.DatetimeIndex:

# Make index pd.DatetimeIndex
df.index = pd.DatetimeIndex(df.index)

Then build a new daily index that spans the first to the last date:

# Make new index
idx = pd.date_range(df.index.min(), df.index.max())

Third, reindex with the new index (idx):

# Replace original index with idx
df = df.reindex(index = idx)

This will produce a new dataframe with NaN values for the dates that have no data:

[Image: the reindexed dataframe]
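The indexing steps so far can be checked end to end with a small self-contained sketch. It rebuilds the five rows from the question, so the only assumption is that `pd.date_range` is left at its default daily frequency.

```python
import pandas as pd

df = pd.DataFrame(
    {"amount": [1, 1, 4, 1, 1]},
    index=pd.DatetimeIndex(
        ["2014-01-06", "2014-01-07", "2014-01-08", "2014-01-09", "2014-01-14"]
    ),
)

# One entry per calendar day between the first and last observation.
idx = pd.date_range(df.index.min(), df.index.max())

# Rows for days missing from the original index are filled with NaN,
# which also turns the integer column into floats.
df = df.reindex(idx)

print(df)
```

The result has nine rows (2014-01-06 through 2014-01-14), four of which (the 10th to the 13th) hold NaN.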

Fourth, since Seaborn doesn't play nice with dates and regression lines I'll create a row count column that we can use as our x-axis:

# Insert row count
df.insert(df.shape[1],
          'row_count',
          df.index.value_counts().sort_index().cumsum())
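Since every date in the reindexed frame is unique, the `value_counts().sort_index().cumsum()` expression simply numbers the rows 1, 2, 3, and so on. A shorter equivalent, offered as a sketch rather than as the answer's own code, is a plain `np.arange`; and because the index has no gaps, the same numbers are also "days since the first date, plus one".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"amount": [1, 1, 4, 1, 1]},
    index=pd.DatetimeIndex(
        ["2014-01-06", "2014-01-07", "2014-01-08", "2014-01-09", "2014-01-14"]
    ),
)
df = df.reindex(pd.date_range(df.index.min(), df.index.max()))

# The answer's expression: each unique date counts once, then a running total.
via_value_counts = df.index.value_counts().sort_index().cumsum()

# Equivalent here: number the rows from 1.
df["row_count"] = np.arange(1, len(df) + 1)

# Also equivalent here, because the daily index has no gaps.
days_since_start = (df.index - df.index[0]).days + 1

print(df)
```

The day-offset form keeps the x values meaningful even if the frame is not reindexed first, since it preserves the real spacing between dates.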

Fifth, we should now be able to plot a regression line using 'row_count' as our x variable and 'amount' as our y variable:

# Plot regression using Seaborn
fig = sns.regplot(data = df, x = 'row_count', y = 'amount')
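`regplot` returns the matplotlib Axes and does not hand back the fitted coefficients. If the slope and intercept are needed as numbers, an ordinary least-squares fit on the same columns can be computed separately. The sketch below uses `np.polyfit` for that and drops the NaN rows first; it should correspond to the default straight line, but treat that as an assumption and check it against your seaborn version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"amount": [1, 1, 4, 1, 1]},
    index=pd.DatetimeIndex(
        ["2014-01-06", "2014-01-07", "2014-01-08", "2014-01-09", "2014-01-14"]
    ),
)
df = df.reindex(pd.date_range(df.index.min(), df.index.max()))
df["row_count"] = np.arange(1, len(df) + 1)

# Keep only the days that actually have an observation.
observed = df.dropna(subset=["amount"])

# Degree-1 polynomial fit: returns the slope first, then the intercept.
slope, intercept = np.polyfit(
    observed["row_count"].to_numpy(), observed["amount"].to_numpy(), deg=1
)

print(f"amount = {slope:.4f} * row_count + {intercept:.4f}")
```

With only five points and one outlier (the 4 on 2014-01-08), the fitted slope is close to zero, so the line mainly serves as a visual guide.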

Sixth, if you would like the dates to be along the x-axis instead of the row_count you can set the x-tick labels to the index:

# Change x-ticks to dates
labels = [item.get_text() for item in fig.get_xticklabels()]

# Set labels for 1:10 because labels has 11 elements (0 is the left edge, 10 is the right
# edge) but our data only has 9 elements
labels[1:10] = df.index.date

# Set x-tick labels
fig.set_xticklabels(labels)

# Rotate the labels so you can read them
plt.xticks(rotation = 45)

# Change x-axis title
plt.xlabel('date')

plt.show()

[Image: plot with dates on the x-axis]
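The `labels[1:10]` slice relies on matplotlib happening to produce exactly 11 ticks, which may not hold for other data or figure sizes. A less fragile variant, sketched here as an alternative and not as the original answer's code, sets the tick positions explicitly to the `row_count` values and then labels each one with its date.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame(
    {"amount": [1, 1, 4, 1, 1]},
    index=pd.DatetimeIndex(
        ["2014-01-06", "2014-01-07", "2014-01-08", "2014-01-09", "2014-01-14"]
    ),
)
df = df.reindex(pd.date_range(df.index.min(), df.index.max()))
df["row_count"] = np.arange(1, len(df) + 1)

ax = sns.regplot(data=df, x="row_count", y="amount")

# Put one tick on every row and label it with that row's date,
# so positions and labels cannot drift apart.
date_labels = list(df.index.strftime("%Y-%m-%d"))
ax.set_xticks(df["row_count"].to_numpy())
ax.set_xticklabels(date_labels, rotation=45, ha="right")
ax.set_xlabel("date")

plt.tight_layout()
```

For longer date ranges, labelling every row gets crowded; taking every n-th row for both the positions and the labels keeps the pairing intact.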

Hope this helps!
