Original question: http://stackoverflow.com/questions/31583982/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Adding a line to a matplotlib scatterplot based on a slope
Asked by Dennis
I have a scatter plot built from a DataFrame. It shows the correlation between two variables, length and age:
import matplotlib.pyplot as plt
from pandas import DataFrame
df = DataFrame(......)
plt.title('Fish Length vs Age')
plt.xlabel('Length')
plt.ylabel('Age (days)')
plt.scatter(df['length'], df['age'])
Now I want to add a line with a given slope of 0.88 to this scatter plot. How do I do this?
P.S. All the examples I managed to find use points, not slopes, to draw the line.
UPDATE: I re-read the theory, and it turns out that the idea that the correlation coefficient should be plotted against the data points was made up by me :) partly because of this image in my head (image not preserved).
However, I am still confused by the line-plotting capabilities of matplotlib.
Answered by KirstieJane
Building on @JianxunLi's answer, you just want to add in a line connecting two points.
These two points have x and y coordinates so for the two points you'll have four numbers: x_0, y_0, x_1, y_1.
Let's assume you want the x coordinates of those two points to span the x axis, so you're going to set x_0 and x_1 manually:
x_0 = 0
x_1 = 5000
Alternatively you can just take the minimum and maximum values from the axis:
x_min, x_max = ax.get_xlim()
x_0 = x_min
x_1 = x_max
You define the slope of a line as increase in y / increase in x, which would be:
slope = (y_1 - y_0) / (x_1 - x_0)
And this can be rearranged to:
(y_1 - y_0) = slope * (x_1 - x_0)
There are an infinite number of parallel lines with this slope, so we'll have to fix one of the points to start off with. For this example let's assume you want the line to go through the origin (0,0):
x_0 = 0 # We already know this as it was set earlier
y_0 = 0
Now you can rearrange the formula for y_1 as:
y_1 = slope * (x_1 - x_0) + y_0
If you know you want the slope to be 0.88 then you can calculate the y position of the other point:
y_1 = 0.88 * (5000 - 0) + 0
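The rearranged formula can be wrapped in a small helper. This is only a sketch: the function name line_end_y is made up for illustration and is not part of matplotlib or of the original answer.

```python
def line_end_y(slope, x_0, y_0, x_1):
    """Return y_1 for the point at x_1 on the line through (x_0, y_0).

    Hypothetical helper: it just evaluates y_1 = slope * (x_1 - x_0) + y_0.
    """
    return slope * (x_1 - x_0) + y_0


# Same numbers as in the calculation above: slope 0.88 through the origin
y_1 = line_end_y(slope=0.88, x_0=0, y_0=0, x_1=5000)
print(y_1)
```

Keeping x_0 and y_0 as parameters means the same helper also covers lines that do not pass through the origin.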
For the data you've provided in the question, a line with slope 0.88 will fly off the top of the y axis very quickly (y_1 = 4400 in the example above).
In the example below I've put in a line with slope = 0.03.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# simulate some artificial data
# =====================================
df = pd.DataFrame( { 'Age' : np.random.rand(25) * 160 } )
df['Length'] = df['Age'] * 0.88 + np.random.rand(25) * 5000
# plot those data points
# ==============================
fig, ax = plt.subplots()
ax.scatter(df['Length'], df['Age'])
# Now add on a line with a fixed slope of 0.03
slope = 0.03
# A line with a fixed slope can intercept the axis
# anywhere so we're going to have it go through 0,0
x_0 = 0
y_0 = 0
# And we'll have the line stop at x = 5000
x_1 = 5000
y_1 = slope * (x_1 - x_0) + y_0
# Draw these two points with big triangles to make it clear
# where they lie
ax.scatter([x_0, x_1], [y_0, y_1], marker='^', s=150, c='r')
# And now connect them
ax.plot([x_0, x_1], [y_0, y_1], c='r')
plt.show()
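As an alternative to computing the second point by hand, newer matplotlib versions (3.3 and later, as far as I know) provide Axes.axline, which takes one point and a slope directly. The sketch below assumes such a version is installed and uses made-up random data rather than the data from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative random data only (fixed seed so the figure is reproducible)
rng = np.random.default_rng(0)
length = rng.random(25) * 5000
age = rng.random(25) * 160

fig, ax = plt.subplots()
ax.scatter(length, age)

# Draw an unbounded line through (0, 0) with slope 0.03.
# Assumption: matplotlib >= 3.3, where Axes.axline accepts slope=...
line = ax.axline((0, 0), slope=0.03, color='r')

# Call plt.show() here to display the figure interactively
```

Because the line is unbounded, it keeps spanning the axes when you zoom or pan, whereas a segment drawn with ax.plot stops at its two end points.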
Answered by Jianxun Li
The correlation coefficient won't give the slope of the regression line, because your data are on different scales. If you would like to plot a scatter plot with a regression line, I would recommend doing it in seaborn, which needs only a few lines of code.
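To make the point about scales concrete, here is a small numerical check, offered as a sketch rather than as part of the original answer. It relies on the standard least-squares identity slope = r * std(y) / std(x). The distribution parameters are the same ones used in the simulated data further down, for which the population correlation is 0.8 and the population slope is 8.

```python
import numpy as np

# Simulated data; the fixed seed is only there for reproducibility
rng = np.random.default_rng(0)
data = rng.multivariate_normal([10, 100], [[100, 800], [800, 10000]], size=100)
x, y = data[:, 0], data[:, 1]

# Pearson correlation coefficient (always between -1 and 1)
r = np.corrcoef(x, y)[0, 1]

# Least-squares regression line; polyfit returns [slope, intercept] for degree 1
slope, intercept = np.polyfit(x, y, 1)

# The slope is r rescaled by the ratio of the standard deviations
slope_from_r = r * y.std(ddof=1) / x.std(ddof=1)

print(f"r = {r:.3f}, slope = {slope:.3f}, r * sy / sx = {slope_from_r:.3f}")
```

The two only coincide when x and y have the same standard deviation, for example after standardising both variables.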
To install seaborn:
pip install seaborn
Code example:
import numpy as np
import pandas as pd
import seaborn as sns
# simulate some artificial data
# =====================================
df = pd.DataFrame(np.random.multivariate_normal([10, 100], [[100, 800], [800, 10000]], size=100), columns=['X', 'Y'])
df
# plot
# ====================================
sns.set_style('ticks')
sns.regplot(x='X', y='Y', data=df, ci=None)
sns.despine()
Edit:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# simulate some artificial data
# =====================================
df = pd.DataFrame(np.random.multivariate_normal([10, 100], [[100, 800], [800, 10000]], size=100), columns=['X', 'Y'])
# plot
# ==============================
fig, ax = plt.subplots()
ax.scatter(df.X, df.Y)
# a slope plus one known y value fixes the line; here c is the y value at x_min
slope = 10
c = -100
x_min, x_max = ax.get_xlim()
y_min, y_max = c, c + slope*(x_max-x_min)
ax.plot([x_min, x_max], [y_min, y_max])
ax.set_xlim([x_min, x_max])
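If you would rather give the line as y = slope * x + intercept, with the intercept measured at x = 0 instead of at the left edge of the plot, the same idea can be sketched as below. The helper name plot_line_from_slope is hypothetical, and the data are simulated as above with a fixed seed.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_line_from_slope(ax, slope, intercept, **kwargs):
    """Draw y = slope * x + intercept across the current x range of ax.

    Hypothetical helper, not a matplotlib function.
    """
    x_min, x_max = ax.get_xlim()
    xs = np.array([x_min, x_max])
    ys = slope * xs + intercept
    (line,) = ax.plot(xs, ys, **kwargs)
    ax.set_xlim(x_min, x_max)  # keep the x range the scatter plot had
    return line


# Simulated data, same distribution as in the example above
rng = np.random.default_rng(0)
data = rng.multivariate_normal([10, 100], [[100, 800], [800, 10000]], size=100)

fig, ax = plt.subplots()
ax.scatter(data[:, 0], data[:, 1])
line = plot_line_from_slope(ax, slope=10, intercept=-100)

# Call plt.show() here to display the figure interactively
```

Extra keyword arguments such as color='r' are passed straight through to ax.plot.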

