How to overplot a line on a scatter plot in Python?

Question

Asked by goldisfine

I have two vectors of data and I've put them into matplotlib.scatter(). Now I'd like to overplot a linear fit to these data. How would I do this? I've tried using scikit-learn and np.scatter.

Answer 1

Accepted answer by Greg Whittier

import numpy as np
from numpy.polynomial.polynomial import polyfit
import matplotlib.pyplot as plt

# Sample data
x = np.arange(10)
y = 5 * x + 10

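# Fit with numpy.polynomial.polynomial.polyfit, which returns coefficients
# from lowest to highest degree: intercept b first, then slope m
b, m = polyfit(x, y, 1)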

plt.plot(x, y, '.')
plt.plot(x, b + m * x, '-')
plt.show()
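Note that `numpy.polynomial.polynomial.polyfit` (used above) and the older `np.polyfit` return their coefficients in opposite orders, which is easy to mix up. A minimal sketch on the same noise-free data as above, comparing the two:

```python
import numpy as np
from numpy.polynomial.polynomial import polyfit as poly_polyfit

x = np.arange(10)
y = 5 * x + 10

# numpy.polynomial.polynomial.polyfit: lowest degree first -> [intercept, slope]
b, m = poly_polyfit(x, y, 1)

# np.polyfit (legacy API): highest degree first -> [slope, intercept]
m2, b2 = np.polyfit(x, y, 1)

print(b, m)    # intercept, slope from the polynomial-module function
print(b2, m2)  # intercept, slope from np.polyfit
```

Both calls describe the same line; only the unpacking order differs.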

Answer 2

Answer by pcoving

I'm partial to scikits.statsmodels. Here is an example:

import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt

X = np.random.rand(100)
Y = X + np.random.rand(100)*0.1

results = sm.OLS(Y,sm.add_constant(X)).fit()

print(results.summary())

plt.scatter(X,Y)

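X_plot = np.linspace(0,1,100)
# Current statsmodels prepends the column of ones by default (add_constant(..., prepend=True)),
# so params[0] is the intercept and params[1] is the slope
plt.plot(X_plot, X_plot*results.params[1] + results.params[0])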

plt.show()

The only tricky part is sm.add_constant(X), which adds a column of ones to X in order to get an intercept term.
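To see why the column of ones matters, here is a NumPy-only sketch (an illustration, not statsmodels code) that fits the same kind of data with and without it, using `np.linalg.lstsq` for ordinary least squares. The fixed random seed is an assumption added for reproducibility. Without the ones column the model has no intercept, so the fitted line is forced through the origin.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility
X = rng.random(100)
Y = X + rng.random(100) * 0.1   # same recipe as the example above

# Without a constant column the model is Y ~ slope * X: the line must pass through (0, 0).
slope_only = np.linalg.lstsq(X[:, None], Y, rcond=None)[0]

# With a column of ones (what sm.add_constant adds, placed first here),
# the model is Y ~ const + slope * X.
design = np.column_stack((np.ones_like(X), X))
const, slope = np.linalg.lstsq(design, Y, rcond=None)[0]

print("no intercept: slope =", slope_only[0])
print("with intercept: const =", const, "slope =", slope)
```

Because the true intercept here is positive, the through-the-origin fit compensates with a steeper slope.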

     Summary of Regression Results
=======================================
| Dependent Variable:            ['y']|
| Model:                           OLS|
| Method:                Least Squares|
| Date:               Sat, 28 Sep 2013|
| Time:                       09:22:59|
| # obs:                         100.0|
| Df residuals:                   98.0|
| Df model:                        1.0|
==============================================================================
|                   coefficient     std. error    t-statistic          prob. |
------------------------------------------------------------------------------
| x1                      1.007       0.008466       118.9032         0.0000 |
| const                 0.05165       0.005138        10.0515         0.0000 |
==============================================================================
|                          Models stats                      Residual stats  |
------------------------------------------------------------------------------
| R-squared:                     0.9931   Durbin-Watson:              1.484  |
| Adjusted R-squared:            0.9930   Omnibus:                    12.16  |
| F-statistic:                1.414e+04   Prob(Omnibus):           0.002294  |
| Prob (F-statistic):        9.137e-108   JB:                        0.6818  |
| Log likelihood:                 223.8   Prob(JB):                  0.7111  |
| AIC criterion:                 -443.7   Skew:                     -0.2064  |
| BIC criterion:                 -438.5   Kurtosis:                   2.048  |
------------------------------------------------------------------------------

(Figure: example plot)

Answer 3

Answer by Franck Dernoncourt

Another way to do it, using axes.get_xlim():

import matplotlib.pyplot as plt
import numpy as np

def scatter_plot_with_correlation_line(x, y, graph_filepath):
    '''
    http://stackoverflow.com/a/34571821/395857
    x does not have to be ordered.
    '''
    # Create scatter plot
    plt.scatter(x, y)

    # Add correlation line
    axes = plt.gca()
    m, b = np.polyfit(x, y, 1)
    X_plot = np.linspace(axes.get_xlim()[0],axes.get_xlim()[1],100)
    plt.plot(X_plot, m*X_plot + b, '-')

    # Save figure
    plt.savefig(graph_filepath, dpi=300, format='png', bbox_inches='tight')

def main():
    # Data
    x = np.random.rand(100)
    y = x + np.random.rand(100)*0.1

    # Plot
    scatter_plot_with_correlation_line(x, y, 'scatter_plot.png')

if __name__ == "__main__":
    main()
    #cProfile.run('main()') # if you want to do some profiling
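The same get_xlim() idea can be written against Matplotlib's object-oriented API, so the function draws on an explicit `Axes` instead of the current global figure. A minimal sketch: the helper name `plot_with_fit_line` is hypothetical, the data are synthetic, and the Agg backend is chosen only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_with_fit_line(ax, x, y):
    """Hypothetical helper: scatter x, y on ax and overlay a least-squares
    line that spans the axes' current x-limits."""
    ax.scatter(x, y)
    m, b = np.polyfit(x, y, 1)        # np.polyfit returns [slope, intercept]
    x_min, x_max = ax.get_xlim()      # limits after autoscaling to the scatter
    x_line = np.linspace(x_min, x_max, 100)
    ax.plot(x_line, m * x_line + b, "-")
    ax.set_xlim(x_min, x_max)         # stop the limits from growing to fit the line
    return m, b


rng = np.random.default_rng(1)
x = rng.random(100)
y = x + rng.random(100) * 0.1

fig, ax = plt.subplots()
m, b = plot_with_fit_line(ax, x, y)
```

To write the figure to disk as in the answer, call `fig.savefig(...)` with the same arguments.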

Answer 4

Answer by 1''

A one-line version of this excellent answer to plot the line of best fit is:

plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)))

Using np.unique(x) instead of x handles the case where x isn't sorted or has duplicate values.

The call to poly1d is an alternative to writing out m*x + b like in this other excellent answer.
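A small NumPy-only check of the two points in this answer: `np.unique` returns sorted, de-duplicated values, and calling the `np.poly1d` object gives the same numbers as writing `m*x + b` by hand. The sample arrays are made up for illustration.

```python
import numpy as np

# Unsorted x with a duplicate value (illustrative data)
x = np.array([3.0, 1.0, 2.0, 3.0, 0.0])
y = np.array([7.1, 2.9, 5.2, 6.9, 1.0])

xs = np.unique(x)              # sorted and de-duplicated: [0., 1., 2., 3.]

coeffs = np.polyfit(x, y, 1)   # [slope, intercept]
line = np.poly1d(coeffs)       # callable polynomial: line(t) == slope * t + intercept

m, b = coeffs
print(line(xs))
print(m * xs + b)              # same values, written out explicitly
```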

Answer 5

Answer by Sébastien

plt.plot(X_plot, X_plot*results.params[0] + results.params[1])

versus

plt.plot(X_plot, X_plot*results.params[1] + results.params[0])
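Which of the two lines is right depends on where `sm.add_constant` puts the column of ones: fitted parameters come back in the same order as the columns of the design matrix, and current statsmodels prepends the constant by default (`prepend=True`). A NumPy-only sketch (an illustration, not statsmodels code) that fits the same synthetic data with the ones column first and last:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random(100)
Y = 2.0 * X + 0.5 + rng.normal(scale=0.05, size=100)  # true slope 2, intercept 0.5

ones = np.ones_like(X)
prepended = np.column_stack((ones, X))  # column order like add_constant(X, prepend=True)
appended = np.column_stack((X, ones))   # column order like add_constant(X, prepend=False)

params_pre = np.linalg.lstsq(prepended, Y, rcond=None)[0]  # [intercept, slope]
params_app = np.linalg.lstsq(appended, Y, rcond=None)[0]   # [slope, intercept]

X_plot = np.linspace(0, 1, 100)
fit_pre = X_plot * params_pre[1] + params_pre[0]  # constant first: slope is params[1]
fit_app = X_plot * params_app[0] + params_app[1]  # constant last: slope is params[0]
print(np.allclose(fit_pre, fit_app))
```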

Answer 6

Answer by deepstructure

I like Seaborn's regplot or lmplot for this.
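A minimal sketch of `seaborn.regplot` on synthetic data, assuming a recent seaborn (0.12 or later, where `x` and `y` are passed by keyword). `regplot` draws the scatter and a fitted first-order regression line on one `Axes`; `ci=None` turns off the shaded confidence band. The Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(3)
x = rng.random(100)
y = x + rng.random(100) * 0.1

fig, ax = plt.subplots()
# Scatter plus a linear fit; ci=None skips the bootstrapped confidence band.
sns.regplot(x=x, y=y, ci=None, ax=ax, line_kws={"color": "red"})
```

`sns.lmplot` offers the same fit as a figure-level function; it takes a DataFrame via `data=` and column names for `x` and `y`.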

Answer 7

Answer by Polina Novikova

You can use this tutorial by Adarsh Menon https://towardsdatascience.com/linear-regression-in-6-lines-of-python-5e1d0cd05b8d

This is the easiest approach I have found, and it basically looks like this:

import numpy as np
import matplotlib.pyplot as plt  # To visualize
import pandas as pd  # To read data
from sklearn.linear_model import LinearRegression
data = pd.read_csv('data.csv')  # load data set
X = data.iloc[:, 0].values.reshape(-1, 1)  # .values converts the column into a NumPy array
Y = data.iloc[:, 1].values.reshape(-1, 1)  # -1 lets NumPy infer the number of rows; the result has 1 column
linear_regressor = LinearRegression()  # create object for the class
linear_regressor.fit(X, Y)  # perform linear regression
Y_pred = linear_regressor.predict(X)  # make predictions
plt.scatter(X, Y)
plt.plot(X, Y_pred, color='red')
plt.show()
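If no data.csv is at hand, the same steps can be tried on synthetic data. A self-contained sketch, assuming scikit-learn is installed; the fitted line can be read back from `coef_` and `intercept_`, and the Agg backend is used only so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X = rng.random(100).reshape(-1, 1)                         # scikit-learn expects a 2-D feature array
Y = 3.0 * X[:, 0] + 1.0 + rng.normal(scale=0.05, size=100)  # true slope 3, intercept 1

model = LinearRegression().fit(X, Y)   # fit() returns the fitted estimator
slope, intercept = model.coef_[0], model.intercept_

# Evaluate the fitted line on an evenly spaced grid across the data range
X_line = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
plt.scatter(X[:, 0], Y)
plt.plot(X_line[:, 0], model.predict(X_line), color="red")
```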
