Python: How to add a line of best fit to a scatter plot

Question

Asked by JavascriptLoser

I'm currently working with Pandas and matplotlib to perform some data visualization and I want to add a line of best fit to my scatter plot.

Here is my code:

import matplotlib
import matplotlib.pyplot as plt
import pandas as panda
import numpy as np

def PCA_scatter(filename):

   matplotlib.style.use('ggplot')

   data = panda.read_csv(filename)
   data_reduced = data[['2005', '2015']]

   data_reduced.plot(kind='scatter', x='2005', y='2015')
   plt.show()

PCA_scatter('file.csv')

How do I go about this?

Answer 1

Answered by Robert Calhoun

You can do the whole fit and plot in one fell swoop with Seaborn.

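import pandas as pd
import seaborn as sns

# whitespace-delimited file; raw string so '\s' is not read as a string escape
data_reduced = pd.read_csv('fake.txt', sep=r'\s+')
# pass x and y by keyword (recent seaborn versions no longer accept them positionally)
sns.regplot(x=data_reduced['2005'], y=data_reduced['2015'])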
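regplot draws the fitted line but does not return its coefficients. If the slope and intercept are needed as numbers, one option is to fit the same data separately. The sketch below assumes SciPy is installed and uses scipy.stats.linregress; the helper name fit_summary and the sample values are made up for illustration and are not part of the answer.

```python
import numpy as np
from scipy import stats


def fit_summary(x, y):
    """Return slope, intercept and R^2 of the least-squares line through (x, y)."""
    result = stats.linregress(np.asarray(x, dtype=float),
                              np.asarray(y, dtype=float))
    return result.slope, result.intercept, result.rvalue ** 2


# Hypothetical data, invented for this sketch only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.1, 6.9, 9.2, 10.8, 13.1]

slope, intercept, r_squared = fit_summary(x, y)
print(f"y = {slope:.2f} x + {intercept:.2f} (R^2 = {r_squared:.3f})")
```

With a DataFrame, the same helper could be called as fit_summary(data_reduced['2005'], data_reduced['2015']).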

Answer 2

Answered by Stefan

You can use np.polyfit() and np.poly1d(). Fit a first-degree polynomial to the x and y values, evaluate it at the same x values, and add the result to the ax object returned by the .scatter() plot. For example:

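import numpy as np
import matplotlib.pyplot as plt

# df is the following DataFrame (its column labels are the integers 2005 and 2015):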

     2005   2015
0   18882  21979
1    1161   1044
2     482    558
3    2105   2471
4     427   1467
5    2688   2964
6    1806   1865
7     711    738
8     928   1096
9    1084   1309
10    854    901
11    827   1210
12   5034   6253

Estimate first-degree polynomial:

z = np.polyfit(x=df.loc[:, 2005], y=df.loc[:, 2015], deg=1)
p = np.poly1d(z)
df['trendline'] = p(df.loc[:, 2005])

     2005   2015     trendline
0   18882  21979  21989.829486
1    1161   1044   1418.214712
2     482    558    629.990208
3    2105   2471   2514.067336
4     427   1467    566.142863
5    2688   2964   3190.849200
6    1806   1865   2166.969948
7     711    738    895.827339
8     928   1096   1147.734139
9    1084   1309   1328.828428
10    854    901   1061.830437
11    827   1210   1030.487195
12   5034   6253   5914.228708

Then plot:

ax = df.plot.scatter(x=2005, y=2015)
df.set_index(2005, inplace=True)
df.trendline.sort_index(ascending=False).plot(ax=ax)
plt.gca().invert_xaxis()

This gives the scatter plot with the trend line overlaid (figure not included here).

The fit also provides the line equation:

'y={0:.2f} x + {1:.2f}'.format(z[0],z[1])

y=1.16 x + 70.46
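The same idea can be sketched as one self-contained script with plain matplotlib calls. This is an illustration, not the answerer's code: the helper name scatter_with_fit is hypothetical, the 13 rows are copied from the table above, and the column labels are strings as in the question rather than the integers used in this answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def scatter_with_fit(df, x_col, y_col):
    """Scatter-plot two columns and overlay the first-degree least-squares line."""
    x = df[x_col].to_numpy(dtype=float)
    y = df[y_col].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    # A straight line only needs its two end points.
    x_line = np.array([x.min(), x.max()])
    ax.plot(x_line, slope * x_line + intercept, color="black",
            label=f"y = {slope:.2f} x + {intercept:.2f}")
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend()
    return fig, slope, intercept


# Rows copied from the table in this answer.
df = pd.DataFrame({
    "2005": [18882, 1161, 482, 2105, 427, 2688, 1806, 711, 928, 1084, 854, 827, 5034],
    "2015": [21979, 1044, 558, 2471, 1467, 2964, 1865, 738, 1096, 1309, 901, 1210, 6253],
})
fig, slope, intercept = scatter_with_fit(df, "2005", "2015")
```

Call plt.show() or fig.savefig(...) afterwards to view the figure. Drawing the line from only the minimum and maximum x avoids having to sort the data or invert the axis.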

Answer 3

Answered by Alex Williams

Another option (using np.linalg.lstsq):

import numpy as np
import matplotlib.pyplot as plt

# generate some fake data
N = 50
x = np.random.randn(N, 1)
y = x*2.2 + np.random.randn(N, 1)*0.4 - 1.8
plt.axhline(0, color='r', zorder=-1)
plt.axvline(0, color='r', zorder=-1)
plt.scatter(x, y)

# fit least-squares with an intercept
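# column of x plus a column of ones; rcond=None selects the current default cutoff and avoids a FutureWarning on older NumPy
w = np.linalg.lstsq(np.hstack((x, np.ones((N, 1)))), y, rcond=None)[0]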
xx = np.linspace(*plt.gca().get_xlim()).T

# plot best-fit line
plt.plot(xx, w[0]*xx + w[1], '-k')
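To make the design-matrix step explicit, here is a minimal sketch of the same least-squares fit on one-dimensional arrays, with a check against np.polyfit. The helper name lstsq_line and the fixed random seed are assumptions added for illustration.

```python
import numpy as np


def lstsq_line(x, y):
    """Fit y ~ slope * x + intercept by least squares on a two-column design matrix."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    # Each row of A is [x_i, 1]: the first column carries the slope,
    # the column of ones carries the intercept.
    A = np.column_stack([x, np.ones_like(x)])
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    return slope, intercept


rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
x = rng.standard_normal(50)
y = 2.2 * x - 1.8 + 0.4 * rng.standard_normal(50)

slope, intercept = lstsq_line(x, y)
# np.polyfit with deg=1 solves the same problem, so the two results should agree.
poly_slope, poly_intercept = np.polyfit(x, y, deg=1)
print(np.allclose([slope, intercept], [poly_slope, poly_intercept]))
```

The design-matrix form is more verbose than np.polyfit for a single straight line, but it extends to several predictors by adding more columns to A.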

Answer 4

Answered by user702846

This covers the plotly approach:

# load the libraries

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

# create the data
N = 50
x = pd.Series(np.random.randn(N))
y = x*2.2 - 1.8

# plot the data as a scatter plot
fig = px.scatter(x=x, y=y) 

# fit a linear model 
m, c = fit_line(x = x, 
                y = y)

# add the linear fit on top
fig.add_trace(
    go.Scatter(
        x=x,
        y=m*x + c,
        mode="lines",
        line=go.scatter.Line(color="red"),
        showlegend=False)
)
# optionally, annotate the plot with the slope and the intercept
mid_point = x.mean()

fig.update_layout(
    showlegend=False,
    annotations=[
        go.layout.Annotation(
            x=mid_point,
            y=m*mid_point + c,
            xref="x",
            yref="y",
            text=str(round(m, 2))+'x+'+str(round(c, 2)) ,
        )
    ]
)
fig.show()

where fit_line is:

def fit_line(x, y):
    # given one-dimensional x and y Series, return the slope m and intercept c of the least-squares line
    # inspired by the numpy manual - https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html
    x = x.to_numpy() # convert into numpy arrays
    y = y.to_numpy() # convert into numpy arrays

    A = np.vstack([x, np.ones(len(x))]).T # build the design matrix: x plus a column of ones for the intercept
    m, c = np.linalg.lstsq(A, y, rcond=None)[0]

    return m, c
