Python: getting the r-squared value using curve_fit

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/19189362/

Time: 2020-08-19 13:05:08  Source: igfitidea

Getting the r-squared value using curve_fit

python matplotlib scipy

Asked by Mathias

I am a beginner with both Python and all its libraries, but I have managed to make a small program that works as intended. It takes a string, counts the occurrences of the different letters, plots them in a graph, and then fits an equation and draws its curve. Now I would like to get the r-squared value of the fit.

The overall idea is to compare different kinds of text from articles on different levels and see how strong the overall pattern is.

It is just an exercise and I am new, so an easy-to-understand answer would be awesome.

The code is:

import numpy as np
import math
import matplotlib.pyplot as plt
from matplotlib.pylab import figure, show
from scipy.optimize import curve_fit

s="""det, og deres undersøgelse af hvor meget det bliver brugt viser, at der kun er seks plugins, som benyttes af mere end 5 % af Chrome-brugere.
Problemet med teknologien er, at den ivivuilv rduyd iytf ouyf ouy yg oyuf yd iyt erzypu zhrpyh dfgopaehr poargi ah pargoh ertao gehorg aeophgrpaoghraprbpaenbtibaeriber en af hovedårsagerne til sikkerhedshuller, ustabilitet og deciderede nedbrud af browseren.
Der vil ikke bve lukket for API'et  ivivuilv rduyd iytf ouyf ouy yg oyuf yd iyt erzypu zhrpyh dfgopaehr poargi ah pargoh ertao gehorg aeophgrpaoghraprbpaenbtibaeriber en af hovedårsagerne til sikkerhedshuller, ustabilitet og deciderede nedbrud af browseren.
Der vil ikke blive lukket for API'et på én gang, men det vil blive udfaset i løbet af et års tid. De mest populære plugins får lov at fungere i udfasningsperioden; Det drejer sig om: Silverlight (anvendt af 15 % af Chrome-brugere sidste måned), Unity (9,1 %), Google Earth (9,1 %), Java (8,9%), Google Talk (8,7 %) og Facebook Video (6,0 %).
Det er muligt at hvidliste andre plugins, men i slutningen af 2014 forventer udviklerne helt at lukke for brugen af dem."""
fordel=[]
alf=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','æ','ø','å']
i=1
p=0
fig = figure()
ax1 = fig.add_subplot(1,2,1)  # subplot indices start at 1; index 0 is rejected by current matplotlib
for letter in alf:
    fordel.append(s.count(letter))  # count how often each letter occurs in the text
fordel=sorted(fordel,key=int,reverse=True)
yFit=fordel
xFit=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]
def func(x, a, b):
    return a * (b ** x)
popt, pcov = curve_fit(func, xFit, yFit)
t = np.arange(0.0, 30.0, 0.1)
a=popt[0]
b=popt[1]
s = (a*b**t)
ax1.plot(t,s)
print(popt)
yMax=math.ceil(fordel[0]+5)
ax1.axis([0,30,0,yMax])
for i in range(0,int(len(alf))*2,2):
    fordel.insert(i,p)
    p=p+1
for i in range(0,int(len(fordel)/2)):
    ax1.scatter(fordel[0],fordel[1])
    fordel.pop(0)
    fordel.pop(0)
plt.show()

Accepted answer by wingr

Computing r_squared:

The r_squared value can be found using the mean (mean), the total sum of squares (ss_tot), and the residual sum of squares (ss_res). Each is defined as:

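mean:       $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$

SS_tot:     $SS_{tot} = \sum_{i} (y_i - \bar{y})^2$

SS_res:     $SS_{res} = \sum_{i} (y_i - f_i)^2$

r_squared:  $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$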

where f_i is the function value at point x_i. Taken from Wikipedia.
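
As a small worked check of these definitions, the sketch below computes each quantity with NumPy. The helper name `r_squared` and the sample numbers are made up for illustration and are not part of the original answer.

```python
import numpy as np

def r_squared(y, f):
    """Coefficient of determination for observed values y and model values f."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    mean = y.mean()                    # mean of the observed data
    ss_tot = np.sum((y - mean) ** 2)   # total sum of squares
    ss_res = np.sum((y - f) ** 2)      # residual sum of squares
    return 1.0 - ss_res / ss_tot

# Made-up observations and model values, for illustration only
y_obs = [1.0, 2.0, 3.0, 4.0]
f_mod = [1.1, 1.9, 3.2, 3.8]
r2_example = r_squared(y_obs, f_mod)
print(r2_example)
```

Here the mean is 2.5, ss_tot is 5.0 and ss_res is about 0.10, so the result is about 0.98. A model that reproduces the data exactly gives 1.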

From scipy.optimize.curve_fit():

  • You can get the parameters (popt) from curve_fit() with

    popt, pcov = curve_fit(f, xdata, ydata)

  • You can get the residual sum of squares (ss_res) with

    residuals = ydata - f(xdata, *popt)
    ss_res = numpy.sum(residuals**2)

  • You can get the total sum of squares (ss_tot) with

    ss_tot = numpy.sum((ydata-numpy.mean(ydata))**2)

  • And finally, the r_squared value with

    r_squared = 1 - (ss_res / ss_tot)
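
Put together, the steps above might look like the following sketch. It reuses the exponential model `a * b**x` from the question, but the data are synthetic and the starting guess `p0` is an assumption chosen for this example, so treat it as an illustration rather than the asker's actual program.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b):
    # Same model as in the question: a * b**x
    return a * (b ** x)

# Synthetic data (assumed for this sketch): exact model values plus a small fixed perturbation
xdata = np.arange(10, dtype=float)
noise = np.array([0.5, -0.4, 0.3, -0.2, 0.1, -0.1, 0.2, -0.3, 0.4, -0.5])
ydata = func(xdata, 50.0, 0.8) + noise

# Fit the model; p0 is an assumed starting guess
popt, pcov = curve_fit(func, xdata, ydata, p0=(40.0, 0.9))

residuals = ydata - func(xdata, *popt)   # the * unpacks the fitted parameters
ss_res = np.sum(residuals ** 2)          # residual sum of squares
ss_tot = np.sum((ydata - np.mean(ydata)) ** 2)  # total sum of squares
r_squared = 1 - (ss_res / ss_tot)
print(popt, r_squared)
```

With the question's variables, the same steps would presumably take `np.array(xFit)` and `np.array(yFit)` as `xdata` and `ydata`, so that the arithmetic is done on NumPy arrays.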

Answer by mutex86

I think this is an easier way to compute it when the fit comes from a minimize call and the model is linear in the coefficients:

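# Assumes a linear model, ydata approximately xdata.dot(cof), with xdata of shape (n, k) and ydata of shape (n, 1)
res = minimize(func, x0)  # your optimize function; scipy.optimize.minimize also needs an initial guess x0
cof = np.reshape(np.array(res.x),(-1,1))
r_square = 1.0 - (np.var(ydata-xdata.dot(cof)) / np.var(ydata))  # matches 1 - ss_res/ss_tot when the residuals have zero mean

# or, without the zero-mean assumption:
# r_square = 1 - np.square(ydata-xdata.dot(cof)).sum() / (np.var(ydata) * len(ydata))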
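
The two expressions above agree only when the residuals average to zero. The sketch below illustrates that with made-up numbers; the helper names `r2_var` and `r2_ss` are hypothetical and exist only for this comparison.

```python
import numpy as np

# Made-up data for illustration
ydata = np.array([1.0, 2.0, 3.0, 4.0])
pred_centered = np.array([1.1, 1.9, 3.2, 3.8])  # residuals average to zero
pred_biased = pred_centered + 0.5               # every prediction shifted by a constant

def r2_var(y, pred):
    # Variance-ratio form: 1 - var(residuals) / var(y)
    return 1.0 - np.var(y - pred) / np.var(y)

def r2_ss(y, pred):
    # Sum-of-squares form: 1 - ss_res / ss_tot
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - np.mean(y)) ** 2)

# Zero-mean residuals: both forms give the same value
print(r2_var(ydata, pred_centered), r2_ss(ydata, pred_centered))
# Constant offset in the predictions: only the sum-of-squares form penalizes it
print(r2_var(ydata, pred_biased), r2_ss(ydata, pred_biased))
```

Because the variance subtracts the mean of the residuals, a constant offset in the predictions leaves the variance-ratio form unchanged, while the sum-of-squares form drops (here from about 0.98 to about 0.78).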