pandas TypeError: Improper input: N=2 must not exceed M=1

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/36295380/

TypeError: Improper input: N=2 must not exceed M=1

Tags: python-3.x, pandas, scipy, curve-fitting

Asked by Ryan

I am writing a function to do non-linear curve fitting and am running into this error:

TypeError: Improper input: N=2 must not exceed M=1. 

I don't know why it thinks I am trying to use too large of an array when I am only reading in columns from a csv file.

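import math

# The imports below are assumed; the question does not show them.
import pandas
from numpy import sqrt, diag
from scipy.optimize import curve_fit

# NOTE: the fit function v passed to curve_fit below is not shown in the question.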

#stolen sig-fig function <--trust but verify
def round_figures(x, n): 
    return round(x, int(n - math.ceil(math.log10(abs(x))))) 

def try_michaelis_menten_fit( df, pretty=False ):

    # auto-guess
    p0 = ( df['productFinal'].max(), df['substrateConcentration'].mean() )

    popt, pcov = curve_fit( v, df['substrateConcentration'], df['productFinal'], p0=p0 )
    perr = sqrt( diag( pcov ) )

    kcat_km = popt[0] / popt[1]
    # error propagation
    kcat_km_err = (sqrt( (( (perr[0])  / popt[0])**2) + ((  (perr[1])  / popt[1])**2) ))

    kcat = ( popt[0] )
    kcat_std_err = ( perr[0] )

    km_uM = ( popt[1] * 1000000 )
    km_std_err = ( perr[1] *1000000)


    if pretty:
        results = {
        'kcat': round_figures(kcat, 3),
        'kcat_std_err': round_figures(kcat_std_err, 3),

        'km_uM': round_figures(km_uM, 5),
        'km_std_err': round_figures(km_std_err, 3),

        'kcat/km': round_figures(kcat_km, 2),
        'kcat/km_err': round_figures(kcat_km_err, 2),

        }

        return pandas.Series( results )
    else: 
        return popt, perr 

df = pandas.read_csv( 'PNP_Raw2Fittr.csv' ) 



fits = df.groupby('sample').apply( try_michaelis_menten_fit, pretty=True ) 
fits.to_csv( 'fits_pretty_output.csv' )
print( fits ) 

I am reading in a data frame that is an expanded version of something like this:

     sample  yield  dilution  time  productAbsorbance  substrateConcentration  internalStandard
0  PNPH_I_4  2.604     10000  2400              269.6                0.007000            2364.0
1  PNPH_I_4  2.604     10000  2400              215.3                0.002333            2515.7
2  PNPH_I_4  2.604     10000  2400              160.3                0.000778            2252.2
3  PNPH_I_4  2.604     10000  2400              104.1                0.000259            2302.4
4  PNPH_I_4  2.604     10000  2400               60.9                0.000086            2323.5
5  PNPH_I_4  2.604     10000  2400               35.4                0.000029            2367.9
6  PNPH_I_4  2.604     10000  2400                0.0                0.000000            2165.3

When I call this function on this smaller version of my data frame it seems to work, but when I use it on the large one I get this error. The error first appeared when I added the internalStandard column; everything worked perfectly before that. To make matters even more confusing, when I revert to the old code with an old version of the data frame it works fine. If I add that line I get the error, as would be expected. However, when I delete the same line from my data frame and run the code again, I STILL get the same error!

I have figured out that if I pass in method='trf' instead of lm as my optimization method, I instead get the error OverflowError: cannot convert float infinity to integer. However, I do use df.dropna(inplace=True). Is there a similar method that is specific to infinity?

Answered by feedMe

I believe this error refers to the fact that the length of your x and y input data (e.g. df['substrateConcentration'] and df['productFinal']) is less than the number of fitting parameters given to curve_fit, as defined in your fitting function v. In the message, N is the number of fitting parameters and M is the number of data points. This is a consequence of the mathematics: you are attempting to perform curve fitting (optimization) with too few constraints.

I reproduced the same error with scipy.optimize.curve_fit by passing arrays of shape (2,) to a fit function that expects 4 fitting parameters.

e.g.

import numpy as np
from scipy.optimize import curve_fit

x, y = np.array([0.5, 4.0]), np.array([1.5, 0.6])

def func(x, a, b, c, d):
    return a*x**3. + b*x**2. - c/x + d

popt, pcov = curve_fit(func, x, y)

TypeError: Improper input: N=4 must not exceed M=2

However, since you have not provided your fit function v in the question, it is not possible to confirm that this is the specific cause of your problem.
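
To connect this to the numbers in your message: a two-element p0 implies two fit parameters, so a group containing a single row would give N=2 and M=1. The sketch below assumes a Michaelis-Menten form for v (hypothetical, since the question does not show v) and uses a single made-up data point. The exact wording of the message differs between SciPy versions, so it is only printed here, not asserted.

```python
import numpy as np
from scipy.optimize import curve_fit


def v(s, kcat, km):
    # Hypothetical two-parameter fit function (Michaelis-Menten form);
    # the real v from the question is unknown.
    return kcat * s / (km + s)


# A "group" with only one row: one x value and one y value.
x_one = np.array([0.007])
y_one = np.array([269.6])

try:
    # Same style of initial guess as in the question: (max of y, mean of x).
    curve_fit(v, x_one, y_one, p0=(y_one.max(), x_one.mean()))
    message = None
except TypeError as exc:
    # Two parameters cannot be fitted to a single data point.
    message = str(exc)

print(message)
```

If this matches your situation, at least one of the groups produced by groupby('sample') has fewer rows than v has parameters.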

Maybe your input data is not being formatted exactly the way you think it is. I suggest that you check how your arrays look when they are being passed to curve_fit. You might be parsing the data wrongly so that the number of rows ends up being very small.
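
One way to do that check, sketched under assumptions: count the rows in each group before fitting and flag any group with fewer rows than fit parameters. The data frame below is toy data (the second sample name and its values are made up), and N_PARAMS = 2 is inferred from the two-element p0 in the question, not from v itself.

```python
import pandas as pd

# Assumed number of fit parameters, inferred from the two-element p0.
N_PARAMS = 2

# Toy data for illustration only; replace with the real data frame.
df = pd.DataFrame({
    "sample": ["PNPH_I_4", "PNPH_I_4", "PNPH_I_4", "PNPH_I_5"],
    "substrateConcentration": [0.007, 0.002333, 0.000778, 0.007],
    "productFinal": [269.6, 215.3, 160.3, 250.0],
})

# Number of rows that each call to the fitting function would receive.
counts = df.groupby("sample").size()

# Groups that cannot be fitted because they have fewer rows than parameters.
too_small = counts[counts < N_PARAMS]

print(counts)
print("groups with too few rows:", list(too_small.index))
```

Running the same two lines on the real data frame, after any dropna call, should show whether a group shrinks to a single row.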

You also wrote: "I have figured out that if I pass in method='trf' instead of lm as my optimization method, I instead get the error OverflowError: cannot convert float infinity to integer. However, I do use df.dropna(inplace=True). Is there a similar method that is specific to infinity?"

Yes, so different optimization methods check the input data differently and throw different errors. This suggests, again, that there is some kind of problem with your input data. The first method is probably rejecting (ignoring) the rows that 'trf' throws this error for, and perhaps ending up with no rows at all.
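
On the literal question about infinity: as far as I know dropna does not treat inf as missing by default, so a common approach is to convert infinities to NaN first and then drop them. A minimal sketch on toy data (the values are made up; the column names follow the question):

```python
import numpy as np
import pandas as pd

# Toy data containing inf, -inf and NaN in different rows.
df = pd.DataFrame({
    "substrateConcentration": [0.007, np.inf, 0.000778, np.nan],
    "productFinal": [269.6, 215.3, -np.inf, 104.1],
})

# Turn +/- infinity into NaN, then drop every row that contains a NaN.
clean = df.replace([np.inf, -np.inf], np.nan).dropna()

print(len(df), "rows before,", len(clean), "rows after")
```

Note that this can shrink a group below the number of fit parameters, which would bring back the original TypeError, so it is worth re-checking the group sizes afterwards.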
