Pass Pandas DataFrame to Scipy.optimize.curve_fit
Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/35233664/
Asked by Sman789
I'd like to know the best way to use Scipy to fit Pandas DataFrame columns. If I have a data table (Pandas DataFrame) with columns (A, B, C, D and Z_real) where Z depends on A, B, C and D, I want to fit a function of each DataFrame row (Series) which makes a prediction for Z (Z_pred).
The signature of each function to fit is
func(series, param_1, param_2...)
where series is the Pandas Series corresponding to each row of the DataFrame. I use the Pandas Series so that different functions can use different combinations of columns.
I've tried passing the DataFrame to scipy.optimize.curve_fit using
curve_fit(func, table, table.loc[:, 'Z_real'])
but for some reason each func instance is passed the whole datatable as its first argument rather than the Series for each row. I've also tried converting the DataFrame to a list of Series objects, but this results in my function being passed a Numpy array (I think because Scipy performs a conversion from a list of Series to a Numpy array which doesn't preserve the Pandas Series object).
Answered by ali_m
Your call to curve_fit is incorrect. From the documentation:
xdata: An M-length sequence or an (k,M)-shaped array for functions with k predictors.
The independent variable where the data is measured.
ydata: M-length sequence
The dependent data — nominally f(xdata, ...)
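To illustrate the documented behaviour, here is a minimal sketch showing that curve_fit hands the whole xdata to the model function on every call, rather than one observation at a time. The straight-line model, the synthetic data and the seen_shapes list are assumptions made up for this illustration; they are not part of the original question.

```python
import numpy as np
from scipy.optimize import curve_fit

seen_shapes = []  # shape of the first argument on every call to func

def func(x, a, b):
    # curve_fit passes the full xdata array here, not a single observation
    seen_shapes.append(np.shape(x))
    return a * x + b

xdata = np.linspace(0.0, 1.0, 20)
ydata = 2.0 * xdata + 1.0  # noise-free synthetic data (assumed for illustration)

popt, pcov = curve_fit(func, xdata, ydata)
print(set(seen_shapes))  # the distinct shapes func was called with
```

This is why a model function written to accept a single row (Series) ends up receiving the entire table instead.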
In this case your independent variables xdata are the columns A to D, i.e. table[['A', 'B', 'C', 'D']], and your dependent variable ydata is table['Z_real'].
Also note that xdata should be a (k, M) array, where k is the number of predictor variables (i.e. columns) and M is the number of observations (i.e. rows). You should therefore transpose your input dataframe so that it is (4, M) rather than (M, 4), i.e. table[['A', 'B', 'C', 'D']].T.
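The effect of the transpose on the shape can be checked with a small sketch. The table built here is a hypothetical stand-in with 5 observations and arbitrary values, used only to make the shapes visible.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's table: 5 observations, arbitrary values
table = pd.DataFrame(np.arange(25.0).reshape(5, 5),
                     columns=['A', 'B', 'C', 'D', 'Z_real'])

predictors = table[['A', 'B', 'C', 'D']]
print(predictors.shape)    # (M, k): one row per observation
print(predictors.T.shape)  # (k, M): one row per predictor, the layout curve_fit documents

# Unpacking the transposed array gives one length-M array per column
a, b, c, d = predictors.T.to_numpy()
```

With the (k, M) layout, a model function can unpack its first argument into one array per predictor, as in the last line.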
The whole call to curve_fit might look something like this:
curve_fit(func, table[['A', 'B', 'C', 'D']].T, table['Z_real'])
Here's a complete example showing multiple linear regression:
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

X = np.random.randn(100, 4)    # independent variables, shape (M, k) = (100, 4)
m = np.random.randn(4)         # known coefficients
y = X.dot(m)                   # dependent variable

df = pd.DataFrame(np.hstack((X, y[:, None])),
                  columns=['A', 'B', 'C', 'D', 'Z_real'])

def func(X, *params):
    # X has shape (k, M); params holds the k coefficients being fitted
    return np.hstack(params).dot(X)

# p0 is required here: with *params, curve_fit cannot infer the number of parameters
popt, pcov = curve_fit(func, df[['A', 'B', 'C', 'D']].T, df['Z_real'],
                       p0=np.random.randn(4))

print(np.allclose(popt, m))
# True
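The same pattern should extend to models that use only some of the columns, which was the motivation the question gave for passing a Series. Below is a rough sketch under stated assumptions: the name model_ac, the exponential form, the parameter values and the synthetic data are all hypothetical and chosen only for illustration. The inputs are converted with to_numpy() so that the model function receives plain arrays.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(0.5, 2.0, size=(200, 4)),
                  columns=['A', 'B', 'C', 'D'])

true_params = (1.5, 0.7)  # hypothetical "ground truth" for the synthetic data
df['Z_real'] = true_params[0] * df['A'] * np.exp(-true_params[1] * df['C'])

def model_ac(X, scale, rate):
    # X has shape (2, M): row 0 holds column A, row 1 holds column C
    a, c = X
    return scale * a * np.exp(-rate * c)

# Select only the columns this model needs, then transpose to (k, M)
xdata = df[['A', 'C']].T.to_numpy()
ydata = df['Z_real'].to_numpy()

popt, pcov = curve_fit(model_ac, xdata, ydata, p0=(1.0, 1.0))
print(popt)  # fitted (scale, rate)
```

A different model can pick a different column subset at the call site, so each function only has to agree with its caller on the order of the rows in X.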