Passing a Pandas DataFrame to Scipy.optimize.curve_fit

Question

Asked by Sman789

I'd like to know the best way to use Scipy to fit Pandas DataFrame columns. If I have a data table (Pandas DataFrame) with columns (A, B, C, D and Z_real) where Z depends on A, B, C and D, I want to fit a function of each DataFrame row (Series) that makes a prediction for Z (Z_pred).

The signature of each function to fit is

func(series, param_1, param_2...)

where series is the Pandas Series corresponding to each row of the DataFrame. I use the Pandas Series so that different functions can use different combinations of columns.

I've tried passing the DataFrame to scipy.optimize.curve_fit using

curve_fit(func, table, table.loc[:, 'Z_real'])

but for some reason func is passed the whole data table as its first argument rather than the Series for each row. I've also tried converting the DataFrame to a list of Series objects, but then my function is passed a Numpy array (I think because Scipy converts the list of Series to a Numpy array, which doesn't preserve the Pandas Series objects).

Answer 1

Answered by ali_m

Your call to curve_fit is incorrect. From the documentation:

xdata: An M-length sequence or an (k,M)-shaped array for functions with k predictors.
The independent variable where the data is measured.
ydata: M-length sequence
The dependent data — nominally f(xdata, ...)

In this case your independent variables xdata are the columns A to D, i.e. table[['A', 'B', 'C', 'D']], and your dependent variable ydata is table['Z_real'].

Also note that xdata should be a (k, M) array, where k is the number of predictor variables (i.e. columns) and M is the number of observations (i.e. rows). You should therefore transpose your input dataframe so that it is (4, M) rather than (M, 4), i.e. table[['A', 'B', 'C', 'D']].T.
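
To make the shapes concrete, here is a minimal sketch using a made-up 5-row table (the values are placeholders, not data from the question). It only checks what the transpose does to the layout; it does not run a fit.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's table, with M = 5 rows.
table = pd.DataFrame(np.arange(25, dtype=float).reshape(5, 5),
                     columns=['A', 'B', 'C', 'D', 'Z_real'])

xdata = table[['A', 'B', 'C', 'D']].T   # predictors as rows: shape (k, M)
ydata = table['Z_real']                 # one value per observation: shape (M,)

print(xdata.shape)   # (4, 5)
print(ydata.shape)   # (5,)

# After transposing, each predictor is a row labelled by its column name.
a_values = xdata.loc['A'].to_numpy()
```

Because the transposed frame keeps the column names as its row index, a model function that receives it can still pick out predictors by name, e.g. with .loc['A'].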

The whole call to curve_fit might look something like this:

curve_fit(func, table[['A', 'B', 'C', 'D']].T, table['Z_real'])
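
This also explains the behaviour described in the question: curve_fit does not call func once per row. It hands the whole xdata to func on every evaluation and expects one prediction per observation back. The sketch below records what func receives; the two-predictor model and the random data are assumptions made purely for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

seen_shapes = set()

def func(x, a, b):
    # Record the shape of the first argument on every call made by curve_fit.
    seen_shapes.add(np.shape(x))
    return a * x[0] + b * x[1]      # vectorised: returns M predictions at once

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 50))        # k = 2 predictors, M = 50 observations
y = 3.0 * x[0] - 1.0 * x[1]         # noise-free target for the illustration

popt, pcov = curve_fit(func, x, y, p0=[1.0, 1.0])
print(seen_shapes)
```

Only one shape, (2, 50), should ever be recorded, so func has to be written to work on whole columns rather than on a single row.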

Here's a complete example showing multiple linear regression:

import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

X = np.random.randn(100, 4)     # independent variables
m = np.random.randn(4)          # known coefficients
y = X.dot(m)                    # dependent variable

df = pd.DataFrame(np.hstack((X, y[:, None])),
                  columns=['A', 'B', 'C', 'D', 'Z_real'])

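def func(X, *params):
    # X has shape (4, M); the dot product gives one prediction per observation
    return np.hstack(params).dot(X)

# p0 is needed here: with *params, curve_fit cannot infer the number of parameters
popt, pcov = curve_fit(func, df[['A', 'B', 'C', 'D']].T, df['Z_real'],
                       p0=np.random.randn(4))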

print(np.allclose(popt, m))
# True
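
The question also asked for model functions that can each use a different combination of columns. One possible way to get that, sketched below, is a small wrapper that passes a plain (k, M) array to curve_fit and rebuilds a name-to-column mapping before calling the model. The helper fit_columns, the model model_ab and the synthetic data are all hypothetical names introduced for this illustration; they are not part of Scipy or of the original answer.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def fit_columns(model, df, x_cols, y_col, p0):
    """Hypothetical helper: fit model(cols, *params), where cols maps names to arrays."""
    xdata = df[x_cols].to_numpy().T            # shape (k, M)

    def wrapped(x, *params):
        cols = dict(zip(x_cols, x))            # one length-M array per column name
        return model(cols, *params)

    # p0 is required because wrapped takes *params.
    return curve_fit(wrapped, xdata, df[y_col].to_numpy(), p0=p0)

def model_ab(cols, p1, p2):
    # Example model (an assumption) that only uses columns A and B.
    return p1 * cols['A'] + p2 * cols['B'] ** 2

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=list('ABCD'))
df['Z_real'] = 2.0 * df['A'] - 0.5 * df['B'] ** 2   # noise-free synthetic target

popt, pcov = fit_columns(model_ab, df, ['A', 'B'], 'Z_real', p0=[1.0, 1.0])
print(np.allclose(popt, [2.0, -0.5]))
```

Each model then names the columns it needs, and the caller chooses which columns to pass in x_cols, so different models can share the same table.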
