pandas: DataFrame conversion for linear regression

Question

Asked by Jin

I read a CSV file into a DataFrame (named data) with several columns. All but the last hold long integers (each column is a pandas.core.series.Series), and the last column (the label) is a binary response variable stored as the strings 'P' (pass) / 'F' (fail).

import statsmodels.api as sm
label = data.ix[:, -1]
label[label == 'P'] = 1
label[label == 'F'] = 0

fea = data.ix[:, 0: -1]
logit = sm.Logit(label, fea)
result = logit.fit()
print result.summary()

I get this error message: "ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data)". NumPy, pandas and the other modules are already imported. I tried converting the fea columns to float, but it still fails. Could someone tell me how to fix this?
Thanks

Update:

data.info()
 <class 'pandas.core.frame.DataFrame'>
Int64Index: 500 entries, 68135 to 3002
Data columns (total 8 columns):
TestQty         500 non-null int64
WaferSize       500 non-null int64
ChuckTemp       500 non-null int64
Notch           500 non-null int64
ORIGINALDIEX    500 non-null int64
ORIGINALDIEY    500 non-null int64
DUTNo           500 non-null int64
PassFail        500 non-null object
dtypes: int64(7), object(1)
memory usage: 35.2+ KB

data.sum()
TestQty            530
WaferSize         6000
ChuckTemp        41395
Notch           135000
ORIGINALDIEX     12810
ORIGINALDIEY      7885
DUTNo           271132
PassFail            20
dtype: float64

Answer 1

Answered by Alexander

Shouldn't your features be this:

fea = data.ix[:, 0:-1]

From your data, you can see that PassFail sums to 20 before you convert 'P' to 1 and 'F' to 0. I believe that is the source of your error.

To see what is in there, try:

data.PassFail.unique()
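As a rough sketch of what that inspection might reveal, the snippet below uses a small invented Series (a hypothetical stand-in for data.PassFail, not the asker's data) and applies the same in-place replacement as the question. It suggests that the column can keep its object dtype even after every value has become an integer, which is what the error message complains about.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.PassFail; the real values are not shown.
pass_fail = pd.Series(["P", "F", "P", "P"], dtype=object)

# Same in-place replacement as in the question, done on a copy.
label = pass_fail.copy()
label[label == "P"] = 1
label[label == "F"] = 0

print(label.unique())           # distinct values now stored in the column
print(label.dtype)              # dtype of the Series after the replacement
print(np.asarray(label).dtype)  # dtype NumPy sees, as the error suggests checking

# An explicit cast produces a genuinely numeric Series.
numeric_label = label.astype(int)
print(numeric_label.dtype)
```

If the dtype printed for label is object, an explicit cast such as the astype call at the end is one way to make it numeric.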

To verify that the 0 and 1 labels together account for all 500 rows of the DataFrame:

(label == 0).sum() + (label == 1).sum()

Finally, try passing values to the function rather than Series and DataFrames:

logit = sm.Logit(label.values, fea.values)
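One possible way to make sure those values are numeric before they reach the model is sketched below, under stated assumptions. The frame is a tiny invented stand-in for data that borrows two of the column names from the question. The Series.map call with a 'P'/'F' dictionary is a substitute for the in-place assignment, not something this answer prescribes.

```python
import numpy as np
import pandas as pd

# Invented example rows; only the column names come from the question.
data = pd.DataFrame({
    "TestQty": [1, 2, 1, 3],
    "ChuckTemp": [25, 85, 25, 85],
    "PassFail": ["P", "F", "P", "F"],
})

# Map the strings to numbers, then cast so the dtype is float, not object.
label = data["PassFail"].map({"P": 1, "F": 0}).astype(float)

# .iloc stands in for the .ix indexer, which was removed in pandas 1.0.
fea = data.iloc[:, :-1].astype(float)

# Both arrays should now report a numeric dtype.
print(np.asarray(label).dtype)
print(np.asarray(fea).dtype)
```

With float inputs like these, the sm.Logit(label.values, fea.values) call above should no longer trip the object-dtype check. Under this mapping, any value other than 'P' or 'F' would become NaN and would need handling first.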
