Python: Linear regression on a Pandas DataFrame using Sklearn (IndexError: tuple index out of range)

Question

Asked by Dinosaur

I'm new to Python and trying to perform linear regression using sklearn on a pandas dataframe. This is what I did:

data = pd.read_csv('xxxx.csv')

After that I got a DataFrame of two columns, let's call them 'c1', 'c2'. Now I want to do linear regression on the set of (c1,c2) so I entered

X=data['c1'].values
Y=data['c2'].values
linear_model.LinearRegression().fit(X,Y)

which resulted in the following error

IndexError: tuple index out of range

What's wrong here? Also, I'd like to know how to

visualize the result
make predictions based on the result?

I've searched and browsed a large number of sites but none of them seemed to instruct beginners on the proper syntax. Perhaps what's obvious to experts is not so obvious to a novice like myself.

Can you please help? Thank you very much for your time.

PS: I have noticed that a large number of beginner questions are down-voted on Stack Overflow. Kindly take into account the fact that things that seem obvious to an expert user may take a beginner days to figure out. Please use discretion when pressing the down arrow, lest you harm the vibrancy of this discussion community.

Answer 1

Answered by Tommy

You really should have a look at the scikit-learn docs for the fit method.

For how to visualize a linear regression, play with the example in the scikit-learn docs. I'm guessing you haven't used IPython (now called Jupyter) much either, so you should definitely invest some time in learning it. It's a great tool for exploring data and machine learning. You can literally copy and paste the scikit-learn linear regression example into a Jupyter notebook and run it.

For your specific problem with the fit method: by referring to the docs, you can see that the format of the data you are passing in for your X values is wrong.

Per the docs, "X : numpy array or sparse matrix of shape [n_samples,n_features]"

You can fix your code with this:

X = [[x] for x in data['c1'].values]
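
As a minimal sketch of why this fix works: the list comprehension wraps each value in its own one-element list, which produces the [n_samples, n_features] layout with a single feature. The small array below is a hypothetical stand-in for data['c1'].values, and the reshape call is an equivalent NumPy alternative, not part of the original answer.

```python
import numpy as np

# Hypothetical stand-in for data['c1'].values (a 1D array)
c1 = np.array([0.0, 1.0, 2.0, 3.0])

X_list = [[x] for x in c1]       # the list comprehension from the answer
X_reshaped = c1.reshape(-1, 1)   # equivalent NumPy reshape; -1 infers the row count

print(c1.shape)                  # (4,)
print(np.asarray(X_list).shape)  # (4, 1)
print(X_reshaped.shape)          # (4, 1)
```

Both forms give one row per sample and one column per feature, which is the shape fit expects for X.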

Answer 2

Answered by Scott

Let's assume your csv looks something like:

c1,c2
0.000000,0.968012
1.000000,2.712641
2.000000,11.958873
3.000000,10.889784
...

I generated the data as such:

import numpy as np
from sklearn import datasets, linear_model
import matplotlib.pyplot as plt

length = 10
x = np.arange(length, dtype=float).reshape((length, 1))
y = x + (np.random.rand(length)*10).reshape((length, 1))

This data is saved to test.csv (just so you know where it came from, obviously you'll use your own).
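
The answer does not show the saving step itself. One possible way to do it, assuming the column names c1 and c2 from the CSV sample above and using pandas to write the file, might look like this:

```python
import numpy as np
import pandas as pd

length = 10
x = np.arange(length, dtype=float)       # 0.0, 1.0, ..., 9.0
y = x + np.random.rand(length) * 10      # x plus uniform noise in [0, 10)

# Column names c1 and c2 are taken from the CSV sample; the to_csv call is an assumption
df = pd.DataFrame({"c1": x, "c2": y})
df.to_csv("test.csv", index=False)       # index=False keeps the row index out of the file
```

The arrays are kept 1D here because a DataFrame column is one-dimensional anyway.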

import pandas as pd

data = pd.read_csv('test.csv', index_col=False, header=0)
x = data.c1.values
y = data.c2.values
print(x)  # prints: [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]

You need to take a look at the shape of the data you are feeding into .fit().

Here x.shape = (10,) but we need it to be (10, 1); see the sklearn docs. I reshape y the same way (fit also accepts a one-dimensional y). So we reshape:

# -1 lets NumPy infer the number of rows, so this works for any CSV length
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)

Now we create the regression object and then call fit():

regr = linear_model.LinearRegression()
regr.fit(x, y)

# plot it as in the example at http://scikit-learn.org/
plt.scatter(x, y,  color='black')
plt.plot(x, regr.predict(x), color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()

See the sklearn linear regression example.

Answer 3

Answered by serv-inc

make predictions based on the result?

To predict,

lr = linear_model.LinearRegression().fit(X,Y)
lr.predict(X)

Is there any way I can view details of the regression?

LinearRegression has coef_ and intercept_ attributes.

lr.coef_
lr.intercept_

show the slope and intercept.
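
To make this concrete, here is a small sketch on hypothetical noise-free data lying on the line y = 2x + 1 (the data and the new inputs are made up for illustration). It reads back the slope and intercept and predicts for inputs the model has not seen:

```python
import numpy as np
from sklearn import linear_model

# Hypothetical data on the line y = 2x + 1
X = np.arange(5, dtype=float).reshape(-1, 1)  # shape (5, 1): 2D, one feature
Y = 2 * X.ravel() + 1                         # shape (5,): 1D target

lr = linear_model.LinearRegression().fit(X, Y)

slope = lr.coef_[0]        # coef_ holds one coefficient per feature
intercept = lr.intercept_

# New inputs must be 2D as well: one row per sample
X_new = np.array([[10.0], [20.0]])
predicted = lr.predict(X_new)

print(slope, intercept)    # close to 2 and 1
print(predicted)           # close to 21 and 41
```

With a 1D Y, coef_ is a 1D array and intercept_ is a scalar; with a 2D Y they gain an extra dimension for the targets.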

Answer 4

Answered by Samrat Kishore

Dataset

Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

Importing the dataset

dataset = pd.read_csv('1.csv')
X = dataset[["mark1"]]
y = dataset[["mark2"]]
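
The double brackets matter here: they select a one-column DataFrame, which is already 2D, whereas single brackets return a 1D Series. A short sketch, using a hypothetical in-memory frame in place of the contents of 1.csv:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('1.csv'); the values are made up
dataset = pd.DataFrame({"mark1": [10, 20, 30], "mark2": [15, 25, 35]})

as_series = dataset["mark1"]    # single brackets: 1D Series
as_frame = dataset[["mark1"]]   # double brackets: 2D one-column DataFrame

print(as_series.shape)  # (3,)
print(as_frame.shape)   # (3, 1)
```

Because of this, no explicit reshape is needed before calling fit in this answer.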

Fitting Simple Linear Regression to the set

regressor = LinearRegression()
regressor.fit(X, y)

Predicting the set results

y_pred = regressor.predict(X)

Visualising the set results

plt.scatter(X, y, color = 'red')
plt.plot(X, regressor.predict(X), color = 'blue')
plt.title('mark1 vs mark2')
plt.xlabel('mark1')
plt.ylabel('mark2')
plt.show()

Answer 5

Answered by seralouk

I am posting an answer that addresses exactly the error you got:

IndexError: tuple index out of range

Scikit-learn expects 2D inputs. Just reshape the `X` and `Y`.

Replace:

X=data['c1'].values # this  has shape (XXX, ) - It's 1D
Y=data['c2'].values # this  has shape (XXX, ) - It's 1D
linear_model.LinearRegression().fit(X,Y)

with

X=data['c1'].values.reshape(-1,1) # this  has shape (XXX, 1) - it's 2D
Y=data['c2'].values.reshape(-1,1) # this  has shape (XXX, 1) - it's 2D
linear_model.LinearRegression().fit(X,Y)
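
The before and after can be checked side by side with a short sketch. The two arrays are hypothetical stand-ins for the DataFrame columns. Note that the exception type for a 1D X depends on the scikit-learn version: the question shows IndexError, while recent releases, as far as I know, raise a ValueError asking for a 2D array, so the sketch catches both.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical stand-ins for data['c1'].values and data['c2'].values
c1 = np.array([0.0, 1.0, 2.0, 3.0])
c2 = np.array([1.0, 3.0, 5.0, 7.0])

# A 1D X is rejected; the exception type varies by scikit-learn version
try:
    linear_model.LinearRegression().fit(c1, c2)
    rejected = False
except (ValueError, IndexError) as err:
    rejected = True
    print(type(err).__name__)

# After reshaping, both arrays are 2D and fit succeeds
X = c1.reshape(-1, 1)
Y = c2.reshape(-1, 1)
model = linear_model.LinearRegression().fit(X, Y)

print(X.shape, Y.shape)    # (4, 1) (4, 1)
print(model.coef_.shape)   # (1, 1): one target, one feature
```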
