Linear regression using Python (Pandas and Numpy)

Question

Asked by Pragyaditya Das

I am trying to implement linear regression using Python.

I took the following steps:

import pandas as p
import numpy as n
data = p.read_csv("...path\Housing.csv", usecols=[1]) # I want the first col
data1 = p.read_csv("...path\Housing.csv", usecols=[3]) # I want the 3rd col
x = data
y = data1

Then I try to obtain the coefficients with the following call:

regression_coeff = n.polyfit(x,y,1)

And then I get the following error:

raise TypeError("expected 1D vector for x")
TypeError: expected 1D vector for x

I am unable to get my head around this, because when I print x and y, I can very clearly see that they are both 1D vectors.

Can someone please help?

The dataset can be found here: DataSets

The original code is:

import pandas as p
import numpy as n

data = p.read_csv('...\housing.csv', usecols = [1])
data1 = p.read_csv('...\housing.csv', usecols = [3])

x = data
y = data1
regression = n.polyfit(x, y, 1)

Answer 1

Answered by Mike Müller

This should work:

np.polyfit(data.values.flatten(), data1.values.flatten(), 1)

data is a DataFrame and its values are 2D:

>>> data.values.shape
(546, 1)

flatten() turns it into a 1D array:

>>> data.values.flatten().shape
(546,)

which is needed for polyfit().
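
To see the shape difference without the real file, here is a minimal sketch. The two small frames and their values are made up and only stand in for data and data1 from the question; the 2D input is passed as data.values, which has the same (rows, 1) shape as the DataFrame itself.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two single-column frames in the question.
data = pd.DataFrame({"price": [10.0, 20.0, 30.0, 40.0]})
data1 = pd.DataFrame({"bedrooms": [1.0, 2.0, 3.0, 4.0]})

print(data.values.shape)            # 2D: rows x 1 column
print(data.values.flatten().shape)  # 1D: rows only

# polyfit refuses the 2D input with the TypeError from the question.
try:
    np.polyfit(data.values, data1.values, 1)
    raised = False
except TypeError as exc:
    raised = True
    print("polyfit rejected 2D input:", exc)

# After flattening, both inputs are 1D and the fit goes through.
slope, intercept = np.polyfit(data.values.flatten(), data1.values.flatten(), 1)
print(slope, intercept)
```

Printing a single-column DataFrame looks like a plain vector, which is probably why x and y appeared to be 1D; checking .shape or .ndim is more reliable than reading the printed output.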

Simpler alternative:

df = pd.read_csv("Housing.csv")
np.polyfit(df['price'], df['bedrooms'], 1)
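
As a rough illustration of why the by-name version works, the sketch below uses a small made-up frame. Only the column names price and bedrooms come from the answer; the values are invented. Selecting a column with single brackets gives a Series, which is already 1D, and polyfit returns the coefficients with the highest power first.

```python
import numpy as np
import pandas as pd

# Made-up values; only the column names are taken from the answer.
df = pd.DataFrame({
    "price": [100.0, 200.0, 300.0, 400.0],
    "bedrooms": [2.0, 3.0, 4.0, 5.0],
})

x = df["price"]          # single brackets: a Series (1D)
x_frame = df[["price"]]  # double brackets: a one-column DataFrame (2D)
print(x.ndim, x_frame.ndim)

# For degree 1 the result is [slope, intercept].
coeffs = np.polyfit(df["price"], df["bedrooms"], 1)
slope, intercept = coeffs

# np.polyval evaluates the fitted line at a new x value.
predicted = np.polyval(coeffs, 250.0)
print(slope, intercept, predicted)
```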

Answer 2

Answered by Alessandro

Python is telling you that the data is not in the right format: x must be a 1D array, but in your case it is a 2D pandas DataFrame. You can convert your data to a NumPy array and squeeze it to fix the problem.

import pandas as pd
import numpy as np

data = pd.read_csv('../Housing.csv', usecols = [1])
data1 = pd.read_csv('../Housing.csv', usecols = [3])
data = np.squeeze(np.array(data))
data1 = np.squeeze(np.array(data1))

x = data
y = data1
regression = np.polyfit(x, y, 1)
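
One thing to keep in mind with np.squeeze is that it drops every axis of length 1, not just the column axis. The sketch below, using made-up arrays, shows the usual case and the single-row edge case, where passing axis=1 or calling ravel() keeps the result 1D.

```python
import numpy as np

# Usual case: a (rows, 1) column becomes a (rows,) vector.
column = np.array([[1.0], [2.0], [3.0]])
squeezed = np.squeeze(column)
print(column.shape, squeezed.shape)

# Edge case: with a single row, squeeze removes both axes.
single = np.array([[1.0]])
all_axes = np.squeeze(single)          # 0-dimensional array
only_cols = np.squeeze(single, axis=1) # removes just the column axis
raveled = single.ravel()               # always 1D
print(all_axes.shape, only_cols.shape, raveled.shape)
```

For a file with many rows, as in the question, the plain np.squeeze call in the answer gives the 1D vector that polyfit expects.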

Answer 3

Answered by Stefan

pandas.read_csv() returns a DataFrame, which has two dimensions, while np.polyfit wants a 1D vector for both x and y for a single fit. You can simply convert the output of read_csv() to a pd.Series to match the np.polyfit() input format using .squeeze():

data = pd.read_csv('../Housing.csv', usecols = [1]).squeeze()
data1 = pd.read_csv('../Housing.csv', usecols = [3]).squeeze()
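
A self-contained way to try this without the real file is to read from an in-memory CSV. The column layout and values below are assumptions made for illustration and are not taken from Housing.csv; io.StringIO simply stands in for the file path.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV text; the layout is an assumption, not the real dataset.
csv_text = """id,price,lotsize,bedrooms
1,100.0,5000,2
2,200.0,6000,3
3,300.0,7000,4
"""

# usecols=[1] keeps one column; squeeze() turns that one-column
# DataFrame into a Series.
data = pd.read_csv(io.StringIO(csv_text), usecols=[1]).squeeze()
data1 = pd.read_csv(io.StringIO(csv_text), usecols=[3]).squeeze()
print(type(data).__name__, data.shape)

# Both inputs are now 1D, so polyfit accepts them.
slope, intercept = np.polyfit(data, data1, 1)
print(slope, intercept)
```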
