Linear regression using Python (Pandas and Numpy)
Original question: http://stackoverflow.com/questions/36358688/
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Asked by Pragyaditya Das
I am trying to implement linear regression using Python.
I took the following steps:
import pandas as p
import numpy as n
data = p.read_csv("...path\Housing.csv", usecols=[1]) # I want the first col
data1 = p.read_csv("...path\Housing.csv", usecols=[3]) # I want the 3rd col
x = data
y = data1
Then I try to obtain the coefficients using the following:
regression_coeff = n.polyfit(x,y,1)
And then I get the following error:
raise TypeError("expected 1D vector for x")
TypeError: expected 1D vector for x
I am unable to get my head around this, because when I print x and y, I can very clearly see that they are both 1D vectors.
Can someone please help?
The dataset can be found here: DataSets
The original code is:
import pandas as pd
import numpy as np
data = pd.read_csv('...\housing.csv', usecols = [1])
data1 = pd.read_csv('...\housing.csv', usecols = [3])
x = data
y = data1
regression = np.polyfit(x, y, 1)  # raises TypeError: expected 1D vector for x
Answered by Mike Müller
This should work:
np.polyfit(data.values.flatten(), data1.values.flatten(), 1)
data is a DataFrame and its values are 2D:
>>> data.values.shape
(546, 1)
flatten() turns it into a 1D array:
>>> data.values.flatten().shape
(546,)
which is what polyfit() needs.
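To check the shape difference without the original file, here is a minimal sketch on a small made-up DataFrame. The column names and values are assumptions for illustration, not the Housing.csv data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the single-column frames read with usecols=[...]
data = pd.DataFrame({"price": [100.0, 150.0, 200.0, 250.0]})
data1 = pd.DataFrame({"bedrooms": [1.0, 2.0, 3.0, 4.0]})

print(data.values.shape)            # 2D: (4, 1)
print(data.values.flatten().shape)  # 1D: (4,)

# Passing the 2D frames directly reproduces the error from the question
error_message = None
try:
    np.polyfit(data, data1, 1)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Flattening both inputs gives polyfit the 1D arrays it expects
slope, intercept = np.polyfit(data.values.flatten(), data1.values.flatten(), 1)
print(slope, intercept)
```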
Simpler alternative:
df = pd.read_csv("Housing.csv")
np.polyfit(df['price'], df['bedrooms'], 1)
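This works because indexing a DataFrame with a single column name returns a 1D Series. A small sketch of that, using an inline CSV with invented values in place of Housing.csv (only the column names price and bedrooms come from the answer; lotsize and all numbers are assumptions):

```python
import io

import numpy as np
import pandas as pd

# Invented sample rows; not the real Housing.csv contents
csv_text = """price,lotsize,bedrooms
100,5000,1
150,6000,2
200,7000,3
250,8000,4
"""
df = pd.read_csv(io.StringIO(csv_text))

print(type(df["price"]).__name__, df["price"].ndim)      # Series 1
print(type(df[["price"]]).__name__, df[["price"]].ndim)  # DataFrame 2

# A Series is already 1D, so no flattening is needed
coefficients = np.polyfit(df["price"], df["bedrooms"], 1)
print(coefficients)  # [slope, intercept]
```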
Answered by Alessandro
Python is telling you that the data is not in the right format: x must be a 1D array, but in your case it is a 2D pandas DataFrame. You can convert your data to a NumPy array and squeeze it to fix the problem.
import pandas as pd
import numpy as np
data = pd.read_csv('../Housing.csv', usecols = [1])
data1 = pd.read_csv('../Housing.csv', usecols = [3])
data = np.squeeze(np.array(data))
data1 = np.squeeze(np.array(data1))
x = data
y = data1
regression = np.polyfit(x, y, 1)
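As a quick check of what np.squeeze does in this case, here is a sketch on a made-up single-column frame (the values are illustrative only):

```python
import numpy as np
import pandas as pd

# Made-up single-column DataFrame, standing in for the read_csv result
data = pd.DataFrame({"price": [100.0, 150.0, 200.0, 250.0]})

as_array = np.array(data)        # 2D array with one column
squeezed = np.squeeze(as_array)  # the length-1 axis is removed

print(as_array.shape)  # (4, 1)
print(squeezed.shape)  # (4,)
```

Note that np.squeeze drops every axis of length 1, so a frame with a single row and a single column would come out as a 0-dimensional array rather than a 1D one.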
Answered by Stefan
pandas.read_csv() returns a DataFrame, which has two dimensions, while np.polyfit wants a 1D vector for both x and y for a single fit. You can simply convert the output of read_csv() to a pd.Series to match the np.polyfit() input format using .squeeze():
data = pd.read_csv('../Housing.csv', usecols = [1]).squeeze()
data1 = pd.read_csv("...path\Housing.csv", usecols=[3]).squeeze()
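For a version that runs without the file, here is a sketch that reads an inline CSV with invented rows. The column layout is an assumption, and passing "columns" to squeeze() is my addition so that only the column axis is collapsed.

```python
import io

import numpy as np
import pandas as pd

# Invented sample rows standing in for Housing.csv
csv_text = """id,price,lotsize,bedrooms
1,100,5000,1
2,150,6000,2
3,200,7000,3
4,250,8000,4
"""

# usecols is zero-based, so 1 selects "price" and 3 selects "bedrooms" here
x = pd.read_csv(io.StringIO(csv_text), usecols=[1]).squeeze("columns")
y = pd.read_csv(io.StringIO(csv_text), usecols=[3]).squeeze("columns")

print(type(x).__name__, x.shape)  # Series (4,)

# Both inputs are now 1D Series, which polyfit accepts
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)
```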