Linear regression using Python (Pandas and Numpy)

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/36358688/

Date: 2020-09-14 00:58:40  Source: igfitidea

Tags: python, csv, numpy, pandas, machine-learning

Asked by Pragyaditya Das

I am trying to implement linear regression using Python.

I did the following steps:

import pandas as p
import numpy as n
data = p.read_csv("...path\Housing.csv", usecols=[1]) # I want the first col
data1 = p.read_csv("...path\Housing.csv", usecols=[3]) # I want the 3rd col
x = data
y = data1

Then I try to obtain the coefficients using the following:

regression_coeff = n.polyfit(x,y,1)

And then I get the following error:

raise TypeError("expected 1D vector for x")
TypeError: expected 1D vector for x

I am unable to get my head around this, because when I print x and y, I can clearly see that they are both 1D vectors.

Can someone please help?

The dataset can be found here: DataSets

The original code is:

import pandas as pd
import numpy as np

data = pd.read_csv('...\housing.csv', usecols = [1])
data1 = pd.read_csv('...\housing.csv', usecols = [3])

x = data
y = data1
regression = np.polyfit(x, y, 1)  # raises TypeError: expected 1D vector for x
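Printing a one-column DataFrame shows a single column of numbers, but its shape is (rows, 1), so NumPy treats it as two-dimensional. A minimal reproduction, assuming a tiny made-up DataFrame in place of Housing.csv (the column name price and the values are illustrative only):

```python
import numpy as np
import pandas as pd

# Toy stand-in for pd.read_csv(..., usecols=[1]); the values are made up.
x = pd.DataFrame({"price": [42000.0, 38500.0, 49500.0]})

print(x.shape)            # (3, 1): rows x columns, so two dimensions
print(x.ndim)             # 2
print(x["price"].ndim)    # 1: a column selected by name is a 1D Series

try:
    # x is still the 2D DataFrame, so polyfit rejects it.
    np.polyfit(x, x["price"], 1)
except TypeError as exc:
    message = str(exc)
    print(message)        # expected 1D vector for x
```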

Answer by Mike Müller

This should work:

np.polyfit(data.values.flatten(), data1.values.flatten(), 1)

data is a DataFrame and its values are 2D:

>>> data.values.shape
(546, 1)

flatten() turns it into a 1D array:

>>> data.values.flatten().shape
(546,)

which is needed for polyfit().
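For comparison, here are a few other ways to get a 1D array out of a one-column DataFrame. This is a sketch on made-up values; the pandas documentation recommends to_numpy() over .values, and ravel() may return a view where flatten() always copies:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"price": [42000.0, 38500.0, 49500.0]})  # made-up values

a = data.values.flatten()       # as in the answer; flatten() always copies
b = data.to_numpy().ravel()     # ravel() returns a view when it can
c = data.iloc[:, 0].to_numpy()  # select the single column by position

for arr in (a, b, c):
    print(arr.shape)            # (3,) each time
```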

Simpler alternative:

import numpy as np
import pandas as pd
df = pd.read_csv("Housing.csv")
np.polyfit(df['price'], df['bedrooms'], 1)
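Selecting a column by name returns a one-dimensional Series, so polyfit accepts it directly. A self-contained sketch, assuming a few made-up rows in a layout like Housing.csv (an unnamed index column followed by price, lotsize and bedrooms; the layout is an assumption) read from an in-memory string. With deg=1, polyfit returns the coefficients highest power first, i.e. slope then intercept:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory CSV standing in for Housing.csv (made-up rows).
csv_text = ",price,lotsize,bedrooms\n1,40000,5000,2\n2,50000,6000,3\n3,60000,7000,4\n"
df = pd.read_csv(io.StringIO(csv_text))

# df["price"] and df["bedrooms"] are 1D Series, so no TypeError here.
# Fits bedrooms = slope * price + intercept.
slope, intercept = np.polyfit(df["price"], df["bedrooms"], 1)
print(slope, intercept)  # approximately 0.0001 and -2.0
```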

Answer by Alessandro

Python is telling you that the data is not in the right format: in particular, x must be a 1D array, but in your case it is a 2D pandas DataFrame. You can convert your data to a NumPy array and squeeze it to fix the problem.

import pandas as pd
import numpy as np

data = pd.read_csv('../Housing.csv', usecols = [1])
data1 = pd.read_csv('../Housing.csv', usecols = [3])
data = np.squeeze(np.array(data))
data1 = np.squeeze(np.array(data1))

x = data
y = data1
regression = np.polyfit(x, y, 1)
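np.squeeze removes every axis of length 1, which is what turns the (546, 1) array into shape (546,). A small sketch on made-up values; it also shows that a one-row, one-column frame would collapse to a 0-d array, which passing axis=1 avoids:

```python
import numpy as np
import pandas as pd

col = np.array(pd.DataFrame({"price": [42000.0, 38500.0, 49500.0]}))  # made-up values
print(col.shape)                 # (3, 1)
print(np.squeeze(col).shape)     # (3,)

# squeeze drops *every* length-1 axis, so a 1x1 frame becomes 0-d ...
one = np.array(pd.DataFrame({"price": [42000.0]}))
print(np.squeeze(one).shape)          # ()
# ... while naming the column axis keeps the row axis.
print(np.squeeze(one, axis=1).shape)  # (1,)
```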

Answer by Stefan

pandas.read_csv() returns a DataFrame, which has two dimensions, while np.polyfit wants a 1D vector for both x and y for a single fit. You can simply convert the output of read_csv() to a pd.Series with .squeeze() to match the np.polyfit() input format:

import pandas as pd
data = pd.read_csv('../Housing.csv', usecols = [1]).squeeze()
data1 = pd.read_csv(r"...path\Housing.csv", usecols=[3]).squeeze()
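DataFrame.squeeze() returns a pandas Series when exactly one column is left, and np.polyfit accepts a Series directly. A self-contained sketch, assuming made-up rows in the same hypothetical Housing.csv layout, read from an in-memory string; axis="columns" is passed so that a file with a single data row would still give a Series rather than a scalar:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for Housing.csv: unnamed index column, then
# price, lotsize, bedrooms (made-up rows).
csv_text = ",price,lotsize,bedrooms\n1,40000,5000,2\n2,50000,6000,3\n3,60000,7000,4\n"

# One column each; squeeze(axis="columns") turns the one-column DataFrame into a Series.
x = pd.read_csv(io.StringIO(csv_text), usecols=[1]).squeeze(axis="columns")
y = pd.read_csv(io.StringIO(csv_text), usecols=[3]).squeeze(axis="columns")

print(type(x).__name__, x.ndim)  # Series 1
coeffs = np.polyfit(x, y, 1)     # no TypeError: x and y are both 1D
print(coeffs)                    # [slope, intercept]
```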