Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license: credit the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/27217694/

Date: 2020-09-13 22:43:40  Source: igfitidea

Python Pandas Linear Interpolate Y over X

Tags: python, pandas, linear-interpolation

Question by The Red Pea

I'm trying to answer this Udacity question: https://www.udacity.com/course/viewer#!/c-st101/l-48696651/e-48532778/m-48635592

我正在尝试回答这个 Udacity 问题:https://www.udacity.com/course/viewer#!/c-st101/l-48696651/e-48532778/m-48635592

I like Python & Pandas so I'm using Pandas (version 0.14)

我喜欢 Python 和 Pandas,所以我使用 Pandas(0.14 版)

I have this DataFrame, df:

我有这个数据框 df=

import pandas as pd

# size in sq ft, cost in dollars
df = pd.DataFrame(dict(size=(1400, 2400, 1800, 1900, 1300, 1100),
                       cost=(112000, 192000, 144000, 152000, 104000, 88000)))

I added a row with a size of 2,100 sq ft to my DataFrame (note that it has no cost; that is the question: what would you expect to pay for a 2,100 sq ft house?)

我将这个 2100 平方英尺的值添加到我的数据框中(注意没有成本;这就是问题;您希望为2,100平方英尺的房子支付多少费用)

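# Keep the result: append/concat return a new DataFrame (DataFrame.append was removed in pandas 2.0)
df = pd.concat([df, pd.DataFrame({'size': [2100]})], ignore_index=True)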

The question asks what cost/price you would expect to pay, using linear interpolation.

该问题希望您使用线性插值来回答您希望支付的成本/价格

Can Pandas interpolate? And how?

Pandas可以插值吗?如何?

I tried this:

我试过这个:

df.interpolate(method='linear')

But it gave me a cost of 88,000: just the last cost value repeated.

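Why 88,000? With method='linear', pandas fills NaNs by row position, and the new 2,100 sq ft row is the last row, so there is no later value to interpolate toward; pandas fills a trailing NaN with the last valid value instead. A minimal sketch reproducing this, assuming a recent pandas (the frame is rebuilt with pd.concat rather than append):

```python
import numpy as np
import pandas as pd

# The question's data: size in sq ft, cost in dollars
df = pd.DataFrame({'size': [1400, 2400, 1800, 1900, 1300, 1100],
                   'cost': [112000, 192000, 144000, 152000, 104000, 88000]})
# Add the 2,100 sq ft house with an unknown (NaN) cost as the last row
df = pd.concat([df, pd.DataFrame({'size': [2100], 'cost': [np.nan]})], ignore_index=True)

# method='linear' only looks at row positions; the NaN sits in the last row,
# so it takes the last valid cost (88000, from the 1,100 sq ft house)
filled = df.interpolate(method='linear')
print(filled.loc[6, 'cost'])  # 88000.0
```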

I tried this:

我试过这个:

df.sort_values('size').interpolate(method='linear')  # df.sort('size') in pandas 0.14; sort_values() from 0.17 on

But it gave me a cost of 172,000: just halfway between the costs of 152,000 and 192,000. Closer, but not what I want. The correct answer is 168,000 (because there is a "slope" of $80/sq ft).

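The expected answer can be checked by hand: 2,100 sq ft lies between the 1,900 sq ft house ($152,000) and the 2,400 sq ft house ($192,000), so linear interpolation moves along the straight line through those two points. A minimal sketch of that arithmetic; `lerp` is a hypothetical helper name, and NumPy's `np.interp` is used only as a cross-check:

```python
import numpy as np

def lerp(x, x0, y0, x1, y1):
    """Return the y on the straight line through (x0, y0) and (x1, y1) at position x."""
    slope = (y1 - y0) / (x1 - x0)  # (192000 - 152000) / (2400 - 1900) = 80 $/sq ft
    return y0 + slope * (x - x0)

print(lerp(2100, 1900, 152000, 2400, 192000))  # 168000.0

# Cross-check with NumPy: np.interp expects the x-coordinates (sizes) in increasing order
sizes = np.array([1100, 1300, 1400, 1800, 1900, 2400])
costs = np.array([88000, 104000, 112000, 144000, 152000, 192000])
print(np.interp(2100, sizes, costs))  # 168000.0
```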

EDIT:

编辑:

I checked these SO questions

我检查了这些问题

Answer by The Red Pea

Pandas' method='linear' interpolation does what I call "1D" interpolation: it ignores the index and treats the values as equally spaced.

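To see this "1D" behaviour, here is a minimal sketch using the question's costs sorted by size, with the sizes as the index. Even though the index holds the real sizes, method='linear' only counts positions, so the gap is filled halfway between its neighbours; relabelling the index does not change the result:

```python
import numpy as np
import pandas as pd

# Costs sorted by size, with the 2,100 sq ft house (unknown cost) in its sorted place
s = pd.Series([88000, 104000, 112000, 144000, 152000, np.nan, 192000],
              index=[1100, 1300, 1400, 1800, 1900, 2100, 2400])

# method='linear' ignores the index: rows are treated as equally spaced,
# so the NaN lands halfway between its neighbours, 152000 and 192000
print(s.interpolate(method='linear')[2100])  # 172000.0

# Replacing the sizes with 0..6 gives the same 'linear' result
print(s.reset_index(drop=True).interpolate(method='linear')[5])  # 172000.0
```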

If you want to interpolate a "dependent" variable over an "independent" variable, make the independent variable the index of a Series and use method='index' (or method='values'; they're the same).

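A minimal sketch of that advice on a recent pandas, assuming the question's size and cost columns. Here set_index('size') is used to make size the index (an alternative to building the Series by hand), and method='index' and method='values' are compared on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'size': [1400, 2400, 1800, 1900, 1300, 1100, 2100],
                   'cost': [112000, 192000, 144000, 152000, 104000, 88000, np.nan]})

# Make size the index (the independent variable) and sort by it
cost_by_size = df.set_index('size')['cost'].sort_index()

# Both methods use the numeric index values as the x-coordinates
by_index = cost_by_size.interpolate(method='index')
by_values = cost_by_size.interpolate(method='values')

print(by_index[2100], by_values[2100])  # 168000.0 168000.0
```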

In other words:

换句话说:

s = (pd.Series(index=df['size'].values, data=df['cost'].values)  # make size the independent variable
     .sort_index()                   # sort by size; interpolation depends on order (see the question)
     .interpolate(method='index'))   # interpolate using the index values (the sizes)
s[2100]  # 168000.0

This returns the correct answer, 168,000.

这将返回正确答案168,000

This wasn't clear to me from the example in the Pandas documentation, where the Series' data and index are the same list of values.

Pandas 文档中的示例中我不清楚这一点,其中 Series'dataindex是相同的值列表。

Answer by Luca Rigazio

With my version of Pandas (0.19.2), index=df.size breaks. It's an unlucky choice of column name: df.size is the size of the table (its number of elements), not the size column. So this works instead:

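The naming clash is easy to confirm: attribute access df.size resolves to the built-in DataFrame.size property (the number of cells, rows times columns), so the column has to be selected with df['size']. A minimal sketch using the question's seven rows (six houses plus the 2,100 sq ft one):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'size': [1400, 2400, 1800, 1900, 1300, 1100, 2100],
                   'cost': [112000, 192000, 144000, 152000, 104000, 88000, np.nan]})

# Attribute access hits the DataFrame.size property: 7 rows x 2 columns = 14 cells
print(df.size)  # 14

# Bracket access selects the column named 'size'
print(df['size'].tolist())  # [1400, 2400, 1800, 1900, 1300, 1100, 2100]
```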

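df = pd.concat([df, pd.DataFrame({'size': [2100]})], ignore_index=True)  # was df.append(..., True)
# sort_index() replaces .order() (removed in later pandas), which sorted by value rather than by size
pd.Series(index=df['size'].values,
          data=df['cost'].values).sort_index().interpolate(method='index')[2100]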

Result: 168000.0