Pandas - lambda function with conditional based on row index

Source: StackOverflow, http://stackoverflow.com/questions/47878410/ (licensed under CC BY-SA 4.0; reuse requires the same license and attribution to the original authors)

Tags: python, pandas, if-statement, lambda

Asked by user1185790

I am trying to apply a lambda function to a dataframe by referencing three columns. I want to update one of the columns, Cumulative Total, based on the following logic:

If it's on the first row, then Cumulative Total should equal the value in Total. If it's not the first row, then apply the following formula that references the prior row:

x.shift()['Cumulative Total'] - (x.shift()['Total'] * (x.shift()['Annualized Rate'] / 1200))

I want the Cumulative Total column to look like so:

Total   Annualized Rate Cumulative Total
869     11.04718067     869
868     5.529953917     861
871     8.266360505     857
873     6.872852234     851
873     8.24742268      846
874     9.610983982     840
870     5.517241379     833
871     8.266360505     829
868     2.764976959     823
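As a sanity check on the logic, here is the arithmetic for the second row of the table, worked by hand in plain Python. The variable names are mine, and reading the division by 1200 as "annual percentage to monthly fraction" is an assumption, not something stated in the question.

```python
# Values taken from the first row of the table above.
total_0 = 869
rate_0 = 11.04718067  # Annualized Rate of the first row

# First row: Cumulative Total equals Total.
cum_0 = total_0

# Second row: previous Cumulative Total minus previous Total * (previous rate / 1200).
cum_1 = cum_0 - total_0 * (rate_0 / 1200)

print(round(cum_1))  # 861
```

The result matches the 861 shown in the second row of the expected output.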

What is throwing me off is how I can determine whether or not I'm on the first row. This sounds rather trivial, but I'm very new to Pandas and am totally stumped. iloc doesn't seem to work, as it seems to only be used for grabbing a row at a given index.

The code is currently as follows:

df['Cumulative Total'] = df.apply(lambda x: x['Total'] if x.iloc[0] else x.shift()['Cumulative Total']-(x.shift()['Total']*(x.shift()['Annualized Rate']/1200)),axis=1)

The condition if x.iloc[0] is wrong. Any idea how I can determine whether it's the first row?

Edit: thank you all for your answers. Alexander's answer is on the right track, but I've noticed that the results strayed somewhat from what was to be expected. These differences became more pronounced the larger the dataframe used.

Alexander - can you address this issue with an edit to your answer? Using vanilla Python, I've arrived at the results below. The differences are largely trivial, but as stated, can get more pronounced with larger datasets.

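total = (869, 868, 871, 873, 873, 874, 870, 871, 868)
rate = (11.047181, 5.529954, 8.266361, 6.872852, 8.247423, 9.610984, 5.517241, 8.266361, 2.764977)

def f(total, rate):
    # Build the cumulative total row by row.
    cum = []
    for i in range(len(total)):
        if i == 0:
            # First row: start from Total.
            cum.append(total[i])
        else:
            # Later rows: previous cumulative total minus previous Total * (previous rate / 1200).
            cum.append(float(cum[i-1]) - (float(total[i-1]) * (rate[i-1] / 1200.0)))
    return cum

f(total, rate)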

Returns:

869
860.9999997591667
856.9999996991667
850.99999934
845.9999995100001
839.9999992775
832.9999992641667
828.9999995391668
822.9999991800001

Accepted answer by Alexander

Perhaps this?

df = df.assign(
    Cumulative_Total=df['Total'].iat[0] 
                     - ((df['Total'] * df['Annualized Rate'].div(1200))
                        .shift()
                        .fillna(0)
                        .cumsum())
)

>>> df
   Total  Annualized Rate  Cumulative_Total
0    869        11.047181               869
1    868         5.529954               861
2    871         8.266361               857
3    873         6.872852               851
4    873         8.247423               846
5    874         9.610984               840
6    870         5.517241               833
7    871         8.266361               829
8    868         2.764977               823
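For anyone who wants to check this against the row-by-row logic from the question, here is a self-contained sketch. It rebuilds the sample data, applies the same shift/cumsum idea, and compares it with an explicit loop. The names step, vectorized and looped are mine. Algebraically the two forms are the same (unrolling the recurrence gives the first Total minus a running sum), so, assuming the recurrence in the question is the intended logic, any gap between them should be floating-point rounding only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Total": [869, 868, 871, 873, 873, 874, 870, 871, 868],
    "Annualized Rate": [11.047181, 5.529954, 8.266361, 6.872852, 8.247423,
                        9.610984, 5.517241, 8.266361, 2.764977],
})

# Amount subtracted at each step, moved down one row so row i sees row i-1.
step = (df["Total"] * df["Annualized Rate"] / 1200).shift().fillna(0)

# Vectorized form: first Total minus the running sum of the steps.
vectorized = df["Total"].iat[0] - step.cumsum()

# Reference: the row-by-row recurrence from the question.
looped = [float(df["Total"].iat[0])]
for i in range(1, len(df)):
    prev_total = df["Total"].iat[i - 1]
    prev_rate = df["Annualized Rate"].iat[i - 1]
    looped.append(looped[-1] - prev_total * prev_rate / 1200)

print(np.allclose(vectorized, looped))  # True
print(vectorized.round().astype(int).tolist())
# [869, 861, 857, 851, 846, 840, 833, 829, 823]
```

The unrounded values are floats such as 860.9999997..., so whether they display as 861 depends on rounding and print settings.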

Answer by Julien Spronck

Would this work? In this solution, I used x.name to get the row index.

df['Cumulative Total'] = df.apply(lambda x: x['Total'] if x.name == 0 else x.shift()['Cumulative Total']-(x.shift()['Total']*(x.shift()['Annualized Rate']/1200)),axis=1)
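A small sketch of what x.name gives you inside apply with axis=1, using a three-row subset of the sample data (the variable names are mine). It also shows one thing to watch for: inside the lambda, x is a single row, so x.shift() moves values across the columns of that row rather than reaching back to the previous row.

```python
import pandas as pd

df = pd.DataFrame({
    "Total": [869, 868, 871],
    "Annualized Rate": [11.047181, 5.529954, 8.266361],
})

# With axis=1 each call receives one row as a Series; .name is that row's index label.
labels = df.apply(lambda x: x.name, axis=1).tolist()
print(labels)  # [0, 1, 2]

# Flag the first row by comparing the label with the first index label.
is_first = df.apply(lambda x: x.name == df.index[0], axis=1).tolist()
print(is_first)  # [True, False, False]

# shift() on a row Series moves values across columns within that row.
shifted = df.iloc[0].shift().tolist()
print(shifted)  # [nan, 869.0]
```

Comparing against df.index[0] rather than a literal 0 keeps the first-row test working when the index does not start at 0. Because of the shift() behaviour, the else branch would likely need a different way to look up the prior row, for example a column-wise shift on the whole dataframe.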