Pandas - lambda function with conditional based on row index
Original question: http://stackoverflow.com/questions/47878410/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverFlow.
Asked by user1185790
I am trying to apply a lambda function to a dataframe by referencing three columns. I want to update one of the columns, Cumulative Total, based on the following logic:
If it's on the first row, then Cumulative Total should equal the value in Total.
If it's not the first row, then apply the following formula that references the prior row:
x.shift()['Cumulative Total']
- (x.shift()['Total'] * (x.shift()['Annualized Rate'] / 1200))
I want the Cumulative Total column to look like so:
Total  Annualized Rate  Cumulative Total
869    11.04718067      869
868    5.529953917      861
871    8.266360505      857
873    6.872852234      851
873    8.24742268       846
874    9.610983982      840
870    5.517241379      833
871    8.266360505      829
868    2.764976959      823
What is throwing me off is how I can determine whether or not I'm on the first row. This sounds rather trivial, but I'm very new to Pandas and am totally stumped. iloc doesn't seem to work, as it seems to only be used for grabbing a row of a given index.
The code is currently as follows:
df['Cumulative Total'] = df.apply(lambda x: x['Total'] if x.iloc[0] else x.shift()['Cumulative Total']-(x.shift()['Total']*(x.shift()['Annualized Rate']/1200)),axis=1)
The statement if x.iloc[0] is wrong. Any idea on how I can determine if it's the first row?
Edit: thank you all for your answers. Alexander's answer is on the right track, but I've noticed that the results strayed somewhat from what was to be expected. These differences became more pronounced the larger the dataframe used.
Alexander - can you address this issue with an edit to your answer? Using vanilla Python, I've arrived at the results below. The differences are largely trivial, but as stated, can get more pronounced with larger datasets.
total = (869, 868, 871, 873, 873, 874, 870, 871, 868)
rate = (11.047181, 5.529954, 8.266361, 6.872852, 8.247423, 9.610984, 5.517241, 8.266361, 2.764977)

def f(total, rate):
    cum = []
    for i in range(len(total)):
        if i == 0:
            # First row: start from Total.
            cum.append(total[i])
        else:
            # Later rows: previous cumulative value minus the previous row's deduction.
            cum.append(float(cum[i-1]) - (float(total[i-1]) * (rate[i-1] / 1200.0)))
    return cum

f(total, rate)
Returns:
869
860.9999997591667
856.9999996991667
850.99999934
845.9999995100001
839.9999992775
832.9999992641667
828.9999995391668
822.9999991800001
Accepted answer by Alexander
Perhaps this?
df = df.assign(
    Cumulative_Total=df['Total'].iat[0]
    - ((df['Total'] * df['Annualized Rate'].div(1200))
       .shift()
       .fillna(0)
       .cumsum())
)
>>> df
   Total  Annualized Rate  Cumulative_Total
0    869        11.047181               869
1    868         5.529954               861
2    871         8.266361               857
3    873         6.872852               851
4    873         8.247423               846
5    874         9.610984               840
6    870         5.517241               833
7    871         8.266361               829
8    868         2.764977               823
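Unrolling the question's rule shows why a cumulative sum fits: each row's value is the first Total minus the sum of every earlier row's Total * Annualized Rate / 1200. The sketch below is a small check of that reading on the nine sample rows, comparing the row-by-row loop against the vectorized expression. The helper name cumulative_loop and the variables deduction, vectorized, looped and matches are made up for illustration, and this only tests the sample data, so it says nothing about the larger dataframes mentioned in the question's edit.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Total": [869, 868, 871, 873, 873, 874, 870, 871, 868],
    "Annualized Rate": [11.047181, 5.529954, 8.266361, 6.872852, 8.247423,
                        9.610984, 5.517241, 8.266361, 2.764977],
})


def cumulative_loop(total, rate):
    """Row-by-row recurrence from the question (hypothetical helper name)."""
    cum = [float(total[0])]
    for i in range(1, len(total)):
        # Previous cumulative value minus the previous row's deduction.
        cum.append(cum[i - 1] - total[i - 1] * rate[i - 1] / 1200.0)
    return cum


# Vectorized form: first Total minus the running sum of the previous rows'
# deductions. shift() moves each deduction down one row, and fillna(0)
# leaves the first row with nothing subtracted.
deduction = df["Total"] * df["Annualized Rate"] / 1200
vectorized = df["Total"].iat[0] - deduction.shift().fillna(0).cumsum()

looped = cumulative_loop(df["Total"].tolist(), df["Annualized Rate"].tolist())

# True when the two approaches agree to floating-point tolerance.
matches = np.allclose(vectorized.to_numpy(), looped)
print(matches)
print(vectorized.round(6).tolist())
```

The computed values are floats that sit just below the whole numbers shown in the tables, so any remaining gap between the two approaches on this sample is rounding at display time rather than a different formula.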
Answer by Julien Spronck
Would this work? In this solution, I used x.name to get the row index.
df['Cumulative Total'] = df.apply(lambda x: x['Total'] if x.name == 0 else x.shift()['Cumulative Total']-(x.shift()['Total']*(x.shift()['Annualized Rate']/1200)),axis=1)
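A minimal sketch of what x.name gives you inside a row-wise apply, using three rows of the question's data. The variable names labels, is_first and shifted_row are illustrative only. It also shows one thing worth checking before relying on shift() inside the lambda: with axis=1 the lambda receives a single row as a Series, so shift() moves values between that row's columns and does not look at the previous row.

```python
import pandas as pd

df = pd.DataFrame({
    "Total": [869, 868, 871],
    "Annualized Rate": [11.047181, 5.529954, 8.266361],
})

# With axis=1, each call receives one row as a Series whose .name is that
# row's index label (0, 1, 2 for the default RangeIndex).
labels = df.apply(lambda row: row.name, axis=1).tolist()

# row.name is a label, not a position, so comparing it with df.index[0]
# also works when the index does not start at 0.
is_first = df.apply(lambda row: row.name == df.index[0], axis=1).tolist()

# shift() on a single row moves values one slot along the row's own
# entries (the columns): "Total" becomes NaN and "Annualized Rate" picks
# up the value that was in "Total".
shifted_row = df.iloc[1].shift()

print(labels)
print(is_first)
print(shifted_row)
```

Because of that last point, a formula that needs the previous row is usually easier to express with column-level operations such as df['Total'].shift() than from inside a row-wise lambda.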