Python pandas dataframe add previous row values
Original question: http://stackoverflow.com/questions/19076539/
Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow.
Asked by Georges Cunty
I have a pandas dataframe that looks like this:
                     AAPL   IBM  GOOG  XOM
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00     0     0     0    0
2011-01-12 16:00:00     0     0     0    0
2011-01-13 16:00:00 -1500  4000     0    0
2011-01-14 16:00:00     0     0     0    0
2011-01-18 16:00:00     0     0     0    0
My goal is to fill the rows by adding the previous row values. The result would look like this:
                     AAPL   IBM  GOOG  XOM
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00  1500     0     0    0
2011-01-12 16:00:00  1500     0     0    0
2011-01-13 16:00:00     0  4000     0    0
2011-01-14 16:00:00     0  4000     0    0
2011-01-18 16:00:00     0  4000     0    0
I tried to iterate through the dataframe index with
    for date in df.index:
and to increment dates with
    dt_nextDate = date + dt.timedelta(days=1)
but there are gaps in the dataframe index where the weekends fall.
Can I iterate through the index from the second row to the end, refer back to the previous row and add the values?
Answered by Viktor Kerkez
Your example result is not the output of your example algorithm, so I'm not sure exactly what you are asking for.
The desired result you showed is a cumulative sum, which you can get using:
>>> df.cumsum()
                     AAPL   IBM  GOOG  XOM
index
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00  1500     0     0    0
2011-01-12 16:00:00  1500     0     0    0
2011-01-13 16:00:00     0  4000     0    0
2011-01-14 16:00:00     0  4000     0    0
2011-01-18 16:00:00     0  4000     0    0
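For a self-contained check, here is a minimal sketch that rebuilds the example frame and confirms that a cumulative sum reproduces the desired table. The way `df` is constructed, and the names `index`, `cumulative` and `looped`, are assumptions for illustration and not part of the original question. The sketch also shows that walking the rows by position from the second row, and adding the already-updated previous row, gives the same result without any date arithmetic, so the weekend gap does not matter.

```python
import pandas as pd

# Hypothetical reconstruction of the frame shown in the question.
index = pd.to_datetime([
    "2011-01-10 16:00:00", "2011-01-11 16:00:00", "2011-01-12 16:00:00",
    "2011-01-13 16:00:00", "2011-01-14 16:00:00", "2011-01-18 16:00:00",
])
df = pd.DataFrame(
    {
        "AAPL": [1500, 0, 0, -1500, 0, 0],
        "IBM": [0, 0, 0, 4000, 0, 0],
        "GOOG": [0, 0, 0, 0, 0, 0],
        "XOM": [0, 0, 0, 0, 0, 0],
    },
    index=index,
)

# Vectorized running total down each column.
cumulative = df.cumsum()

# Equivalent explicit loop: iterate by position, not by date,
# and add the previous (already accumulated) row to the current one.
looped = df.copy()
for i in range(1, len(looped)):
    looped.iloc[i] = looped.iloc[i] + looped.iloc[i - 1]

print(cumulative)
print((looped == cumulative).all().all())
```

The loop is only there to make the row-by-row idea explicit; `cumsum` is the shorter way to write it.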
But what you described and the algorithm you showed look more like a rolling sum with a window size of 2:
>>> result = df.rolling(2).sum()  # originally pd.rolling_sum(df, 2), which newer pandas no longer provides
>>> result
                     AAPL   IBM  GOOG  XOM
index
2011-01-10 16:00:00   NaN   NaN   NaN  NaN
2011-01-11 16:00:00  1500     0     0    0
2011-01-12 16:00:00     0     0     0    0
2011-01-13 16:00:00 -1500  4000     0    0
2011-01-14 16:00:00 -1500  4000     0    0
2011-01-18 16:00:00     0     0     0    0
To fix the NaNs, just copy the first row over from the original frame:
>>> result.iloc[0,:] = df.iloc[0,:]
>>> result
                     AAPL   IBM  GOOG  XOM
index
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00  1500     0     0    0
2011-01-12 16:00:00     0     0     0    0
2011-01-13 16:00:00 -1500  4000     0    0
2011-01-14 16:00:00 -1500  4000     0    0
2011-01-18 16:00:00     0     0     0    0
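As a self-contained sketch of the two-row sum on a recent pandas version: passing `min_periods=1` to `DataFrame.rolling` lets the first row be computed from the single value available, so the separate NaN fix above should not be needed. The frame construction and the names `result` and `shifted` are assumptions for illustration. Note that the rolling result comes back as floating point.

```python
import pandas as pd

# Hypothetical reconstruction of the frame shown in the question.
index = pd.to_datetime([
    "2011-01-10 16:00:00", "2011-01-11 16:00:00", "2011-01-12 16:00:00",
    "2011-01-13 16:00:00", "2011-01-14 16:00:00", "2011-01-18 16:00:00",
])
df = pd.DataFrame(
    {
        "AAPL": [1500, 0, 0, -1500, 0, 0],
        "IBM": [0, 0, 0, 4000, 0, 0],
        "GOOG": [0, 0, 0, 0, 0, 0],
        "XOM": [0, 0, 0, 0, 0, 0],
    },
    index=index,
)

# Sum each row with the row before it. min_periods=1 means the first row,
# which has no predecessor, is summed on its own instead of becoming NaN.
result = df.rolling(window=2, min_periods=1).sum()

# The same idea spelled out as "current row plus previous row":
# shift(1) moves every row down by one position, and fill_value=0
# supplies zeros for the first row.
shifted = df + df.shift(1, fill_value=0)

print(result)
print((result == shifted).all().all())
```

Both forms work by row position, so the missing weekend dates in the index have no effect on the result.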

