Original question: http://stackoverflow.com/questions/37872565/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Reversed cumulative sum of a column in pandas.DataFrame
Asked by wl2776
I have a pandas DataFrame with a boolean column, sorted by another column, and I need to calculate the reverse cumulative sum of the boolean column, that is, the number of True values from the current row to the bottom.
Example
In [13]: df = pd.DataFrame({'A': [True] * 3 + [False] * 5, 'B': np.random.rand(8) })
In [15]: df = df.sort_values('B')
In [16]: df
Out[16]:
A B
6 False 0.037710
2 True 0.315414
4 False 0.332480
7 False 0.445505
3 False 0.580156
1 True 0.741551
5 False 0.796944
0 True 0.817563
I need something that will give me a new column with these values:
3
3
2
2
2
2
1
1
That is, for each row it should contain the number of True values in this row and the rows below it.
I've tried various methods using .iloc[::-1], but the result is not what I want.
I think I'm missing something obvious. I only started using Pandas yesterday.
Answered by unutbu
Reverse column A, take the cumsum, then reverse again:
df['C'] = df.loc[::-1, 'A'].cumsum()[::-1]
import pandas as pd
df = pd.DataFrame(
{'A': [False, True, False, False, False, True, False, True],
'B': [0.03771, 0.315414, 0.33248, 0.445505, 0.580156, 0.741551, 0.796944, 0.817563],},
index=[6, 2, 4, 7, 3, 1, 5, 0])
df['C'] = df.loc[::-1, 'A'].cumsum()[::-1]
print(df)
yields
A B C
6 False 0.037710 3
2 True 0.315414 3
4 False 0.332480 2
7 False 0.445505 2
3 False 0.580156 2
1 True 0.741551 2
5 False 0.796944 1
0 True 0.817563 1
Alternatively, you could count the number of Trues in column A and subtract the (shifted) cumsum:
In [113]: df['A'].sum()-df['A'].shift(1).fillna(0).cumsum()
Out[113]:
6 3
2 3
4 2
7 2
3 2
1 2
5 1
0 1
Name: A, dtype: object
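The shifted version relies on a simple identity: the number of True values from the current row down equals the column total minus the number of True values strictly above the current row. The following is a minimal sketch that checks the two formulations agree on the example data. It casts the column to int first and uses shift's fill_value argument, which is a variation on the expression above rather than the answer's exact code.

```python
import pandas as pd

# Same data as the example above, with the shuffled index kept.
df = pd.DataFrame(
    {'A': [False, True, False, False, False, True, False, True]},
    index=[6, 2, 4, 7, 3, 1, 5, 0])

# Work on integers so both results share the same dtype (an assumption
# made for this sketch; the original expression yields object dtype).
a = df['A'].astype(int)

# Approach 1: reverse, accumulate, reverse back.
reversed_cumsum = a[::-1].cumsum()[::-1]

# Approach 2: total minus the count of True values strictly above each row.
total_minus_prefix = a.sum() - a.shift(1, fill_value=0).cumsum()

assert reversed_cumsum.equals(total_minus_prefix)
print(reversed_cumsum.tolist())
```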
But this is significantly slower. Using IPython to perform the benchmark:
In [116]: df = pd.DataFrame({'A':np.random.randint(2, size=10**5).astype(bool)})
In [117]: %timeit df['A'].sum()-df['A'].shift(1).fillna(0).cumsum()
10 loops, best of 3: 19.8 ms per loop
In [118]: %timeit df.loc[::-1, 'A'].cumsum()[::-1]
1000 loops, best of 3: 701 μs per loop
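Outside IPython, a similar comparison can be run with the standard-library timeit module. This is only a sketch: the function names are made up for illustration, the column is cast to int so that both results can be compared for equality, and the numbers will therefore not match the timings quoted above. Results depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Integer 0/1 column standing in for the boolean column (assumption for this sketch).
a = pd.Series(rng.integers(0, 2, size=10**5), name='A')


def reversed_cumsum():
    # Reverse, accumulate, reverse back.
    return a[::-1].cumsum()[::-1]


def total_minus_shifted():
    # Column total minus the running sum of the rows strictly above.
    return a.sum() - a.shift(1, fill_value=0).cumsum()


# Both formulations must produce the same values before timing them.
assert reversed_cumsum().equals(total_minus_shifted())

runs = 10
t_rev = min(timeit.repeat(reversed_cumsum, number=runs, repeat=3)) / runs
t_shift = min(timeit.repeat(total_minus_shifted, number=runs, repeat=3)) / runs
print(f"reversed cumsum:     {t_rev * 1e3:.3f} ms per call")
print(f"total minus shifted: {t_shift * 1e3:.3f} ms per call")
```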
Answered by Ichta
Similar to unutbu's first suggestion, but without the deprecated ix:
df['C']=df.A[::-1].cumsum()
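The one-liner above omits the second reversal, which works because assigning a Series to a DataFrame column aligns on index labels rather than on position. A minimal sketch of that behaviour follows; the column name D is only there for contrast and is not part of the answer.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [False, True, False, False, False, True, False, True]},
    index=[6, 2, 4, 7, 3, 1, 5, 0])

# Reversed cumulative sum; the result still carries the reversed index.
rev = df['A'].astype(int)[::-1].cumsum()
print(rev.index.tolist())

# Assigning a Series aligns on index labels, so the order is restored.
df['C'] = rev

# Assigning a bare array is positional, so the values stay reversed.
df['D'] = rev.to_numpy()

print(df)
```

Label alignment like this needs a unique index. With duplicate labels, reversing again and assigning the underlying values is the safer route.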
Answered by Merlin
This works but is slow, like @unutbu's answer. True resolves to 1. It fails on False or any other value, though.
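# Count down within each group of A, so the True rows get 3, 2, 1 from top to bottom
df[2] = df.groupby('A').cumcount(ascending=False) + 1
# Keep the countdown on True rows only; NaN elsewhere
df[1] = np.where(df['A'], df[2], np.nan)
# Back-fill each False row from the next True row below; rows after the last True get 0
df[1] = df[1].bfill().fillna(0)
del df[2]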
A B 1
# 3 False 0.277557 3.0
# 7 False 0.400751 3.0
# 6 False 0.431587 3.0
# 5 False 0.481006 3.0
# 1 True 0.534364 3.0
# 2 True 0.556378 2.0
# 0 True 0.863192 1.0
# 4 False 0.916247 0.0