Original question: http://stackoverflow.com/questions/45244018/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
How to decile a Python pandas DataFrame by column value, and then sum each decile?
Asked by Windtalker
Say a DataFrame has only one numeric column, sorted in descending order.
What I want is a new DataFrame with 10 rows, where row 1 is the sum of the smallest 10% of values and row 10 is the sum of the largest 10% of values.
I can calculate this in a non-Pythonic way, but I guess there must be an elegant, Pythonic way to achieve it.
Any help?
Thanks!
Answered by cmaher
You can do this with pd.qcut:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': np.random.randn(100)})
# pd.qcut(df.A, 10) will bin into deciles
# you can group by these deciles and take the sums in one step like so:
df.groupby(pd.qcut(df.A, 10))['A'].sum()
# A
# (-2.662, -1.209] -16.436286
# (-1.209, -0.866] -10.348697
# (-0.866, -0.612] -7.133950
# (-0.612, -0.323] -4.847695
# (-0.323, -0.129] -2.187459
# (-0.129, 0.0699] -0.678615
# (0.0699, 0.368] 2.007176
# (0.368, 0.795] 5.457153
# (0.795, 1.386] 11.551413
# (1.386, 3.664] 20.575449
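
The question asks for a 10-row DataFrame where row 1 holds the smallest values. The snippet below is a minimal sketch of one way to get that shape from the same pd.qcut idea, not part of the original answer. The helper name `decile_sums` and the output column names `decile` and `sum` are hypothetical, chosen only for illustration. It bins on ranks rather than raw values, on the assumption that tied values should be split across deciles instead of raising a duplicate-bin-edge error.

```python
import numpy as np
import pandas as pd


def decile_sums(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Sum `col` within each decile; decile 1 holds the smallest values.

    Hypothetical helper for illustration only.
    """
    # Ranking with method="first" breaks ties by order of appearance,
    # so qcut never sees duplicate bin edges.
    ranks = df[col].rank(method="first")
    # Label the ten bins 1..10 instead of showing interval boundaries.
    deciles = pd.qcut(ranks, 10, labels=list(range(1, 11))).rename("decile")
    # observed=False keeps all ten categories in the result.
    sums = df.groupby(deciles, observed=False)[col].sum()
    return sums.reset_index(name="sum")


rng = np.random.default_rng(0)  # fixed seed only so the sketch is reproducible
df = pd.DataFrame({"A": rng.standard_normal(100)})
result = decile_sums(df, "A")
print(result)
```

Because the bins are built from ranks, each decile holds the same number of rows whenever the row count is a multiple of 10. If interval boundaries on the raw values matter more than equal-sized groups, binning on the column directly, as in the answer above, is the simpler choice.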