Original question: http://stackoverflow.com/questions/41573232/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow, not to this page.
Pandas equivalent to SQL window functions
Asked by 2daaa
Is there an idiomatic equivalent to SQL's window functions in Pandas? For example, what's the most compact way to write the equivalent of this query in Pandas?
SELECT state_name,
       state_population,
       SUM(state_population)
        OVER() AS national_population
FROM population
ORDER BY state_name
Or of this one?
SELECT state_name,
       state_population,
       region,
       SUM(state_population)
        OVER(PARTITION BY region) AS regional_population
FROM population
ORDER BY state_name
Answered by MaxU
For the first SQL query:
SELECT state_name,
       state_population,
       SUM(state_population)
        OVER() AS national_population
FROM population
ORDER BY state_name
Pandas:
df.assign(national_population=df.state_population.sum()).sort_values('state_name')
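To make the one-liner above easy to try, here is a minimal, self-contained sketch. It assumes the same toy data as the demo in this answer; the names `national_total` and `result` are only illustrative and not part of the original answer. The idea is that `SUM(...) OVER()` yields a single total repeated on every row, and `assign` broadcasts a scalar in the same way.

```python
import pandas as pd

# Toy data matching the demo in this answer (assumed here for illustration).
df = pd.DataFrame({
    "region": [1, 1, 2, 2, 2, 3],
    "state_name": ["aaa", "bbb", "ccc", "ddd", "eee", "xxx"],
    "state_population": [100, 110, 200, 100, 100, 55],
})

# SUM(state_population) OVER (): one scalar total for the whole table.
national_total = df["state_population"].sum()

# assign() broadcasts the scalar to every row and returns a new DataFrame;
# sort_values() plays the role of ORDER BY state_name.
result = (
    df.assign(national_population=national_total)
      .sort_values("state_name")
)
print(result)
```

Note that `assign` does not modify `df` in place, so the original frame keeps its three columns.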
For the second SQL query:
SELECT state_name,
       state_population,
       region,
       SUM(state_population)
        OVER(PARTITION BY region) AS regional_population
FROM population
ORDER BY state_name
Pandas:
df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
.sort_values('state_name')
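The following sketch illustrates why `transform` is the piece that corresponds to `OVER(PARTITION BY region)`, whereas a plain group sum behaves like `GROUP BY`. It assumes the same toy data as the demo in this answer; the names `by_region`, `collapsed`, and `windowed` are hypothetical and chosen only for this example.

```python
import pandas as pd

# Toy data matching the demo in this answer (assumed here for illustration).
df = pd.DataFrame({
    "region": [1, 1, 2, 2, 2, 3],
    "state_name": ["aaa", "bbb", "ccc", "ddd", "eee", "xxx"],
    "state_population": [100, 110, 200, 100, 100, 55],
})

by_region = df.groupby("region")["state_population"]

# A plain aggregation collapses each group to one row, like GROUP BY:
# the result has 3 rows and is indexed by region.
collapsed = by_region.sum()

# transform() computes the same per-group sum but returns it aligned to the
# original index, like a window function: the result has 6 rows.
windowed = by_region.transform("sum")

result = df.assign(regional_population=windowed).sort_values("state_name")
print(collapsed)
print(result)
```

Because `windowed` shares the index of `df`, it can be passed straight to `assign` without a merge back onto the original rows.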
DEMO:
In [238]: df
Out[238]:
   region  state_name  state_population
0       1         aaa               100
1       1         bbb               110
2       2         ccc               200
3       2         ddd               100
4       2         eee               100
5       3         xxx                55
national_population:
In [246]: df.assign(national_population=df.state_population.sum()).sort_values('state_name')
Out[246]:
   region  state_name  state_population  national_population
0       1         aaa               100                  665
1       1         bbb               110                  665
2       2         ccc               200                  665
3       2         ddd               100                  665
4       2         eee               100                  665
5       3         xxx                55                  665
regional_population:
In [239]: df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
...: .sort_values('state_name')
Out[239]:
   region  state_name  state_population  regional_population
0       1         aaa               100                  210
1       1         bbb               110                  210
2       2         ccc               200                  400
3       2         ddd               100                  400
4       2         eee               100                  400
5       3         xxx                55                   55