Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/41573232/

Time: 2020-09-14 02:45:33  Source: igfitidea

Pandas equivalent to SQL window functions

python sql pandas window-functions

Asked by 2daaa

Is there an idiomatic equivalent to SQL's window functions in Pandas? For example, what's the most compact way to write the equivalent of this query in Pandas?

SELECT state_name,  
       state_population,
       SUM(state_population)
        OVER() AS national_population
FROM population   
ORDER BY state_name 

Or this?

SELECT state_name,  
       state_population,
       region,
       SUM(state_population)
        OVER(PARTITION BY region) AS regional_population
FROM population    
ORDER BY state_name

Answered by MaxU

For the first SQL query:

SELECT state_name,  
       state_population,
       SUM(state_population)
        OVER() AS national_population
FROM population   
ORDER BY state_name 

Pandas:

df.assign(national_population=df.state_population.sum()).sort_values('state_name')
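As a minimal sketch of why this works: `Series.sum()` returns a single scalar, and `assign` broadcasts a scalar to every row, which is roughly what `SUM(...) OVER()` does in SQL. The DataFrame below is copied from the demo data in this answer; the variable names `total` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the demo in this answer.
df = pd.DataFrame({
    "region": [1, 1, 2, 2, 2, 3],
    "state_name": ["aaa", "bbb", "ccc", "ddd", "eee", "xxx"],
    "state_population": [100, 110, 200, 100, 100, 55],
})

# sum() collapses the whole column to one scalar value.
total = df["state_population"].sum()

# assign() broadcasts that scalar to every row, similar to SUM(...) OVER().
result = df.assign(national_population=total).sort_values("state_name")
print(result)
```

Note that `assign` returns a new DataFrame and leaves `df` itself unchanged.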

For the second SQL query:

SELECT state_name,  
       state_population,
       region,
       SUM(state_population)
        OVER(PARTITION BY region) AS regional_population
FROM population    
ORDER BY state_name

Pandas:

df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
  .sort_values('state_name')
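The key piece here is `transform`, which returns a result aligned to the original rows, whereas a plain aggregation returns one row per group. A small sketch of the difference, using the demo data from this answer (the names `per_group` and `per_row` are illustrative only):

```python
import pandas as pd

# Sample data copied from the demo in this answer.
df = pd.DataFrame({
    "region": [1, 1, 2, 2, 2, 3],
    "state_name": ["aaa", "bbb", "ccc", "ddd", "eee", "xxx"],
    "state_population": [100, 110, 200, 100, 100, 55],
})

grouped = df.groupby("region")["state_population"]

# Plain aggregation: one value per region, like GROUP BY region.
per_group = grouped.sum()

# transform: one value per original row, indexed like df,
# like SUM(...) OVER(PARTITION BY region).
per_row = grouped.transform("sum")

print(per_group)
print(per_row)
```

Because `per_row` shares the index of `df`, it can be passed straight to `assign` as a new column without a merge.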

DEMO:

In [238]: df
Out[238]:
   region state_name  state_population
0       1        aaa               100
1       1        bbb               110
2       2        ccc               200
3       2        ddd               100
4       2        eee               100
5       3        xxx                55

national_population:

In [246]: df.assign(national_population=df.state_population.sum()).sort_values('state_name')
Out[246]:
   region state_name  state_population  national_population
0       1        aaa               100                  665
1       1        bbb               110                  665
2       2        ccc               200                  665
3       2        ddd               100                  665
4       2        eee               100                  665
5       3        xxx                55                  665

regional_population:

In [239]: df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
     ...:   .sort_values('state_name')
Out[239]:
   region state_name  state_population  regional_population
0       1        aaa               100                  210
1       1        bbb               110                  210
2       2        ccc               200                  400
3       2        ddd               100                  400
4       2        eee               100                  400
5       3        xxx                55                   55