Pandas equivalent of SQL window functions

Question

Asked by 2daaa

Is there an idiomatic equivalent of SQL's window functions in Pandas? For example, what is the most compact way to write the equivalent of this query in Pandas?

SELECT state_name,  
       state_population,
       SUM(state_population)
        OVER() AS national_population
FROM population   
ORDER BY state_name

Or this?

SELECT state_name,  
       state_population,
       region,
       SUM(state_population)
        OVER(PARTITION BY region) AS regional_population
FROM population    
ORDER BY state_name

Answer 1

Answered by MaxU

For the first SQL:

SELECT state_name,  
       state_population,
       SUM(state_population)
        OVER() AS national_population
FROM population   
ORDER BY state_name

Pandas:

df.assign(national_population=df.state_population.sum()).sort_values('state_name')

For the second SQL:

SELECT state_name,  
       state_population,
       region,
       SUM(state_population)
        OVER(PARTITION BY region) AS regional_population
FROM population    
ORDER BY state_name

Pandas:

df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
  .sort_values('state_name')
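`transform('sum')` fits here, rather than a plain `groupby(...).sum()`, because it returns one value per input row, aligned to the original index, which mirrors what `SUM(...) OVER (PARTITION BY ...)` does. A minimal sketch of the difference, using a small made-up frame (the values are illustrative and not taken from the question):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "region": [1, 1, 2],
    "state_name": ["aaa", "bbb", "ccc"],
    "state_population": [100, 110, 200],
})

# Plain aggregation collapses to one row per region (like GROUP BY).
per_region = df.groupby("region")["state_population"].sum()

# transform broadcasts each group's total back onto every original row
# (like SUM(...) OVER (PARTITION BY region)).
per_row = df.groupby("region")["state_population"].transform("sum")

print(len(per_region))  # one entry per region
print(len(per_row))     # one entry per original row
```

Because `per_row` shares the index of `df`, it can be passed straight to `assign` as a new column without a merge.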

DEMO:

In [238]: df
Out[238]:
   region state_name  state_population
0       1        aaa               100
1       1        bbb               110
2       2        ccc               200
3       2        ddd               100
4       2        eee               100
5       3        xxx                55

national_population:

In [246]: df.assign(national_population=df.state_population.sum()).sort_values('state_name')
Out[246]:
   region state_name  state_population  national_population
0       1        aaa               100                  665
1       1        bbb               110                  665
2       2        ccc               200                  665
3       2        ddd               100                  665
4       2        eee               100                  665
5       3        xxx                55                  665

regional_population:

In [239]: df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
     ...:   .sort_values('state_name')
Out[239]:
   region state_name  state_population  regional_population
0       1        aaa               100                  210
1       1        bbb               110                  210
2       2        ccc               200                  400
3       2        ddd               100                  400
4       2        eee               100                  400
5       3        xxx                55                   55
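To reproduce the demo outside an IPython session, the frame can be rebuilt by hand. The following is a sketch that assumes the six rows shown above are the whole dataset; it computes both columns in a single `assign` call, which is a convenience here and not part of the original answer:

```python
import pandas as pd

# Rebuild the demo DataFrame from the rows printed above.
df = pd.DataFrame({
    "region": [1, 1, 2, 2, 2, 3],
    "state_name": ["aaa", "bbb", "ccc", "ddd", "eee", "xxx"],
    "state_population": [100, 110, 200, 100, 100, 55],
})

result = (
    df.assign(
        # SUM(state_population) OVER (): a scalar broadcast to all rows.
        national_population=df["state_population"].sum(),
        # SUM(state_population) OVER (PARTITION BY region).
        regional_population=df.groupby("region")["state_population"].transform("sum"),
    )
    .sort_values("state_name")  # ORDER BY state_name
)

print(result)
```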
