Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not the translator). Original question: http://stackoverflow.com/questions/38848866/

Spark - Group by HAVING with dataframe syntax?

java, sql, apache-spark, dataframe, apache-spark-sql

Asked by lte__

What's the syntax for using a groupby-having in Spark without an sql/hiveContext? I know I can do

DataFrame df = some_df;
df.registerTempTable("df");
DataFrame df1 = sqlContext.sql("SELECT * FROM df GROUP BY col1 HAVING some stuff");

but how do I do it with a syntax like

df.select(df.col("*")).groupBy(df.col("col1")).having("some stuff")

This .having() does not seem to exist.

Answered by zero323

Yes, it doesn't exist. You express the same logic with agg followed by where:

df.groupBy(someExpr).agg(someAgg).where(somePredicate)
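
For example, a query like SELECT col1, SUM(fee) AS total FROM df GROUP BY col1 HAVING SUM(fee) > 1000 could be written as below. This is only a minimal Java sketch under assumptions not in the answer: the column names col1 and fee, the threshold 1000, and the JSON input path are hypothetical, and the Spark 2.x Dataset API is assumed.

import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class GroupByHavingExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("group-by-having")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical input with a grouping column "col1" and a numeric column "fee"
        Dataset<Row> df = spark.read().json("some_input.json");

        // Equivalent of:
        //   SELECT col1, SUM(fee) AS total FROM df GROUP BY col1 HAVING SUM(fee) > 1000
        Dataset<Row> result = df
                .groupBy(col("col1"))                 // GROUP BY col1
                .agg(sum(col("fee")).alias("total"))  // aggregate within each group
                .where(col("total").gt(1000));        // the HAVING part: a filter on the aggregate

        result.show();
        spark.stop();
    }
}

The key point is that the HAVING condition becomes an ordinary filter applied after agg, referencing the aggregate column by its alias.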

Answered by Sri_Karthik

Say, for example, I want to count the products in each category whose fee is less than 3200, and keep only the categories where that count is greater than 10:

  • SQL query:
sqlContext.sql("""select Category, count(*) as count
                  from hadoopexam
                  where HadoopExamFee < 3200
                  group by Category
                  having count > 10""")
  • DataFrames API
from pyspark.sql.functions import *

(df.filter(df.HadoopExamFee < 3200)
   .groupBy('Category')
   .agg(count('Category').alias('count'))
   .filter(column('count') > 10))
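
Since the question is tagged java, here is a hedged sketch of the same query with the Java DataFrame API; df is assumed to already hold the hadoopexam data with the Category and HadoopExamFee columns used above.

import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// df is assumed to hold the hadoopexam data (columns: Category, HadoopExamFee)
Dataset<Row> result = df
        .filter(col("HadoopExamFee").lt(3200))        // WHERE HadoopExamFee < 3200
        .groupBy(col("Category"))                     // GROUP BY Category
        .agg(count(col("Category")).alias("count"))   // COUNT(*) per category
        .filter(col("count").gt(10));                 // HAVING count > 10

As in the Python version, the final filter on the aliased count column plays the role of the HAVING clause.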