SparkSQL 并在 Java 中的 DataFrame 上爆炸
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/31859271/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
SparkSQL and explode on DataFrame in Java
提问by JiriS
Is there an easy way how use explode
on array column on SparkSQL DataFrame
? It's relatively simple in Scala, but this function seems to be unavailable (as mentioned in javadoc) in Java.
有没有一种简单的方法如何explode
在 SparkSQL 上的数组列上使用DataFrame
?在 Scala 中比较简单,但是这个功能在 Java 中似乎不可用(如 javadoc 中所述)。
An option is to use SQLContext.sql(...)
and explode
function inside the query, but I'm looking for a bit better and especially cleaner way. DataFrame
s are loaded from parquet files.
一个选项是在查询中使用SQLContext.sql(...)
和explode
运行,但我正在寻找更好,尤其是更简洁的方法。DataFrame
s 从镶木地板文件加载。
采纳答案by JiriS
It seems it is possible to use a combination of org.apache.spark.sql.functions.explode(Column col)
and DataFrame.withColumn(String colName, Column col)
to replace the column with the exploded version of it.
似乎可以使用组合org.apache.spark.sql.functions.explode(Column col)
并DataFrame.withColumn(String colName, Column col)
用它的分解版本替换该列。
回答by marilena.oita
I solved it in this manner: say that you have an array column containing job descriptions named "positions", for each person with "fullName".
我以这种方式解决了它:假设您有一个数组列,其中包含名为“职位”的职位描述,每个人都有“全名”。
Then you get from initial schema :
然后你从初始架构中得到:
root
|-- fullName: string (nullable = true)
|-- positions: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- companyName: string (nullable = true)
| | |-- title: string (nullable = true)
...
to schema:
架构:
root
|-- personName: string (nullable = true)
|-- companyName: string (nullable = true)
|-- positionTitle: string (nullable = true)
by doing:
通过做:
DataFrame personPositions = persons.select(persons.col("fullName").as("personName"),
org.apache.spark.sql.functions.explode(persons.col("positions")).as("pos"));
DataFrame test = personPositions.select(personPositions.col("personName"),
personPositions.col("pos").getField("companyName").as("companyName"), personPositions.col("pos").getField("title").as("positionTitle"));