
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/31859271/


SparkSQL and explode on DataFrame in Java

Tags: java, apache-spark, apache-spark-sql

Asked by JiriS

Is there an easy way to use explode on an array column of a SparkSQL DataFrame? It's relatively simple in Scala, but this function seems to be unavailable in Java (as mentioned in the javadoc).


An option is to use SQLContext.sql(...) with the explode function inside the query, but I'm looking for a somewhat better and especially cleaner way. The DataFrames are loaded from Parquet files.

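For illustration, the SQL-based workaround mentioned above could look roughly like the sketch below. The table name "people", the file path, and the column names are assumptions, not from the original question; depending on the Spark 1.x version, a HiveContext may be needed for the LATERAL VIEW syntax.

```java
// Sketch only: assumes an existing SQLContext `sqlContext` and a Parquet
// file with a fullName column and a positions array column.
DataFrame people = sqlContext.read().parquet("people.parquet");
people.registerTempTable("people");

// LATERAL VIEW explode produces one output row per array element.
DataFrame flattened = sqlContext.sql(
    "SELECT fullName, pos FROM people LATERAL VIEW explode(positions) p AS pos");
```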

Accepted answer by JiriS

It seems it is possible to use a combination of org.apache.spark.sql.functions.explode(Column col) and DataFrame.withColumn(String colName, Column col) to replace the column with the exploded version of it.

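A minimal sketch of that combination, assuming a DataFrame df with an array column; the column name "tags" is hypothetical:

```java
import static org.apache.spark.sql.functions.explode;

// Replace the array column "tags" with its exploded version:
// each input row yields one output row per array element.
DataFrame exploded = df.withColumn("tags", explode(df.col("tags")));
```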

Answer by marilena.oita

I solved it in this manner: say you have an array column named "positions" that holds the job descriptions for each person identified by "fullName".


Then you go from the initial schema:


root
 |-- fullName: string (nullable = true)
 |-- positions: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- companyName: string (nullable = true)
 |    |    |-- title: string (nullable = true)
...

to the schema:

root
 |-- personName: string (nullable = true)
 |-- companyName: string (nullable = true)
 |-- positionTitle: string (nullable = true)

by doing:


    DataFrame personPositions = persons.select(
        persons.col("fullName").as("personName"),
        org.apache.spark.sql.functions.explode(persons.col("positions")).as("pos"));

    DataFrame test = personPositions.select(
        personPositions.col("personName"),
        personPositions.col("pos").getField("companyName").as("companyName"),
        personPositions.col("pos").getField("title").as("positionTitle"));