Scala: how to access values in an array column?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/47585279/

Date: 2020-10-22 09:30:35. Source: igfitidea

How to access values in array column?

scala, apache-spark, apache-spark-sql

Asked by user3439308

I have a DataFrame with one column. Each row of that column holds an array of String values:

Values in my Spark 2.2 DataFrame:

["123", "abc", "2017", "ABC"]
["456", "def", "2001", "ABC"]
["789", "ghi", "2017", "DEF"]

org.apache.spark.sql.DataFrame = [col: array<string>]

root
|-- col: array (nullable = true)
|    |-- element: string (containsNull = true)

What is the best way to access elements in the array? For example, I would like to extract the distinct values of the fourth element for rows where the year is 2017 (answer: "ABC", "DEF").

Answered by Fermat's Little Student

Answered by Jacek Laskowski

What is the best way to access elements in the array?

Elements in an array column are accessed with the getItem operator.

getItem(key: Any): Column
An expression that gets an item at position ordinal out of an array, or gets a value by key key in a MapType.

You could also use (ordinal) to access the element at position ordinal. Positions are zero-based.

// toDF and $"..." rely on spark.implicits._, which spark-shell imports automatically
val ds = Seq(
  Array("123", "abc", "2017", "ABC"),
  Array("456", "def", "2001", "ABC"),
  Array("789", "ghi", "2017", "DEF")).toDF("col")
scala> ds.printSchema
root
 |-- col: array (nullable = true)
 |    |-- element: string (containsNull = true)
scala> ds.select($"col"(2)).show
+------+
|col[2]|
+------+
|  2017|
|  2001|
|  2017|
+------+

Which approach you use, getItem or simply (ordinal), is a matter of personal taste.

In your case, where / filter followed by select with distinct gives the proper answer (as @Will did).
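
A minimal sketch of that where / select / distinct combination, applied to the question's sample data. It assumes a local SparkSession (in spark-shell, getOrCreate simply returns the existing session). The alias code and the names rows2017 and codes are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell this reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("array-column-sketch").getOrCreate()
import spark.implicits._

val ds = Seq(
  Array("123", "abc", "2017", "ABC"),
  Array("456", "def", "2001", "ABC"),
  Array("789", "ghi", "2017", "DEF")).toDF("col")

// Indexes are zero-based: position 2 holds the year, position 3 the fourth element.
// Keep only the rows for 2017, using the getItem form.
val rows2017 = ds.where($"col".getItem(2) === "2017")

// Pick the fourth element with the (ordinal) form and drop duplicates.
val codes = rows2017.select($"col"(3).as("code")).distinct().as[String].collect().toSet
// codes contains "ABC" and "DEF"
```

Collecting into a Set is only for convenience on this tiny sample; on real data you would normally keep the result as a DataFrame and call show or write it out.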

Answered by parthasarathi0317

You can do something like the following:

import org.apache.spark.sql.functions._

val ds = Seq(
  Array("123", "abc", "2017", "ABC"),
  Array("456", "def", "2001", "ABC"),
  Array("789", "ghi", "2017", "DEF")).toDF("col")

// element_at uses 1-based indexes, unlike getItem
ds.withColumn("col1", element_at('col, 1))
  .withColumn("col2", element_at('col, 2))
  .withColumn("col3", element_at('col, 3))
  .withColumn("col4", element_at('col, 4))
  .drop('col)
  .show()

+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
| 123| abc|2017| ABC|
| 456| def|2001| ABC|
| 789| ghi|2017| DEF|
+----+----+----+----+
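
Note that element_at was added in Spark 2.4, while the question mentions Spark 2.2. On older versions, the same reshaping can be sketched with getItem, which is zero-based. This is only an illustration under that assumption: the names elementColumns and wide are hypothetical, and the array length of 4 is taken from the sample data.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell this reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("array-to-columns-sketch").getOrCreate()
import spark.implicits._

val ds = Seq(
  Array("123", "abc", "2017", "ABC"),
  Array("456", "def", "2001", "ABC"),
  Array("789", "ghi", "2017", "DEF")).toDF("col")

// getItem is zero-based, so array position i becomes column col<i+1>.
val elementColumns = (0 until 4).map(i => $"col".getItem(i).as(s"col${i + 1}"))

// Selecting only the derived columns leaves out the original array column.
val wide = ds.select(elementColumns: _*)
wide.show()
```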