Scala: how to access values in an array column?
Disclaimer: the content below is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.
Original source: http://stackoverflow.com/questions/47585279/
How to access values in array column?
Asked by user3439308
I have a Dataframe with one column. Each row of that column has an Array of String values:
Values in my Spark 2.2 Dataframe
["123", "abc", "2017", "ABC"]
["456", "def", "2001", "ABC"]
["789", "ghi", "2017", "DEF"]
org.apache.spark.sql.DataFrame = [col: array]
root
|-- col: array (nullable = true)
| |-- element: string (containsNull = true)
What is the best way to access elements in the array? For example, I would like to extract the distinct values of the fourth element for the year 2017 (answer: "ABC", "DEF").
Answered by pathikrit
Since Spark 2.4.0, there is a new function element_at($array_column, $index).
See: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$@element_at(column:org.apache.spark.sql.Column,value:Any):org.apache.spark.sql.Column
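A minimal sketch of what that could look like on the question's data (assumed to run in spark-shell, where spark.implicits._ is already in scope; note that element_at uses 1-based indexing, so the year is element 3 and the code is element 4):

import org.apache.spark.sql.functions.element_at

val df = Seq(
  Array("123", "abc", "2017", "ABC"),
  Array("456", "def", "2001", "ABC"),
  Array("789", "ghi", "2017", "DEF")).toDF("col")

// pull the year (3rd element) and the code (4th element) out as regular columns
df.select(element_at($"col", 3).as("year"), element_at($"col", 4).as("code")).show()
+----+----+
|year|code|
+----+----+
|2017| ABC|
|2001| ABC|
|2017| DEF|
+----+----+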
Answered by Fermat's Little Student
df.where($"col".getItem(2) === lit("2017")).select($"col".getItem(3))
See getItem from https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column
Answered by Jacek Laskowski
What is the best way to access elements in the array?
You access elements in an array column with the getItem operator.
getItem(key: Any): Column
An expression that gets an item at position ordinal out of an array, or gets a value by key key in a MapType.
You could also use (ordinal) to access an element at the ordinal position.
val ds = Seq(
Array("123", "abc", "2017", "ABC"),
Array("456", "def", "2001", "ABC"),
Array("789", "ghi", "2017", "DEF")).toDF("col")
scala> ds.printSchema
root
|-- col: array (nullable = true)
| |-- element: string (containsNull = true)
scala> ds.select($"col"(2)).show
+------+
|col[2]|
+------+
| 2017|
| 2001|
| 2017|
+------+
It's just a matter of personal choice and taste which approach suits you better, i.e. getItem or simply (ordinal).
And in your case, where/filter followed by select with distinct gives the proper answer (as @Will did), as sketched below.
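A small sketch of that combination on the ds defined above, using the (ordinal) syntax (getItem(n) works the same way; both are 0-based, so index 2 is the year and index 3 is the value to collect):

ds.where($"col"(2) === "2017")
  .select($"col"(3) as "value")
  .distinct()
  .show()

This returns the two rows "ABC" and "DEF" (the order of distinct results is not guaranteed).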
Answered by parthasarathi0317
You can do something like the following:
import org.apache.spark.sql.functions._
val ds = Seq(
Array("123", "abc", "2017", "ABC"),
Array("456", "def", "2001", "ABC"),
Array("789", "ghi", "2017", "DEF")).toDF("col")
ds.withColumn("col1",element_at('col,1))
.withColumn("col2",element_at('col,2))
.withColumn("col3",element_at('col,3))
.withColumn("col4",element_at('col,4))
.drop('col)
.show()
+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
| 123| abc|2017| ABC|
| 456| def|2001| ABC|
| 789| ghi|2017| DEF|
+----+----+----+----+
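If the end goal is still the original question (distinct fourth values for the year 2017), a possible follow-up on those expanded columns might look like the below; the expanded name is only illustrative:

val expanded = ds
  .withColumn("col3", element_at('col, 3))
  .withColumn("col4", element_at('col, 4))

expanded.filter($"col3" === "2017").select($"col4").distinct().show()

which again yields "ABC" and "DEF".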

