Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/37753091/

Date: 2020-10-22 08:22:15  Source: igfitidea

Spark dataframe get column value into a string variable

Tags: scala, apache-spark, dataframe, apache-spark-sql

Asked by G G

I am trying to extract a column value into a variable so that I can use the value elsewhere in the code. This is what I have tried:

 val name= test.filter(test("id").equalTo("200")).select("name").col("name")

It returns:

 name: org.apache.spark.sql.Column = name

How do I get the value?

Answered by Yuan JI

col("name") gives you a column expression, not the data. If you want to extract the data from column "name", do the same thing without col("name") and collect the result:

val names = test.filter(test("id").equalTo("200"))
                .select("name")
                .collectAsList() // returns a java.util.List[Row]

Then, for a given row in that list, you can get the name as a String with:

val name = row.getString(0)
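
The two steps above can be combined into one runnable sketch. It assumes a local SparkSession and hypothetical sample rows standing in for the asker's test DataFrame; the object name ColumnValueToString and the helper extractName are illustrative and not part of the original answer. It uses collect() with headOption rather than collectAsList() so that an empty filter result yields None instead of an exception.

```scala
import org.apache.spark.sql.SparkSession

object ColumnValueToString {
  // Returns the name of the row whose id is "200", if such a row exists.
  def extractName(spark: SparkSession): Option[String] = {
    import spark.implicits._ // needed for toDF on a local Seq

    // Hypothetical sample data standing in for the asker's `test` DataFrame.
    val test = Seq(("100", "alice"), ("200", "bob")).toDF("id", "name")

    test.filter(test("id").equalTo("200"))
      .select("name")
      .collect()            // Array[Row], materialized on the driver
      .headOption           // None when no row matches the filter
      .map(_.getString(0))  // column 0 of the selected row, as a String
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-value-to-string")
      .getOrCreate()
    try println(extractName(spark)) // prints Some(bob)
    finally spark.stop()
  }
}
```

Because collect() brings every matching row to the driver, this pattern is only appropriate when the filter is expected to match a small number of rows.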

Answered by Rajiv Singh

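// first() returns the single Row produced by the aggregate query
val maxDate = spark.sql("select max(export_time) as export_time from tier1_spend.cost_gcp_raw").first()

// get(0) returns the first column of that Row, typed as Any
val rowValue = maxDate.get(0)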
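
Since get(0) returns Any, a typed accessor may be more convenient when the value is needed as a String. The following is a minimal sketch under stated assumptions: the temp view cost_raw, its string-typed export_time column, the sample dates, and the names MaxValueAsString and maxExportTime are all hypothetical and stand in for the table used in the answer above.

```scala
import org.apache.spark.sql.SparkSession

object MaxValueAsString {
  def maxExportTime(spark: SparkSession): String = {
    import spark.implicits._ // needed for toDF on a local Seq

    // Hypothetical data registered as a temp view in place of a real table.
    Seq("2020-01-01", "2020-03-15", "2020-02-10")
      .toDF("export_time")
      .createOrReplaceTempView("cost_raw")

    val row = spark.sql("select max(export_time) as export_time from cost_raw").first()

    // getAs looks the field up by name and casts it to the requested type.
    row.getAs[String]("export_time")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("max-value-as-string")
      .getOrCreate()
    try println(maxExportTime(spark)) // prints 2020-03-15
    finally spark.stop()
  }
}
```

If the column were a timestamp or date rather than a string, the type parameter passed to getAs would need to match that type; on an empty table the aggregate returns a null value.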

Answered by Aman Sehgal

This snippet extracts all the values in a column into a single string. Add where clauses to the snippet to get only the value you need.

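// toDF and the $"A" syntax need spark.implicits._ (already imported in spark-shell)
val df = Seq((5, 2), (10, 1)).toDF("A", "B")

// Collect column A to the driver, then join the values with commas
val col_val_df = df.select($"A").collect()
val col_val_str = col_val_df.map(x => x.get(0)).mkString(",")

/*
df: org.apache.spark.sql.DataFrame = [A: int, B: int]
col_val_df: Array[org.apache.spark.sql.Row] = Array([5], [10])
col_val_str: String = 5,10
*/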

The values of the entire column are stored in col_val_str:

col_val_str: String = 5,10
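
As a variation on the snippet above, the column can be cast to a string and read through a typed Dataset, which avoids working with Row and Any. This is a sketch only, assuming a local SparkSession; the names ColumnToCsvString and columnAsString are illustrative.

```scala
import org.apache.spark.sql.SparkSession

object ColumnToCsvString {
  def columnAsString(spark: SparkSession): String = {
    import spark.implicits._ // provides toDF, the $"..." syntax, and the String encoder

    val df = Seq((5, 2), (10, 1)).toDF("A", "B")

    df.select($"A".cast("string")) // convert the int column to string
      .as[String]                  // typed Dataset[String] instead of DataFrame
      .collect()                   // Array[String] on the driver
      .mkString(",")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-to-csv-string")
      .getOrCreate()
    try println(columnAsString(spark)) // prints 5,10
    finally spark.stop()
  }
}
```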

Answered by afeldman

For anyone interested, below is a way to turn a column into an Array; in this case we just take the first value.

val names= test.filter(test("id").equalTo("200")).selectExpr("name").rdd.map(x=>x.mkString).collect
val name = names(0)