Spark, add new Column with the same value in Scala
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/38587609/
Asked by Alessandro
I have a problem with the withColumn function in a Spark-Scala environment.
I would like to add a new column to my DataFrame, like this:
+---+----+---+
| A| B| C|
+---+----+---+
| 4|blah| 2|
| 2| | 3|
| 56| foo| 3|
|100|null| 5|
+---+----+---+
should become:
+---+----+---+-----+
| A| B| C| D |
+---+----+---+-----+
| 4|blah| 2| 750|
| 2| | 3| 750|
| 56| foo| 3| 750|
|100|null| 5| 750|
+---+----+---+-----+
Column D holds a single value, repeated in every row of my DataFrame.
The code is this:
var totVehicles : Double = df_totVehicles(0).getDouble(0); //return 750
The variable totVehicles returns the correct value, so this part works.
The second DataFrame has to compute 2 fields (id_zipcode, n_vehicles) and add a third column (with the same value, 750):
var df_nVehicles =
  df_carPark.filter(
    substring($"id_time",1,4) < 2013
  ).groupBy(
    $"id_zipcode"
  ).agg(
    sum($"n_vehicles") as 'n_vehicles
  ).select(
    $"id_zipcode" as 'id_zipcode,
    'n_vehicles
  ).orderBy(
    'id_zipcode,
    'n_vehicles
  );
Finally, I add the new column with the withColumn function:
var df_nVehicles2 = df_nVehicles.withColumn(totVehicles, df_nVehicles("n_vehicles") + df_nVehicles("id_zipcode"))
But Spark returns this error:
error: value withColumn is not a member of Unit
var df_nVehicles2 = df_nVehicles.withColumn(totVehicles, df_nVehicles("n_vehicles") + df_nVehicles("id_zipcode"))
Can you help me? Thank you very much!
Answered by Rockie Yang
The lit function adds a literal value as a column:
import org.apache.spark.sql.functions._
df.withColumn("D", lit(750))
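Applied to the question's variables, the same idea could look like the sketch below. withColumn takes the new column's name as a String first and a Column second, so the Double totVehicles has to be wrapped in lit rather than passed as the name. The sample rows, the local session, and the names dfNVehicles and dfWithTotal are assumptions made for illustration. Separately, the "not a member of Unit" error usually means the variable was bound to the result of an action such as show(), which returns Unit. That is only a guess here, since the code shown does not end with such a call.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Assumption: run in spark-shell or as a script. getOrCreate() reuses an
// existing session; in spark-shell these session lines can be skipped.
val spark = SparkSession.builder().master("local[1]").appName("lit-example").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the aggregated DataFrame from the question.
val dfNVehicles = Seq((10001, 120L), (10002, 630L)).toDF("id_zipcode", "n_vehicles")

// A plain Scala value, as extracted in the question (assumed to be 750.0).
val totVehicles: Double = 750.0

// First argument: the new column's name (a String).
// Second argument: a Column; lit turns the constant into one.
val dfWithTotal = dfNVehicles.withColumn("totVehicles", lit(totVehicles))

dfWithTotal.show()
```

Keeping the DataFrame transformation (withColumn) and the action (show) in separate statements means dfWithTotal stays a DataFrame and can be chained further.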

