Original question: http://stackoverflow.com/questions/44957197/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How can I add a column with a value to a new Dataset in Spark Java?
Asked by Juan Carlos Nuño
So, I'm creating some Datasets with the Spark Java API. These Datasets are populated from a Hive table, using the spark.sql() method.
So, after performing some SQL operations (like joins), I have a final Dataset. What I want to do is add a new column to that final Dataset, with a value of "1" in every row. So, you could probably see it as adding a constraint to the Dataset.
So, for example I have this dataset:
Dataset<Row> final = otherDataset.select(otherDataset.col("colA"), otherDataset.col("colB"));
I want to add a new column to the "final" Dataset, something like this:
final.addNewColumn("colName", 1); //I know this doesn't work, but just to give you an idea.
Is there a feasible way to add the new column to all the rows of the Dataset with a value of 1?
Answered by ktheitroadalo
If you want to add a constant value, you can use the lit function:
lit(Object literal)
Creates a Column of literal value.
Also, change the variable name final to something else, because final is a reserved keyword in Java and cannot be used as an identifier:
import static org.apache.spark.sql.functions.lit;
Dataset<Row> final12 = otherDataset.select(otherDataset.col("colA"), otherDataset.col("colB"));
Dataset<Row> result = final12.withColumn("columnName", lit(1));
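For anyone who wants to try this end to end, here is a minimal sketch of the same withColumn + lit approach. It assumes Spark SQL is on the classpath and uses a local SparkSession with a small in-memory Dataset in place of the Hive tables from the question. The class name AddConstantColumn, the helper withConstant, and the sample rows are made up for illustration and are not part of the original post.

```java
import static org.apache.spark.sql.functions.lit;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class AddConstantColumn {

    // Hypothetical helper: returns a new Dataset with one extra column
    // that holds the same literal value in every row.
    public static Dataset<Row> withConstant(Dataset<Row> ds, String columnName, Object value) {
        return ds.withColumn(columnName, lit(value));
    }

    public static void main(String[] args) {
        // Assumption: local mode, only for trying the example out.
        SparkSession spark = SparkSession.builder()
                .appName("AddConstantColumn")
                .master("local[*]")
                .getOrCreate();

        // Made-up sample data standing in for the result of spark.sql(...).
        StructType schema = new StructType()
                .add("colA", DataTypes.StringType)
                .add("colB", DataTypes.IntegerType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("a", 10),
                RowFactory.create("b", 20));
        Dataset<Row> otherDataset = spark.createDataFrame(rows, schema);

        Dataset<Row> selected = otherDataset.select(
                otherDataset.col("colA"), otherDataset.col("colB"));

        // Adds the column "colName" with the value 1 in every row.
        Dataset<Row> result = withConstant(selected, "colName", 1);

        result.show();
        spark.stop();
    }
}
```

Note that withColumn does not modify the Dataset it is called on. It returns a new Dataset, so the result has to be assigned to a variable, here with the columns colA, colB and colName.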
Hope this helps!