How to add a column with a value to a new Dataset in Spark Java?

Question

Asked by Juan Carlos Nuño

I'm creating some Datasets with the Java Spark API. These Datasets are populated from Hive tables using the spark.sql() method.

After performing some SQL operations (like joins), I have a final Dataset. I want to add a new column to that final Dataset, with a value of "1" in every row. You could think of it as adding a constant column to the Dataset.

So, for example I have this dataset:

Dataset<Row> final = otherDataset.select(otherDataset.col("colA"), otherDataset.col("colB"));

I want to add a new column to the "final" Dataset, something like this

final.addNewColumn("colName", 1); //I know this doesn't work, but just to give you an idea.

Is there a feasible way to add the new column to all the rows of the Dataset with a value of 1?

Answer 1

Answered by ktheitroadalo

If you want to add a constant value, you can use the lit function, a static method of org.apache.spark.sql.functions:

lit(Object literal)
Creates a Column of literal value.

Also, change the variable name final to something else, because final is a reserved keyword in Java and cannot be used as an identifier:

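import static org.apache.spark.sql.functions.lit;

Dataset<Row> final12 = otherDataset.select(otherDataset.col("colA"), otherDataset.col("colB"));

// withColumn returns a new Dataset; lit(1) puts the integer 1 in every row
Dataset<Row> result = final12.withColumn("columnName", lit(1));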
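
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It makes several assumptions that are not in the question: it runs Spark in local mode, it builds a tiny in-memory Dataset instead of reading from Hive with spark.sql(), and the class name AddConstantColumn and the helper withConstant are hypothetical names chosen only for illustration.

```java
import static org.apache.spark.sql.functions.lit;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class AddConstantColumn {

    // Hypothetical helper: returns a new Dataset with one extra column
    // that holds the same literal value in every row.
    static Dataset<Row> withConstant(Dataset<Row> input, String columnName, Object value) {
        return input.withColumn(columnName, lit(value));
    }

    public static void main(String[] args) {
        // Assumption: local mode, so the sketch runs without a cluster or Hive.
        SparkSession spark = SparkSession.builder()
                .appName("AddConstantColumn")
                .master("local[*]")
                .getOrCreate();

        // Stand-in for the Dataset that would normally come from spark.sql(...).
        StructType schema = new StructType()
                .add("colA", DataTypes.StringType)
                .add("colB", DataTypes.IntegerType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("a", 10),
                RowFactory.create("b", 20));
        Dataset<Row> otherDataset = spark.createDataFrame(rows, schema);

        Dataset<Row> selected = otherDataset.select(
                otherDataset.col("colA"), otherDataset.col("colB"));

        // Datasets are immutable, so the result must be assigned to a variable.
        Dataset<Row> result = withConstant(selected, "colName", 1);

        result.show();
        spark.stop();
    }
}
```

Note that lit(1) produces an integer column. If the string "1" is what you need, passing lit("1") instead should give a string column.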

Hope this helps!
