Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/39696403/
How to create a Row from a List or Array in Spark using java
Asked by user2736706
In Java, I use RowFactory.create() to create a Row:
Row row = RowFactory.create(record.getLong(1), record.getInt(2), record.getString(3));
where "record" is a record from a database. I cannot know the number of fields in "record" in advance, so I want to build the Row from a List or an Array instead. In Scala, I can use Row.fromSeq() to create a Row from a List or an Array, but how can I achieve that in Java?
Answered by Andrushenko Alexander
We often need to create Datasets or DataFrames in real-world applications. Here is an example of how to create Rows and a Dataset in a Java application:
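// The unqualified names below assume these imports:
import static org.apache.spark.sql.types.DataTypes.*;
import com.google.common.collect.ImmutableList; // Guava

// Initialize the SQLContext first
SQLContext sqlContext = ...
// Schema with three non-nullable columns
StructType schemata = DataTypes.createStructType(
    new StructField[]{
        createStructField("NAME", StringType, false),
        createStructField("STRING_VALUE", StringType, false),
        createStructField("NUM_VALUE", IntegerType, false),
    });
// Values in each Row must follow the order and types of the schema
Row r1 = RowFactory.create("name1", "value1", 1);
Row r2 = RowFactory.create("name2", "value2", 2);
List<Row> rowList = ImmutableList.of(r1, r2);
Dataset<Row> data = sqlContext.createDataFrame(rowList, schemata);
data.show(); // prints the table below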
+-----+------------+---------+
| NAME|STRING_VALUE|NUM_VALUE|
+-----+------------+---------+
|name1|      value1|        1|
|name2|      value2|        2|
+-----+------------+---------+
Answered by abaghel
I am not sure I understand your question correctly, but you can use RowFactory to create a Row from an ArrayList in Java.
List<MyData> mlist = new ArrayList<>();
mlist.add(d1);
mlist.add(d2);
// toArray() returns an Object[], which RowFactory.create(Object...) takes as its field values
Row row = RowFactory.create(mlist.toArray());
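The call above works because RowFactory.create is declared with a varargs parameter (Object... values): an Object[] is taken as the full list of field values, whereas passing the List itself would be wrapped as a single value. The following is a minimal plain-Java sketch of that behavior. The class VarargsRowSketch and its create method are hypothetical stand-ins (not Spark's API), used only so the example runs without Spark on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VarargsRowSketch {
    // Hypothetical stand-in with the same parameter shape as
    // RowFactory.create(Object... values); it just returns the values it received.
    public static Object[] create(Object... values) {
        return values;
    }

    // Builds the "row" from a list of any length by converting it to Object[] first.
    public static Object[] fromList(List<?> values) {
        return create(values.toArray());
    }

    public static void main(String[] args) {
        // Field count is only known at runtime, as in the question.
        List<Object> fields = new ArrayList<>(Arrays.asList(1L, 2, "three"));

        // The array is spread: one field per list element.
        System.out.println(fromList(fields).length);

        // The List itself is not an Object[], so it is wrapped as a single field.
        System.out.println(create(fields).length);
    }
}
```

With the real API, the same pattern would be to collect the record's values into a List<Object> and call RowFactory.create(list.toArray()).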
Answered by Sanjay Singh
// Create a list of DTOs
List<MyDTO> dtoList = Arrays.asList(.....);
// Create a Dataset of DTOs
Dataset<MyDTO> dtoSet = sparkSession.createDataset(dtoList,
        Encoders.bean(MyDTO.class));
// If you need a Dataset of Row
Dataset<Row> rowSet = dtoSet.select("col1", "col2", "col3");
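Encoders.bean derives the Dataset's columns from JavaBean properties, so MyDTO is generally expected to have a public no-argument constructor and a getter/setter pair per column. Here is a minimal sketch of what such a class might look like. The property names col1, col2 and col3 are taken from the select call above; their types (String, String, int) are assumptions made purely for illustration.

```java
import java.io.Serializable;

// Hypothetical DTO: property names follow the select("col1", "col2", "col3") call,
// the field types are assumed for this sketch.
public class MyDTO implements Serializable {
    private String col1;
    private String col2;
    private int col3;

    // Public no-argument constructor, as expected for a JavaBean.
    public MyDTO() {
    }

    public String getCol1() { return col1; }
    public void setCol1(String col1) { this.col1 = col1; }

    public String getCol2() { return col2; }
    public void setCol2(String col2) { this.col2 = col2; }

    public int getCol3() { return col3; }
    public void setCol3(int col3) { this.col3 = col3; }
}
```

Unlike RowFactory.create, this approach fixes the set of columns at compile time, so it fits best when the record layout is known in advance.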
Answered by Alex Stanovsky
For simple list values you can use Encoders:
List<Row> rows = ImmutableList.of(RowFactory.create(new Timestamp(currentTime)));
Dataset<Row> input = sparkSession.createDataFrame(rows, Encoders.TIMESTAMP().schema());