Creating a simple 1-row Spark DataFrame with Java API
Original question: http://stackoverflow.com/questions/39967194/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by smeeb
In Scala, I can create a single-row DataFrame from an in-memory string like so:
// Needed for toDF outside spark-shell
import sqlContext.implicits._
val stringAsList = List("buzz")
val df = sqlContext.sparkContext.parallelize(stringAsList).toDF("fizz")
df.show()
When df.show() runs, it outputs:
+-----+
| fizz|
+-----+
| buzz|
+-----+
Now I'm trying to do this from inside a Java class. Apparently JavaRDDs don't have a toDF(String) method. I've tried:
List<String> stringAsList = new ArrayList<String>();
stringAsList.add("buzz");
SQLContext sqlContext = new SQLContext(sparkContext);
DataFrame df = sqlContext.createDataFrame(sparkContext
.parallelize(stringAsList), StringType);
df.show();
...but I still seem to be coming up short. Now when df.show(); executes, I get:
++
||
++
||
++
(An empty DF.) So I ask: using the Java API, how do I read an in-memory string into a DataFrame that has only 1 row and 1 column, and also specify the name of that column (so that the df.show() output is identical to the Scala one above)?
Accepted answer by cody123
You can achieve this by converting the List to an RDD of Rows and then creating a schema that contains the column name.
There may be other ways as well; this is just one of them.
// Assumes sparkContext is a JavaSparkContext and sqlContext is an SQLContext.
List<String> stringAsList = new ArrayList<String>();
stringAsList.add("buzz");
// Wrap each string in a single-column Row.
JavaRDD<Row> rowRDD = sparkContext.parallelize(stringAsList).map((String row) -> {
return RowFactory.create(row);
});
// The schema supplies the column name.
StructType schema = DataTypes.createStructType(new StructField[] { DataTypes.createStructField("fizz", DataTypes.StringType, false) });
DataFrame df = sqlContext.createDataFrame(rowRDD, schema).toDF();
df.show();
// +----+
// |fizz|
// +----+
// |buzz|
// +----+
Answer by jgp
I have created 2 examples for Spark 2 if you need to upgrade:
Simple Fizz/Buzz (or foe/bar - old generation :) ):
SparkSession spark = SparkSession.builder().appName("Build a DataFrame from Scratch").master("local[*]")
.getOrCreate();
List<String> stringAsList = new ArrayList<>();
stringAsList.add("bar");
JavaSparkContext sparkContext = new JavaSparkContext(spark.sparkContext());
JavaRDD<Row> rowRDD = sparkContext.parallelize(stringAsList).map((String row) -> RowFactory.create(row));
// Creates schema
StructType schema = DataTypes.createStructType(
new StructField[] { DataTypes.createStructField("foe", DataTypes.StringType, false) });
Dataset<Row> df = spark.sqlContext().createDataFrame(rowRDD, schema).toDF();
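For the single-column case, a shorter route may be possible in Spark 2 without building an RDD or a schema by hand. The following is only a sketch, not part of the original answer: the class name OneRowDataFrame and the helper oneRow are made up for illustration. It relies on SparkSession.createDataset(List, Encoder) and Dataset.toDF(String...), and assumes a Spark 2.x dependency on the classpath.

```java
import java.util.Collections;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical class name, used only for this sketch.
public class OneRowDataFrame {

    // Builds a 1-row, 1-column DataFrame whose only column is named columnName.
    static Dataset<Row> oneRow(SparkSession spark, String columnName, String value) {
        List<String> values = Collections.singletonList(value);
        // createDataset gives a Dataset<String> with the default column name "value";
        // toDF(String...) renames that column and returns a Dataset<Row>.
        return spark.createDataset(values, Encoders.STRING()).toDF(columnName);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("One-row DataFrame sketch")
                .master("local[*]")
                .getOrCreate();
        Dataset<Row> df = oneRow(spark, "fizz", "buzz");
        df.show();
        spark.stop();
    }
}
```

Unlike the explicit schema above, this leaves the column's nullability to the encoder's defaults, so the schema-based version is still the one to use when that matters.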
2x2 data:
SparkSession spark = SparkSession.builder().appName("Build a DataFrame from Scratch").master("local[*]")
.getOrCreate();
List<String[]> stringAsList = new ArrayList<>();
stringAsList.add(new String[] { "bar1.1", "bar2.1" });
stringAsList.add(new String[] { "bar1.2", "bar2.2" });
JavaSparkContext sparkContext = new JavaSparkContext(spark.sparkContext());
// The cast makes explicit that each array element becomes one column of the Row.
JavaRDD<Row> rowRDD = sparkContext.parallelize(stringAsList).map((String[] row) -> RowFactory.create((Object[]) row));
// Creates schema
StructType schema = DataTypes
.createStructType(new StructField[] { DataTypes.createStructField("foe1", DataTypes.StringType, false),
DataTypes.createStructField("foe2", DataTypes.StringType, false) });
Dataset<Row> df = spark.sqlContext().createDataFrame(rowRDD, schema).toDF();
Code can be downloaded from: https://github.com/jgperrin/net.jgp.labs.spark.
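As a possible variation on the 2x2 example, the JavaRDD step can probably be skipped for small in-memory data, because SparkSession also has a createDataFrame overload that takes a java.util.List of Rows together with a schema. This is a sketch under that assumption, not code from the repository above; the class name TwoByTwoDataFrame and the helper twoByTwo are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical class name, used only for this sketch.
public class TwoByTwoDataFrame {

    // Builds the same 2x2 DataFrame as above, directly from a List<Row>.
    static Dataset<Row> twoByTwo(SparkSession spark) {
        List<Row> rows = Arrays.asList(
                RowFactory.create("bar1.1", "bar2.1"),
                RowFactory.create("bar1.2", "bar2.2"));
        // Same schema as in the answer: two non-nullable string columns.
        StructType schema = DataTypes.createStructType(new StructField[] {
                DataTypes.createStructField("foe1", DataTypes.StringType, false),
                DataTypes.createStructField("foe2", DataTypes.StringType, false) });
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("2x2 DataFrame sketch")
                .master("local[*]")
                .getOrCreate();
        twoByTwo(spark).show();
        spark.stop();
    }
}
```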