Original question: http://stackoverflow.com/questions/42389203/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow
How to convert the datasets of Spark Row into string?
Asked by Jaffer Wilson
I have written the code to access the Hive table using SparkSQL. Here is the code:
SparkSession spark = SparkSession
        .builder()
        .appName("Java Spark Hive Example")
        .master("local[*]")
        .config("hive.metastore.uris", "thrift://localhost:9083")
        .enableHiveSupport()
        .getOrCreate();
Dataset<Row> df = spark.sql("select survey_response_value from health").toDF();
df.show();
I would like to know how I can convert the complete output to a String or a String array, since I am trying to work with another module to which I can only pass String or String-array values.
I have tried other approaches such as .toString and typecasting to String, but they did not work for me.
Kindly let me know how I can convert the Dataset values to String.
Accepted answer by abaghel
Here is the sample code in Java.
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("SparkSample")
                .master("local[*]")
                .getOrCreate();

        // create a single-column DataFrame from a list of strings
        List<String> myList = Arrays.asList("one", "two", "three", "four", "five");
        Dataset<Row> df = spark.createDataset(myList, Encoders.STRING()).toDF();
        df.show();

        // using df.as: re-encode the single string column as a Dataset<String>
        List<String> listOne = df.as(Encoders.STRING()).collectAsList();
        System.out.println(listOne);

        // using df.map: the cast picks the Java MapFunction overload, since a
        // bare lambda is ambiguous between the Scala and Java map signatures
        List<String> listTwo = df
                .map((MapFunction<Row, String>) row -> row.mkString(), Encoders.STRING())
                .collectAsList();
        System.out.println(listTwo);
    }
}
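For reference, with the sample data above both println calls should print [one, two, three, four, five]. Note that df.as(Encoders.STRING()) only works because the DataFrame has a single string column, whereas row.mkString() concatenates all columns of a Row, so the map variant also covers multi-column rows.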
"row" is java 8 lambda parameter. Please check developer.com/java/start-using-java-lambda-expressions.html
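If lambdas are not an option, the same call can be written with an anonymous class. A minimal sketch (my addition, not from the original answer), assuming the df and imports from the sample above:

// equivalent to the lambda version, in pre-Java-8 style
List<String> listThree = df.map(new MapFunction<Row, String>() {
    @Override
    public String call(Row row) {
        return row.mkString();  // concatenate all columns of the row
    }
}, Encoders.STRING()).collectAsList();
System.out.println(listThree);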
Answered by hage
You can use the map function to convert every row into a string, e.g.:
df.map(row => row.mkString())
Instead of just mkString you can of course do more sophisticated work.
The collect method can then retrieve the whole thing into an array:
val strings = df.map(row => row.mkString()).collect
(This is the Scala syntax, I think in Java it's quite similar)
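For completeness, a rough Java equivalent of the Scala snippet above (a sketch of mine, not part of the original answer; it assumes the df and imports from the accepted answer, and uses collectAsList(), the usual collection call in the Java API):

// map each Row to a comma-separated string, then collect on the driver
List<String> strings = df
        .map((MapFunction<Row, String>) row -> row.mkString(", "), Encoders.STRING())
        .collectAsList();
String[] asArray = strings.toArray(new String[0]);  // if a String[] is required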
Answered by Areeha
If you are planning to read the dataset line by line, then you can use the iterator over the dataset:
// session is an existing SparkSession
Dataset<Row> csv = session.read()
        .format("csv")
        .option("sep", ",")
        .option("inferSchema", true)
        .option("escape", "\"")
        .option("header", true)
        .option("multiline", true)
        .load("users/abc/....");

for (Iterator<Row> iter = csv.toLocalIterator(); iter.hasNext(); ) {
    String item = iter.next().toString();
    System.out.println(item);
}
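To end up with the String array the question asks for, the iterated rows can be accumulated into a list first. A sketch (my addition), assuming the csv Dataset loaded above and the usual java.util imports:

List<String> lines = new ArrayList<>();
for (Iterator<Row> iter = csv.toLocalIterator(); iter.hasNext(); ) {
    lines.add(iter.next().mkString(","));  // join the row's columns with commas
}
String[] asArray = lines.toArray(new String[0]);

Unlike collect(), toLocalIterator() only needs to hold one partition at a time in driver memory, which helps with datasets too large to collect at once.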