How to convert a Spark Dataset of Rows into strings in Java?

Note: this page is an English/Chinese translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/42389203/



Tags: java, string, apache-spark, apache-spark-sql, apache-spark-dataset

Asked by Jaffer Wilson

I have written the code to access the Hive table using SparkSQL. Here is the code:


SparkSession spark = SparkSession
        .builder()
        .appName("Java Spark Hive Example")
        .master("local[*]")
        .config("hive.metastore.uris", "thrift://localhost:9083")
        .enableHiveSupport()
        .getOrCreate();
Dataset<Row> df =  spark.sql("select survey_response_value from health").toDF();
df.show();

I would like to know how I can convert the complete output to a String or a String array. I am trying to pass the result to another module that only accepts String or String[] values.
I have tried methods such as .toString and typecasting to String, but they did not work for me.
Kindly let me know how I can convert the Dataset values to String?


Accepted answer by abaghel

Here is the sample code in Java.


import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
            .builder()
            .appName("SparkSample")
            .master("local[*]")
            .getOrCreate();
        // create df
        List<String> myList = Arrays.asList("one", "two", "three", "four", "five");
        Dataset<Row> df = spark.createDataset(myList, Encoders.STRING()).toDF();
        df.show();
        // using df.as: view the single-column DataFrame as a Dataset<String>
        List<String> listOne = df.as(Encoders.STRING()).collectAsList();
        System.out.println(listOne);
        // using df.map: render each Row as a string
        // (the MapFunction cast resolves an overload ambiguity in the Java API)
        List<String> listTwo = df
            .map((MapFunction<Row, String>) row -> row.mkString(), Encoders.STRING())
            .collectAsList();
        System.out.println(listTwo);
    }
}

"row" is java 8 lambda parameter. Please check developer.com/java/start-using-java-lambda-expressions.html


Answered by hage

You can use the map function to convert every row into a string, e.g.:


df.map(row => row.mkString())

Instead of just mkString you can of course do more sophisticated work


The collect method can then retrieve the whole thing into an array


val strings = df.map(row => row.mkString()).collect

(This is the Scala syntax, I think in Java it's quite similar)
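
For reference, a rough Java equivalent of the Scala lines above might look like the sketch below; the MapFunction cast is an addition here to disambiguate the Java overloads of map:

// Java sketch of the Scala one-liner above.
// The cast selects the map(MapFunction, Encoder) overload of the Java API.
List<String> strings = df
        .map((MapFunction<Row, String>) row -> row.mkString(), Encoders.STRING())
        .collectAsList();
String[] asArray = strings.toArray(new String[0]); // the String array the question asks for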


Answered by Areeha

If you are planning to read the dataset line by line, then you can use the iterator over the dataset:


Dataset<Row> csv = session.read().format("csv").option("sep", ",").option("inferSchema", true)
        .option("escape", "\"").option("header", true).option("multiline", true).load("users/abc/....");

for (Iterator<Row> iter = csv.toLocalIterator(); iter.hasNext();) {
    String item = iter.next().toString();
    System.out.println(item);
}
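
If the goal is the String array from the original question rather than console output, the same iterator can fill a list first. A minimal sketch, assuming the csv Dataset above and the usual java.util imports:

// Gather each row's string form, then convert to String[] (a sketch).
List<String> lines = new ArrayList<>();
for (Iterator<Row> iter = csv.toLocalIterator(); iter.hasNext();) {
    lines.add(iter.next().mkString());
}
String[] result = lines.toArray(new String[0]);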