Reading an Avro File in Spark (Scala)
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must follow the same CC BY-SA terms, link to the original, and attribute it to the original authors (not me): StackOverflow
原文地址: http://stackoverflow.com/questions/45360359/
Reading Avro File in Spark
Asked by Gayatri
I have read an Avro file into a Spark RDD and need to convert it into a SQL DataFrame. How do I do that?
This is what I have done so far.
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable
val path = "hdfs://dds-nameservice/user/ghagh/"
val avroRDD = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable, AvroInputFormat[GenericRecord]](path)
When I run:
avroRDD.take(1)
I get back:
res1: Array[(org.apache.avro.mapred.AvroWrapper[org.apache.avro.generic.GenericRecord], org.apache.hadoop.io.NullWritable)] = Array(({"column1": "value1", "column2": "value2", "column3": value3,...
How do I convert this to a Spark SQL DataFrame?
I am using Spark 1.6.
Can anyone tell me if there is an easy solution for this?
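For reference, each element of the pair RDD loaded above wraps a GenericRecord; the record itself is reached through the wrapper's datum() method. A small sketch of inspecting the first record (the field name "column1" is a placeholder for whatever the file's schema actually defines):

```scala
// Each element is (AvroWrapper[GenericRecord], NullWritable); the Avro
// record is obtained with datum(). "column1" stands in for a real field.
val firstRecord = avroRDD.first()._1.datum()
println(firstRecord.getSchema)       // the Avro schema of the file
println(firstRecord.get("column1"))  // access a single field by name
```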
Answered by Alper t. Turker
For a DataFrame I'd go with the Avro data source directly:
Include spark-avro in the packages list. For the latest version use:
com.databricks:spark-avro_2.11:3.2.0
Load the file:
val df = spark.read
  .format("com.databricks.spark.avro")
  .load(path)
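One caveat: the asker is on Spark 1.6, where there is no SparkSession; the entry point for the data source API is sqlContext, and the spark-avro artifact version must match the Spark major version (the 2.0.x line of spark-avro is the one built against Spark 1.x, while 3.x targets Spark 2.x). A minimal sketch under those assumptions, reusing the HDFS path from the question:

```scala
// Spark 1.6: read through sqlContext instead of spark (SparkSession is
// a Spark 2.x API). spark-avro 2.0.1 is assumed here for the 1.x line.
val df = sqlContext.read
  .format("com.databricks.spark.avro")
  .load("hdfs://dds-nameservice/user/ghagh/")

df.printSchema()
df.show(5)
```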
Answered by Manoj Kumar Dhakad
If your project uses Maven, add the latest dependency below to pom.xml:
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-avro_2.11</artifactId>
<version>4.0.0</version>
</dependency>
After that you can read the Avro file as below:
val df = spark.read
  .format("com.databricks.spark.avro")
  .load("C:/Users/alice/inputs/sample_data.avro")
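If the RDD already loaded with hadoopFile in the question must be converted directly (without re-reading through the data source), one sketch is to pull each GenericRecord out of its wrapper and build Rows against an explicit schema. The field names and string types below are assumptions; the file's real Avro schema drives what you would actually use:

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumed column names; replace with the fields of the real Avro schema.
val fields = Seq("column1", "column2", "column3")
val schema = StructType(fields.map(StructField(_, StringType, nullable = true)))

// AvroWrapper.datum() yields the GenericRecord; convert each field to a
// string (null-safe) and pack the values into a Row.
val rowRDD = avroRDD.map { case (wrapper, _) =>
  val record: GenericRecord = wrapper.datum()
  Row.fromSeq(fields.map(f => Option(record.get(f)).map(_.toString).orNull))
}

// Spark 1.6: sqlContext; on Spark 2.x use spark.createDataFrame instead.
val df2 = sqlContext.createDataFrame(rowRDD, schema)
df2.show()
```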

