Java: Reading a simple Avro file from HDFS

Disclaimer: This page is a Chinese-English bilingual translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/11632067/

Time: 2020-10-31 05:52:29  Source: igfitidea

Reading a simple Avro file from HDFS

Tags: java, io, avro

Asked by Wanderer

I am trying to do a simple read of an Avro file stored in HDFS. I found out how to read it when it is on the local file system:

FileReader<GenericRecord> fileReader = DataFileReader.openReader(new File(filename), new GenericDatumReader<GenericRecord>());

for (GenericRecord datum : fileReader) {
    String value = datum.get(1).toString();
    System.out.println("value = " + value);
}

fileReader.close();

My file is in HDFS, however. I cannot give the openReader a Path or an FSDataInputStream. How can I simply read an Avro file in HDFS?

EDIT: I got this to work by creating a custom class (SeekableHadoopInput) that implements SeekableInput. I "stole" this from "Ganglion" on GitHub. Still, it seems like there should be a Hadoop/Avro integration path for this.

Thanks

Answered by Martin Kleppmann

The FsInput class (in the avro-mapred submodule, since it depends on Hadoop) can do this. It provides the seekable input stream that is needed for Avro data files.

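import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.FileReader;
import org.apache.avro.file.SeekableInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

Path path = new Path("/path/on/hdfs");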
Configuration config = new Configuration(); // make this your Hadoop env config
SeekableInput input = new FsInput(path, config);
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
FileReader<GenericRecord> fileReader = DataFileReader.openReader(input, reader);

for (GenericRecord datum : fileReader) {
    System.out.println("value = " + datum);
}

fileReader.close(); // also closes underlying FsInput
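
To make "seekable input" concrete, here is a minimal sketch of the adapter idea behind FsInput and the custom class mentioned in the question. It is not the FsInput source. The interface SeekableSource is a hypothetical local stand-in that mirrors the methods of Avro's SeekableInput (seek, tell, length, read, close), so the sketch compiles with the JDK alone, and a local RandomAccessFile stands in for an HDFS stream. The names SeekableSketch, LocalSeekableSource and readAt are illustrative assumptions.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class SeekableSketch {

    // Hypothetical stand-in mirroring the methods of Avro's SeekableInput,
    // declared locally so this sketch compiles without Avro on the classpath.
    interface SeekableSource extends Closeable {
        void seek(long position) throws IOException;
        long tell() throws IOException;
        long length() throws IOException;
        int read(byte[] buffer, int offset, int length) throws IOException;
    }

    // Adapter over a local file. An HDFS variant would presumably delegate to
    // FSDataInputStream (seek, getPos, read) and take the length from FileStatus.
    static final class LocalSeekableSource implements SeekableSource {
        private final RandomAccessFile file;

        LocalSeekableSource(File path) throws IOException {
            this.file = new RandomAccessFile(path, "r");
        }

        @Override public void seek(long position) throws IOException { file.seek(position); }
        @Override public long tell() throws IOException { return file.getFilePointer(); }
        @Override public long length() throws IOException { return file.length(); }
        @Override public int read(byte[] buffer, int offset, int length) throws IOException {
            return file.read(buffer, offset, length);
        }
        @Override public void close() throws IOException { file.close(); }
    }

    // Jumps to `position` and reads up to `count` bytes, the kind of random
    // access a data file reader needs in order to move between blocks.
    static String readAt(SeekableSource source, long position, int count) throws IOException {
        source.seek(position);
        byte[] buffer = new byte[count];
        int total = 0;
        while (total < count) {
            int n = source.read(buffer, total, count - total);
            if (n < 0) {
                break; // end of input reached before `count` bytes were read
            }
            total += n;
        }
        return new String(buffer, 0, total, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        File temp = File.createTempFile("seekable", ".bin");
        temp.deleteOnExit();
        Files.write(temp.toPath(), "header|block-1|block-2".getBytes(StandardCharsets.US_ASCII));

        try (SeekableSource source = new LocalSeekableSource(temp)) {
            System.out.println(readAt(source, 7, 7)); // bytes 7..13 of the file
            System.out.println(source.tell());        // position after the read
            System.out.println(source.length());      // total size in bytes
        }
    }
}
```

In real code, prefer FsInput over a hand-written adapter. The sketch only shows why a plain InputStream is not enough: the reader must be able to report and change its position and to know the total length.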