Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/45496786/

Time: 2020-11-03 08:42:42  Source: igfitidea

How to extract schema from an avro file in java

java avro avro-tools

Asked by mba12

How do you extract first the schema and then the data from an Avro file in Java? Identical to this question, except in Java.

I've seen examples of how to get the schema from an avsc file but not an avro file. Any direction much appreciated.

Schema schema = new Schema.Parser().parse(new File("/home/Hadoop/Avro/schema/emp.avsc"));

Answered by Helder Pereira

If you want to know the schema of an Avro file without having to generate the corresponding classes or care about which class the file belongs to, you can use the GenericDatumReader:

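// GenericDatumReader needs no generated classes; the schema comes from the file itself.
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File("file.avro"), datumReader);
// The schema that was embedded in the file when it was written.
Schema schema = dataFileReader.getSchema();
System.out.println(schema);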

And then you can read the data inside the file:

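// Passing the previous record back in lets Avro reuse the object between iterations.
GenericRecord record = null;
while (dataFileReader.hasNext()) {
    record = dataFileReader.next(record);
    System.out.println(record);
}
dataFileReader.close();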
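
The answer above needs no .avsc file because, as far as I understand the Avro specification, an Avro object container file embeds its schema as JSON in the file header: four magic bytes ('O', 'b', 'j', 1), then a metadata map holding the key avro.schema, then a 16-byte sync marker. The following is only a minimal sketch of that idea using the JDK alone, not how the Avro library implements it. The class name AvroHeaderSchema, all of its helper methods, and the in-memory sample header are hypothetical names introduced for illustration; for real work, DataFileReader.getSchema() is the better choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class (not part of Avro): reads the schema JSON from a container-file header.
final class AvroHeaderSchema {
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Reads one Avro long: a variable-length, zig-zag encoded integer.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        for (int shift = 0; shift <= 63; shift += 7) {
            int b = in.read();
            if (b < 0) throw new EOFException("Unexpected end of header");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
        }
        throw new IOException("Variable-length integer is too long");
    }

    // Reads a length-prefixed byte sequence (Avro "bytes"; strings use the same layout).
    static byte[] readBytes(InputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("Bad length: " + len);
        byte[] buf = new byte[(int) len];
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    // Checks the magic bytes, then reads the header's metadata map.
    static Map<String, byte[]> readMetadata(InputStream in) throws IOException {
        byte[] magic = new byte[MAGIC.length];
        new DataInputStream(in).readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("Not an Avro container file");
        Map<String, byte[]> meta = new LinkedHashMap<>();
        // A map is a series of blocks; a block with count 0 ends the map.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {   // a negative count is followed by the block size in bytes
                count = -count;
                readLong(in);  // the size is not needed here, so skip it
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        return meta;
    }

    static String readSchemaJson(InputStream in) throws IOException {
        byte[] schema = readMetadata(in).get("avro.schema");
        if (schema == null) throw new IOException("Header has no avro.schema entry");
        return new String(schema, StandardCharsets.UTF_8);
    }

    // Convenience wrapper for in-memory data; rethrows I/O problems unchecked.
    static String schemaFromBytes(byte[] fileBytes) {
        try {
            return readSchemaJson(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // --- Helpers below only build a fake header so the sketch runs without a real file. ---
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63); // zig-zag encode
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    static byte[] sampleHeader(String schemaJson) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC, 0, MAGIC.length);
        writeLong(out, 2); // one block with two metadata entries
        writeBytes(out, "avro.schema".getBytes(StandardCharsets.UTF_8));
        writeBytes(out, schemaJson.getBytes(StandardCharsets.UTF_8));
        writeBytes(out, "avro.codec".getBytes(StandardCharsets.UTF_8));
        writeBytes(out, "null".getBytes(StandardCharsets.UTF_8));
        writeLong(out, 0); // end of the metadata map
        out.write(new byte[16], 0, 16); // placeholder sync marker
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"type\":\"record\",\"name\":\"Emp\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}";
        // With a path argument, read that file; otherwise use the in-memory sample header.
        try (InputStream in = args.length > 0
                ? Files.newInputStream(Paths.get(args[0]))
                : new ByteArrayInputStream(sampleHeader(json))) {
            System.out.println(readSchemaJson(in));
        }
    }
}
```

Run without arguments, the sketch prints the schema JSON it embedded in the sample header. Given a path to a real .avro file, it should print the same schema that dataFileReader.getSchema() reports, although the library also validates the file and handles codecs and data blocks, which this sketch does not.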

Answered by Carlos Bribiescas

You can use the Databricks spark-avro library (https://github.com/databricks/spark-avro), which loads the Avro file into a DataFrame (Dataset<Row>).

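Once you have a Dataset<Row>, you can get the schema directly with df.schema(). Note that this returns Spark's StructType representation of the schema rather than an Avro Schema object.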

Answered by Eugene

Thanks to @Helder Pereira for the answer. As a complement, the schema can also be fetched by calling getSchema() on a GenericRecord instance.
Here is a live demo of it; the link shows how to get the data and schema in Java for the Parquet, ORC, and Avro data formats.
