Original URL: http://stackoverflow.com/questions/45496786/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
How to extract schema from an avro file in java
Question by mba12
How do you extract first the schema, and then the data, from an Avro file in Java? Identical to this question, except in Java.
I've seen examples of how to get the schema from an .avsc file, but not from an .avro file. Any direction is much appreciated.
Schema schema = new Schema.Parser().parse(new File("/home/Hadoop/Avro/schema/emp.avsc"));
Answer by Helder Pereira
If you want to know the schema of an Avro file without having to generate the corresponding classes or care about which class the file belongs to, you can use the GenericDatumReader:
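// Needs: java.io.File, org.apache.avro.Schema, org.apache.avro.file.DataFileReader, org.apache.avro.generic.{GenericDatumReader, GenericRecord}, org.apache.avro.io.DatumReader
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File("file.avro"), datumReader);
// The writer schema is stored in the .avro file header, so no .avsc file is needed
Schema schema = dataFileReader.getSchema();
System.out.println(schema);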
And then you can read the data inside the file:
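GenericRecord record = null;
while (dataFileReader.hasNext()) {
    // Pass the previous record in so the reader can reuse it instead of allocating a new one
    record = dataFileReader.next(record);
    System.out.println(record);
}
dataFileReader.close(); // release the underlying file handle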
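The schema returned by getSchema() comes from the file header. Per the Avro specification, an object container file starts with the bytes "Obj" followed by 0x01, then a metadata map (zig-zag varint counts, length-prefixed keys and values) whose avro.schema entry holds the writer schema as JSON. As a rough illustration of that layout, here is a minimal JDK-only sketch that reads just the header. The class and method names (AvroHeaderSchema, readLong, readSchemaJson) are hypothetical, and it assumes a well-formed container file. It does not decode the data blocks, so for reading records the DataFileReader approach above remains the practical choice.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: reads the writer schema from an Avro object container
// file header using only the JDK, following the layout in the Avro specification.
public class AvroHeaderSchema {

    // Decodes one zig-zag, variable-length Avro "long".
    static long readLong(DataInputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = in.readUnsignedByte();          // throws EOFException at end of stream
            raw |= (long) (b & 0x7F) << shift;  // low 7 bits carry the value
            shift += 7;
        } while ((b & 0x80) != 0);              // high bit set means more bytes follow
        return (raw >>> 1) ^ -(raw & 1);        // undo the zig-zag encoding
    }

    // Avro "bytes" and "string" are both a length (long) followed by that many bytes.
    static byte[] readBytes(DataInputStream in) throws IOException {
        byte[] buf = new byte[(int) readLong(in)];
        in.readFully(buf);
        return buf;
    }

    // Reads the header metadata map and returns the JSON stored under "avro.schema".
    static String readSchemaJson(InputStream source) throws IOException {
        DataInputStream in = new DataInputStream(source);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro object container file");
        }
        Map<String, byte[]> meta = new HashMap<>();
        long count;
        while ((count = readLong(in)) != 0) {   // a zero count ends the map
            if (count < 0) {                    // negative count: a block byte size follows
                count = -count;
                readLong(in);                   // skip the size; entries are read one by one anyway
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        byte[] schema = meta.get("avro.schema");
        if (schema == null) {
            throw new IOException("header has no avro.schema entry");
        }
        return new String(schema, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(readSchemaJson(in));
        }
    }
}
```

Usage: java AvroHeaderSchema file.avro prints the schema JSON; if the Avro library is on the classpath, that string can be handed to new Schema.Parser().parse(...) just like the contents of an .avsc file.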
Answer by Carlos Bribiescas
You can use the Databricks spark-avro library (https://github.com/databricks/spark-avro), which loads the Avro file into a DataFrame (Dataset<Row>).
Once you have a Dataset<Row>, you can get the schema directly with df.schema() (note that this returns a Spark StructType, not an Avro Schema).
Answer by Eugene
Thanks to @Helder Pereira for the answer. As a complement, the schema can also be fetched by calling getSchema() on a GenericRecord instance.
Here is a live demo of this; it shows how to get the data and schema in Java for the Parquet, ORC and Avro data formats.