Disclaimer: this page is translated from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original author (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/30565510/
How to read and write Map<String, Object> from/to parquet file in Java or Scala?
Asked by okigan
Looking for a concise example of how to read and write a Map<String, Object> from/to a Parquet file in Java or Scala.
Here is the expected structure, using com.fasterxml.jackson.databind.ObjectMapper as the serializer in Java (i.e. looking for the equivalent using Parquet):
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;

public static Map<String, Object> read(InputStream inputStream) throws IOException {
    ObjectMapper objectMapper = new ObjectMapper();
    return objectMapper.readValue(inputStream, new TypeReference<Map<String, Object>>() {});
}

public static void write(OutputStream outputStream, Map<String, Object> map) throws IOException {
    ObjectMapper objectMapper = new ObjectMapper();
    objectMapper.writeValue(outputStream, map);
}
Answered by Sercan Ozdemir
I'm not all that familiar with Parquet, but starting from here:
Schema schema = new Schema.Parser().parse(Resources.getResource("map.avsc").openStream());

File tmp = File.createTempFile(getClass().getSimpleName(), ".tmp");
tmp.deleteOnExit();
tmp.delete();
Path file = new Path(tmp.getPath());

AvroParquetWriter<GenericRecord> writer =
    new AvroParquetWriter<GenericRecord>(file, schema);

// Write a record with an empty map.
ImmutableMap<String, Integer> emptyMap = new ImmutableMap.Builder<String, Integer>().build();
GenericData.Record record = new GenericRecordBuilder(schema)
    .set("mymap", emptyMap).build();
writer.write(record);
writer.close();

AvroParquetReader<GenericRecord> reader = new AvroParquetReader<GenericRecord>(file);
GenericRecord nextRecord = reader.read();
assertNotNull(nextRecord);
assertEquals(emptyMap, nextRecord.get("mymap"));
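The snippet above loads a resource file named map.avsc that the answer never shows. For a map of int values (matching the ImmutableMap<String, Integer> above), a minimal Avro schema would look something like the following — the record name MapRecord is an assumption, only the field name mymap is fixed by the code:

```json
{
  "type": "record",
  "name": "MapRecord",
  "fields": [
    {"name": "mymap", "type": {"type": "map", "values": "int"}}
  ]
}
```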
In your situation, replace the ImmutableMap (Google Guava) with a default Map, as below:
Schema schema = new Schema.Parser().parse(Resources.getResource("map.avsc").openStream());

File tmp = File.createTempFile(getClass().getSimpleName(), ".tmp");
tmp.deleteOnExit();
tmp.delete();
Path file = new Path(tmp.getPath());

AvroParquetWriter<GenericRecord> writer = new AvroParquetWriter<GenericRecord>(file, schema);

// Write a record with a map that is no longer empty.
// SOMETHING is a placeholder for your own value type.
Map<String, Object> map = new HashMap<String, Object>();
map.put("SOMETHING", new SOMETHING());
GenericData.Record record = new GenericRecordBuilder(schema).set("mymap", map).build();
writer.write(record);
writer.close();

AvroParquetReader<GenericRecord> reader = new AvroParquetReader<GenericRecord>(file);
GenericRecord nextRecord = reader.read();
assertNotNull(nextRecord);
assertEquals(map, nextRecord.get("mymap"));
I didn't test the code, but give it a try.
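As a side note, the AvroParquetWriter and AvroParquetReader constructors used above are deprecated in more recent parquet-avro releases in favor of a builder API. A rough, untested Scala sketch of the same round trip using the builders (it assumes parquet-avro, avro, and hadoop-common on the classpath, a local map.avsc file like the one referenced above, and a made-up output path):

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericRecord, GenericRecordBuilder}
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.{AvroParquetReader, AvroParquetWriter}

object MapRoundTrip {
  def main(args: Array[String]): Unit = {
    val schema = new Schema.Parser().parse(new java.io.File("map.avsc"))
    val file = new Path("/tmp/map.parquet") // assumed output location

    // Write one record; Avro expects the map field as a java.util.Map.
    val writer = AvroParquetWriter
      .builder[GenericRecord](file)
      .withSchema(schema)
      .build()
    val map = new java.util.HashMap[String, Integer]()
    map.put("answer", 42)
    val record = new GenericRecordBuilder(schema).set("mymap", map).build()
    writer.write(record)
    writer.close()

    // Read it back.
    val reader = AvroParquetReader.builder[GenericRecord](file).build()
    val readBack = reader.read()
    println(readBack.get("mymap"))
    reader.close()
  }
}
```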
Answered by Dishant Kamble
I doubt there is a readily available solution for this. When you talk about Maps, it is still possible to create an Avro schema from one, provided the map's values are a primitive type, or a complex type whose fields are in turn primitive types.
In your case:
- If you have a Map<String, Integer> => a schema will be created with the map values being int.
- If you have a Map<String, CustomObject>:
- a. If CustomObject has fields of int, float, char ... (i.e. any primitive type), the schema generation will be valid and can then be used to successfully convert to Parquet.
- b. If CustomObject has fields which are non-primitive, the generated schema will be malformed and the resulting ParquetWriter will fail.
To resolve this issue, you can try converting your object into a JsonObject and then use the Apache Spark libraries to convert it to Parquet.
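The JSON-to-Parquet step suggested here can be sketched in a few lines of Spark Scala. This is an untested outline; the file paths are placeholders, and it assumes the map has first been serialized to a JSON file (e.g. via the Jackson code from the question):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("json-to-parquet")
  .master("local[*]")
  .getOrCreate()

// Let Spark infer a schema from the JSON, then write it back out as Parquet.
val df = spark.read.json("/data/map.json")
df.write.parquet("/data/map.parquet")

// And the reverse direction: Parquet -> JSON.
spark.read.parquet("/data/map.parquet").write.json("/data/map_back.json")

spark.stop()
```

Note that Spark's schema inference has the same limitation described above: deeply nested or heterogeneous Object values may not map cleanly onto a columnar schema.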
Answered by rahul
Apache Drill is your answer!
Convert to Parquet: You can use the CTAS (CREATE TABLE AS) feature in Drill. By default, Drill creates a folder of Parquet files after executing the query below. You can substitute any query, and Drill will write its output as Parquet files:
create table file_parquet as select * from dfs.`/data/file.json`;
Convert from Parquet: We also use the CTAS feature here, but we ask Drill to use a different format for writing the output:
alter session set `store.format`='json';
create table file_json as select * from dfs.`/data/file.parquet`;
Refer to http://drill.apache.org/docs/create-table-as-ctas-command/ for more information.

