
Note: the content below is taken from StackOverflow and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must attribute the original authors (not the translator). Original question: http://stackoverflow.com/questions/30565510/


How to read and write Map<String, Object> from/to parquet file in Java or Scala?

Tags: java, scala, avro, parquet

Asked by okigan

Looking for a concise example of how to read and write a Map<String, Object> from/to a Parquet file in Java or Scala?


Here is the expected structure, using com.fasterxml.jackson.databind.ObjectMapper as the serializer in Java (i.e. looking for the equivalent using Parquet):


import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

public static Map<String, Object> read(InputStream inputStream) throws IOException {
    ObjectMapper objectMapper = new ObjectMapper();

    // Deserialize JSON from the stream into a Map<String, Object>.
    return objectMapper.readValue(inputStream, new TypeReference<Map<String, Object>>() {
    });
}

public static void write(OutputStream outputStream, Map<String, Object> map) throws IOException {
    ObjectMapper objectMapper = new ObjectMapper();

    // Serialize the map as JSON to the stream.
    objectMapper.writeValue(outputStream, map);
}

Answered by Sercan Ozdemir

I'm not that familiar with Parquet, but, from here:


Schema schema = new Schema.Parser().parse(Resources.getResource("map.avsc").openStream());

File tmp = File.createTempFile(getClass().getSimpleName(), ".tmp");
tmp.deleteOnExit();
tmp.delete();
Path file = new Path(tmp.getPath());

AvroParquetWriter<GenericRecord> writer =
    new AvroParquetWriter<GenericRecord>(file, schema);

// Write a record with an empty map.
ImmutableMap<String, Integer> emptyMap = new ImmutableMap.Builder<String, Integer>().build();
GenericData.Record record = new GenericRecordBuilder(schema)
    .set("mymap", emptyMap).build();
writer.write(record);
writer.close();

AvroParquetReader<GenericRecord> reader = new AvroParquetReader<GenericRecord>(file);
GenericRecord nextRecord = reader.read();

assertNotNull(nextRecord);
assertEquals(emptyMap, nextRecord.get("mymap"));

In your situation, replace the ImmutableMap (Google Collections) with a regular Map, as below:


Schema schema = new Schema.Parser().parse(Resources.getResource("map.avsc").openStream());

File tmp = File.createTempFile(getClass().getSimpleName(), ".tmp");
tmp.deleteOnExit();
tmp.delete();
Path file = new Path(tmp.getPath());

AvroParquetWriter<GenericRecord> writer = new AvroParquetWriter<GenericRecord>(file, schema);

// This time the map is not empty.
Map<String, Object> map = new HashMap<String, Object>();
map.put("SOMETHING", new SOMETHING()); // SOMETHING stands in for your own value type
GenericData.Record record = new GenericRecordBuilder(schema).set("mymap", map).build();
writer.write(record);
writer.close();

AvroParquetReader<GenericRecord> reader = new AvroParquetReader<GenericRecord>(file);
GenericRecord nextRecord = reader.read();

assertNotNull(nextRecord);
assertEquals(map, nextRecord.get("mymap"));

I didn't test the code, but give it a try.

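Note that both snippets load a map.avsc Avro schema from the classpath, and the answer never shows it. As a minimal sketch, an equivalent schema could be defined inline like this; the record name "MapRecord" and the int value type are assumptions inferred from the code above, not something given in the original answer:

import org.apache.avro.Schema;

// A minimal sketch of what "map.avsc" might describe: a record with a single
// field "mymap" whose type is a map with int values.
String mapSchemaJson =
    "{\"type\": \"record\", \"name\": \"MapRecord\", \"fields\": ["
  + "  {\"name\": \"mymap\", \"type\": {\"type\": \"map\", \"values\": \"int\"}}"
  + "]}";
Schema schema = new Schema.Parser().parse(mapSchemaJson);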

Answered by Dishant Kamble

I doubt there is a readily available solution for this. When you talk about Maps, it's still possible to create an Avro schema out of one, provided the values of the map are of a primitive type, or of a complex type whose fields are in turn primitive.


In your case,


  • If you have a Map<String, Integer> => it will create a schema with the map values being int.
  • If you have a Map<String, CustomObject>,
    • a. CustomObject has fields int, float, char ... (i.e. any primitive type): the schema generation will be valid and can then be used to successfully convert to Parquet.
    • b. CustomObject has fields which are non-primitive: the generated schema will be malformed and the resulting ParquetWriter will fail.

To resolve this issue, you can try converting your object into a JsonObject and then use the Apache Spark libraries to convert it to Parquet.

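For instance, a rough sketch of that JSON-then-Spark route; the paths, class name, and app name below are placeholders, not from the original answer:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("json-to-parquet") // placeholder app name
                .master("local[*]")
                .getOrCreate();

        // Read JSON produced by the Jackson ObjectMapper (one JSON object per line);
        // Spark infers a schema from the data.
        Dataset<Row> df = spark.read().json("/tmp/maps.json"); // placeholder path

        // Write the same data out as Parquet.
        df.write().parquet("/tmp/maps.parquet"); // placeholder path

        // Reading back is symmetric.
        Dataset<Row> restored = spark.read().parquet("/tmp/maps.parquet");
        restored.show();

        spark.stop();
    }
}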

Answered by rahul

Apache Drill is your answer!


Convert to Parquet: You can use the CTAS (Create Table As Select) feature in Drill. By default, Drill creates a folder with Parquet files after executing the query below. You can substitute any query, and Drill writes the output of your query into Parquet files:


create table file_parquet as select * from dfs.`/data/file.json`;

Convert from Parquet: We also use the CTAS feature here; however, we ask Drill to use a different format for writing the output:


alter session set `store.format`='json';
create table file_json as select * from dfs.`/data/file.parquet`;

Refer to http://drill.apache.org/docs/create-table-as-ctas-command/ for more information.
