Java - How to Avro Binary encode the JSON String using Apache Avro?
Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA terms and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/21977704/
Asked by AKIWEB
I am trying to Avro binary encode my JSON string. Below is my JSON string, along with a simple method I wrote to do the conversion, but I am not sure whether my approach is correct.
public static void main(String args[]) throws Exception{
try{
Schema schema = new Parser().parse((TestExample.class.getResourceAsStream("/3233.avsc")));
String json="{"+
" \"location\" : {"+
" \"devices\":["+
" {"+
" \"did\":\"9abd09-439bcd-629a8f\","+
" \"dt\":\"browser\","+
" \"usl\":{"+
" \"pos\":{"+
" \"source\":\"GPS\","+
" \"lat\":90.0,"+
" \"long\":101.0,"+
" \"acc\":100"+
" },"+
" \"addSource\":\"LL\","+
" \"add\":["+
" {"+
" \"val\":\"2123\","+
" \"type\" : \"NUM\""+
" },"+
" {"+
" \"val\":\"Harris ST\","+
" \"type\" : \"ST\""+
" }"+
" ],"+
" \"ei\":{"+
" \"ibm\":true,"+
" \"sr\":10,"+
" \"ienz\":true,"+
" \"enz\":100,"+
" \"enr\":10"+
" },"+
" \"lm\":1390598086120"+
" }"+
" }"+
" ],"+
" \"ver\" : \"1.0\""+
" }"+
"}";
byte[] avroByteArray = fromJsonToAvro(json,schema);
    } catch (Exception ex) {
        // log the exception
    }
}
The method below converts my JSON String to Avro binary encoding:
private static byte[] fromJsonToAvro(String json, Schema schema) throws Exception {
InputStream input = new ByteArrayInputStream(json.getBytes());
DataInputStream din = new DataInputStream(input);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
DatumReader<Object> reader = new GenericDatumReader<Object>(schema);
Object datum = reader.read(null, decoder);
GenericDatumWriter<Object> w = new GenericDatumWriter<Object>(schema);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);
w.write(datum, e);
e.flush();
return outputStream.toByteArray();
}
Can anyone take a look and let me know whether the way I am Avro-binary-encoding my JSON string is correct?
Answered by Keegan
I think OP is correct. This will write Avro records themselves without the schema that would be present if this were an Avro data file.
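To make this point concrete, here is a minimal round-trip sketch (the class name and inline schema are ours, and it assumes the Avro jar is on the classpath): the raw bytes produced by `binaryEncoder` carry no schema or file header, so the reader must be handed the writer's schema explicitly.

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class RawRecordRoundTrip {
    public static void main(String[] args) throws Exception {
        String schemaStr = "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}";
        Schema schema = new Schema.Parser().parse(schemaStr);

        GenericRecord person = new GenericData.Record(schema);
        person.put("name", "Frank");
        person.put("age", 47);

        // Encode: raw binary record bytes, no schema or container-file header included
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(person, encoder);
        encoder.flush();
        byte[] bytes = out.toByteArray();

        // Decode: the reader must be given the writer's schema explicitly,
        // because the raw bytes carry no schema of their own
        Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        GenericRecord back = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
        System.out.println(back.get("name") + " " + back.get("age"));
    }
}
```

With an Avro container file, by contrast, the schema is embedded in the file header, which is why the second example further down does not need the schema when reading.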
Here are a couple of examples within Avro itself (useful if you are working with files):
• From JSON to Avro: DataFileWriteTool
• From Avro to JSON: DataFileReadTool
Here's a complete example going both ways.
@Grapes([
@Grab(group='org.apache.avro', module='avro', version='1.7.7')
])
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;
String schema = '''{
"type":"record",
"namespace":"foo",
"name":"Person",
"fields":[
{
"name":"name",
"type":"string"
},
{
"name":"age",
"type":"int"
}
]
}'''
String json = "{" +
"\"name\":\"Frank\"," +
"\"age\":47" +
"}"
assert avroToJson(jsonToAvro(json, schema), schema) == json
public static byte[] jsonToAvro(String json, String schemaStr) throws IOException {
InputStream input = null;
GenericDatumWriter<GenericRecord> writer = null;
Encoder encoder = null;
ByteArrayOutputStream output = null;
try {
Schema schema = new Schema.Parser().parse(schemaStr);
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
input = new ByteArrayInputStream(json.getBytes());
output = new ByteArrayOutputStream();
DataInputStream din = new DataInputStream(input);
writer = new GenericDatumWriter<GenericRecord>(schema);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
encoder = EncoderFactory.get().binaryEncoder(output, null);
GenericRecord datum;
while (true) {
try {
datum = reader.read(null, decoder);
} catch (EOFException eofe) {
break;
}
writer.write(datum, encoder);
}
encoder.flush();
return output.toByteArray();
} finally {
try { input.close(); } catch (Exception e) { }
}
}
public static String avroToJson(byte[] avro, String schemaStr) throws IOException {
boolean pretty = false;
GenericDatumReader<GenericRecord> reader = null;
JsonEncoder encoder = null;
ByteArrayOutputStream output = null;
try {
Schema schema = new Schema.Parser().parse(schemaStr);
reader = new GenericDatumReader<GenericRecord>(schema);
InputStream input = new ByteArrayInputStream(avro);
output = new ByteArrayOutputStream();
DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
Decoder decoder = DecoderFactory.get().binaryDecoder(input, null);
GenericRecord datum;
while (true) {
try {
datum = reader.read(null, decoder);
} catch (EOFException eofe) {
break;
}
writer.write(datum, encoder);
}
encoder.flush();
output.flush();
return new String(output.toByteArray());
} finally {
try { if (output != null) output.close(); } catch (Exception e) { }
}
}
For the sake of completeness, here's an example if you were working with streams (Avro calls these container files) instead of records. Note that when you go back from JSON to Avro, you don't need to pass the schema. This is because it is present in the stream.
@Grapes([
@Grab(group='org.apache.avro', module='avro', version='1.7.7')
])
// writes Avro as a http://avro.apache.org/docs/current/spec.html#Object+Container+Files rather than a sequence of records
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;
String schema = '''{
"type":"record",
"namespace":"foo",
"name":"Person",
"fields":[
{
"name":"name",
"type":"string"
},
{
"name":"age",
"type":"int"
}
]
}'''
String json = "{" +
"\"name\":\"Frank\"," +
"\"age\":47" +
"}"
assert avroToJson(jsonToAvro(json, schema)) == json
public static byte[] jsonToAvro(String json, String schemaStr) throws IOException {
InputStream input = null;
DataFileWriter<GenericRecord> writer = null;
Encoder encoder = null;
ByteArrayOutputStream output = null;
try {
Schema schema = new Schema.Parser().parse(schemaStr);
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
input = new ByteArrayInputStream(json.getBytes());
output = new ByteArrayOutputStream();
DataInputStream din = new DataInputStream(input);
writer = new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>());
writer.create(schema, output);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
GenericRecord datum;
while (true) {
try {
datum = reader.read(null, decoder);
} catch (EOFException eofe) {
break;
}
writer.append(datum);
}
writer.flush();
return output.toByteArray();
} finally {
try { input.close(); } catch (Exception e) { }
}
}
public static String avroToJson(byte[] avro) throws IOException {
boolean pretty = false;
GenericDatumReader<GenericRecord> reader = null;
JsonEncoder encoder = null;
ByteArrayOutputStream output = null;
try {
reader = new GenericDatumReader<GenericRecord>();
InputStream input = new ByteArrayInputStream(avro);
DataFileStream<GenericRecord> streamReader = new DataFileStream<GenericRecord>(input, reader);
output = new ByteArrayOutputStream();
Schema schema = streamReader.getSchema();
DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
for (GenericRecord datum : streamReader) {
writer.write(datum, encoder);
}
encoder.flush();
output.flush();
return new String(output.toByteArray());
} finally {
try { if (output != null) output.close(); } catch (Exception e) { }
}
}
Answered by ppearcy
To add to Keegan's answer, this discussion is likely useful:
The gist is that there is a special Json schema, and you can use JsonReader/Writer to convert to and from it. The Json schema you should use is defined here:
https://github.com/apache/avro/blob/trunk/share/schemas/org/apache/avro/data/Json.avsc
Answered by StrongYoung
You can use avro-tools to convert a JSON file ({input_file}.json) to an Avro file ({output_file}.avro) when you know the schema ({schema_file}.avsc) of the JSON file, like below:
java -jar the/path/of/avro-tools-1.8.1.jar fromjson {input_file}.json --schema-file {schema_file}.avsc > {output_file}.avro
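For context, avro-tools' fromjson expects one JSON-encoded record per line. With the User schema described here, an illustrative {input_file}.json (our own sample data) could look like:

```json
{"name": "Alyssa", "favorite_number": {"int": 256}, "favorite_color": null}
{"name": "Ben", "favorite_number": {"int": 7}, "favorite_color": {"string": "red"}}
```

Note that in Avro's JSON encoding, non-null values of a union such as ["int", "null"] are wrapped with their type name (e.g. {"int": 256}) rather than written bare.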
By the way, the contents of the {schema_file}.avsc file are as below:
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
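To go the other way (assuming the same avro-tools jar; tojson is the counterpart of fromjson), you can dump an Avro file back to JSON. No schema file is needed here, because container files embed their schema:

```
java -jar the/path/of/avro-tools-1.8.1.jar tojson {output_file}.avro
```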