Java - How to Avro Binary encode the JSON String using Apache Avro?
Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA terms and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/21977704/
Asked by AKIWEB
I am trying to Avro binary encode my JSON string. Below is my JSON string, along with a simple method I wrote to do the conversion, but I am not sure whether my approach is correct.
public static void main(String args[]) throws Exception{
try{
Schema schema = new Parser().parse((TestExample.class.getResourceAsStream("/3233.avsc")));
String json="{"+
" \"location\" : {"+
" \"devices\":["+
" {"+
" \"did\":\"9abd09-439bcd-629a8f\","+
" \"dt\":\"browser\","+
" \"usl\":{"+
" \"pos\":{"+
" \"source\":\"GPS\","+
" \"lat\":90.0,"+
" \"long\":101.0,"+
" \"acc\":100"+
" },"+
" \"addSource\":\"LL\","+
" \"add\":["+
" {"+
" \"val\":\"2123\","+
" \"type\" : \"NUM\""+
" },"+
" {"+
" \"val\":\"Harris ST\","+
" \"type\" : \"ST\""+
" }"+
" ],"+
" \"ei\":{"+
" \"ibm\":true,"+
" \"sr\":10,"+
" \"ienz\":true,"+
" \"enz\":100,"+
" \"enr\":10"+
" },"+
" \"lm\":1390598086120"+
" }"+
" }"+
" ],"+
" \"ver\" : \"1.0\""+
" }"+
"}";
byte[] avroByteArray = fromJsonToAvro(json,schema);
    } catch (Exception ex) {
        // log the exception
    }
}
The method below converts my JSON String to Avro binary encoding:
private static byte[] fromJsonToAvro(String json, Schema schema) throws Exception {
InputStream input = new ByteArrayInputStream(json.getBytes());
DataInputStream din = new DataInputStream(input);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
DatumReader<Object> reader = new GenericDatumReader<Object>(schema);
Object datum = reader.read(null, decoder);
GenericDatumWriter<Object> w = new GenericDatumWriter<Object>(schema);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);
w.write(datum, e);
e.flush();
return outputStream.toByteArray();
}
Can anyone take a look and let me know whether the way I am Avro-binary-encoding my JSON string is correct?
Answered by Keegan
I think OP is correct. This will write Avro records themselves without the schema that would be present if this were an Avro data file.
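To make this point concrete, here is a minimal round-trip sketch (the class name and inline schema are ours, and it assumes the Avro jar is on the classpath): the raw bytes produced by `binaryEncoder` carry no schema or file header, so the reader must be handed the writer's schema explicitly.

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class RawRecordRoundTrip {
    public static void main(String[] args) throws Exception {
        String schemaStr = "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}";
        Schema schema = new Schema.Parser().parse(schemaStr);

        GenericRecord person = new GenericData.Record(schema);
        person.put("name", "Frank");
        person.put("age", 47);

        // Encode: raw binary record bytes, no schema or container-file header included
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(person, encoder);
        encoder.flush();
        byte[] bytes = out.toByteArray();

        // Decode: the reader must be given the writer's schema explicitly,
        // because the raw bytes carry no schema of their own
        Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        GenericRecord back = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
        System.out.println(back.get("name") + " " + back.get("age"));
    }
}
```

With an Avro container file, by contrast, the schema is embedded in the file header, which is why the second example further down does not need the schema when reading.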
Here are a couple of examples within Avro itself (useful if you are working with files):
• From JSON to Avro: DataFileWriteTool
• From Avro to JSON: DataFileReadTool
Here's a complete example going both ways.
@Grapes([
@Grab(group='org.apache.avro', module='avro', version='1.7.7')
])
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;
String schema = '''{
"type":"record",
"namespace":"foo",
"name":"Person",
"fields":[
{
"name":"name",
"type":"string"
},
{
"name":"age",
"type":"int"
}
]
}'''
String json = "{" +
"\"name\":\"Frank\"," +
"\"age\":47" +
"}"
assert avroToJson(jsonToAvro(json, schema), schema) == json
public static byte[] jsonToAvro(String json, String schemaStr) throws IOException {
InputStream input = null;
GenericDatumWriter<GenericRecord> writer = null;
Encoder encoder = null;
ByteArrayOutputStream output = null;
try {
Schema schema = new Schema.Parser().parse(schemaStr);
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
input = new ByteArrayInputStream(json.getBytes());
output = new ByteArrayOutputStream();
DataInputStream din = new DataInputStream(input);
writer = new GenericDatumWriter<GenericRecord>(schema);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
encoder = EncoderFactory.get().binaryEncoder(output, null);
GenericRecord datum;
while (true) {
try {
datum = reader.read(null, decoder);
} catch (EOFException eofe) {
break;
}
writer.write(datum, encoder);
}
encoder.flush();
return output.toByteArray();
} finally {
try { input.close(); } catch (Exception e) { }
}
}
public static String avroToJson(byte[] avro, String schemaStr) throws IOException {
boolean pretty = false;
GenericDatumReader<GenericRecord> reader = null;
JsonEncoder encoder = null;
ByteArrayOutputStream output = null;
try {
Schema schema = new Schema.Parser().parse(schemaStr);
reader = new GenericDatumReader<GenericRecord>(schema);
InputStream input = new ByteArrayInputStream(avro);
output = new ByteArrayOutputStream();
DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
Decoder decoder = DecoderFactory.get().binaryDecoder(input, null);
GenericRecord datum;
while (true) {
try {
datum = reader.read(null, decoder);
} catch (EOFException eofe) {
break;
}
writer.write(datum, encoder);
}
encoder.flush();
output.flush();
return new String(output.toByteArray());
} finally {
try { if (output != null) output.close(); } catch (Exception e) { }
}
}
For the sake of completeness, here's an example if you were working with streams (Avro calls these container files) instead of records. Note that when you go back from JSON to Avro, you don't need to pass the schema. This is because it is present in the stream.
@Grapes([
@Grab(group='org.apache.avro', module='avro', version='1.7.7')
])
// writes Avro as a http://avro.apache.org/docs/current/spec.html#Object+Container+Files rather than a sequence of records
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;
String schema = '''{
"type":"record",
"namespace":"foo",
"name":"Person",
"fields":[
{
"name":"name",
"type":"string"
},
{
"name":"age",
"type":"int"
}
]
}'''
String json = "{" +
"\"name\":\"Frank\"," +
"\"age\":47" +
"}"
assert avroToJson(jsonToAvro(json, schema)) == json
public static byte[] jsonToAvro(String json, String schemaStr) throws IOException {
InputStream input = null;
DataFileWriter<GenericRecord> writer = null;
Encoder encoder = null;
ByteArrayOutputStream output = null;
try {
Schema schema = new Schema.Parser().parse(schemaStr);
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
input = new ByteArrayInputStream(json.getBytes());
output = new ByteArrayOutputStream();
DataInputStream din = new DataInputStream(input);
writer = new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>());
writer.create(schema, output);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
GenericRecord datum;
while (true) {
try {
datum = reader.read(null, decoder);
} catch (EOFException eofe) {
break;
}
writer.append(datum);
}
writer.flush();
return output.toByteArray();
} finally {
try { input.close(); } catch (Exception e) { }
}
}
public static String avroToJson(byte[] avro) throws IOException {
boolean pretty = false;
GenericDatumReader<GenericRecord> reader = null;
JsonEncoder encoder = null;
ByteArrayOutputStream output = null;
try {
reader = new GenericDatumReader<GenericRecord>();
InputStream input = new ByteArrayInputStream(avro);
DataFileStream<GenericRecord> streamReader = new DataFileStream<GenericRecord>(input, reader);
output = new ByteArrayOutputStream();
Schema schema = streamReader.getSchema();
DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
for (GenericRecord datum : streamReader) {
writer.write(datum, encoder);
}
encoder.flush();
output.flush();
return new String(output.toByteArray());
} finally {
try { if (output != null) output.close(); } catch (Exception e) { }
}
}
Answered by ppearcy
To add to Keegan's answer, this discussion is likely useful:
The gist is that there is a special Json schema, and you can use JsonReader/Writer to convert to and from it. The Json schema you should use is defined here:
https://github.com/apache/avro/blob/trunk/share/schemas/org/apache/avro/data/Json.avsc
Answered by StrongYoung
You can use avro-tools to convert a JSON file ({input_file}.json) to an Avro file ({output_file}.avro) when you know the schema ({schema_file}.avsc) of the JSON file, like below:
java -jar the/path/of/avro-tools-1.8.1.jar fromjson {input_file}.json --schema-file {schema_file}.avsc > {output_file}.avro
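For context, avro-tools' fromjson expects one JSON-encoded record per line. With the User schema described here, an illustrative {input_file}.json (our own sample data) could look like:

```json
{"name": "Alyssa", "favorite_number": {"int": 256}, "favorite_color": null}
{"name": "Ben", "favorite_number": {"int": 7}, "favorite_color": {"string": "red"}}
```

Note that in Avro's JSON encoding, non-null values of a union such as ["int", "null"] are wrapped with their type name (e.g. {"int": 256}) rather than written bare.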
By the way, the contents of the {schema_file}.avsc file are as below:
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
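To go the other way (assuming the same avro-tools jar; tojson is the counterpart of fromjson), you can dump an Avro file back to JSON. No schema file is needed here, because container files embed their schema:

```
java -jar the/path/of/avro-tools-1.8.1.jar tojson {output_file}.avro
```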