How to fix "Expected start-union. Got VALUE_NUMBER_INT" when converting JSON to Avro on the command line?

Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/27485580/

Date: 2020-09-03 17:40:49  Source: igfitidea

Tags: json, validation, avro

Asked by Emre Sevinç

I'm trying to validate a JSON file using an Avro schema and write the corresponding Avro file. First, I've defined the following Avro schema named user.avsc:

{"namespace": "example.avro",
 "type": "record",
 "name": "user",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}

Then I created a user.json file:

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}

And then tried to run:

java -jar ~/bin/avro-tools-1.7.7.jar fromjson --schema-file user.avsc user.json > user.avro

But I get the following exception:

Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_NUMBER_INT
    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
    at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
    at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99)
    at org.apache.avro.tool.Main.run(Main.java:84)
    at org.apache.avro.tool.Main.main(Main.java:73)

Am I missing something? Why do I get "Expected start-union. Got VALUE_NUMBER_INT"?

Answered by Emre Sevinç

According to the explanation by Doug Cutting:

Avro's JSON encoding requires that non-null union values be tagged with their intended type. This is because unions like ["bytes","string"] and ["int","long"] are ambiguous in JSON, the first are both encoded as JSON strings, while the second are both encoded as JSON numbers.

http://avro.apache.org/docs/current/spec.html#json_encoding

Thus your record must be encoded as:

{"name": "Alyssa", "favorite_number": {"int": 7}, "favorite_color": null}
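
The tagging rule can be illustrated with a small plain-Java sketch. This is not part of Avro: the class UnionTagger and its methods tagUnions and branchFor are hypothetical names, the union branches are written out by hand rather than parsed from user.avsc, and only a few primitive types are handled. It wraps each non-null value of a union field in a single-key map named after its branch and leaves nulls and non-union fields untouched.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not an Avro API): shows how non-null union values are tagged.
public class UnionTagger {

    // Returns the first union branch whose Avro type fits the Java value.
    // Only a handful of primitive branches are covered in this sketch.
    public static String branchFor(Object value, List<String> branches) {
        for (String branch : branches) {
            boolean matches;
            switch (branch) {
                case "int":     matches = value instanceof Integer; break;
                case "long":    matches = value instanceof Long || value instanceof Integer; break;
                case "double":  matches = value instanceof Double; break;
                case "boolean": matches = value instanceof Boolean; break;
                case "string":  matches = value instanceof String; break;
                default:        matches = false;
            }
            if (matches) {
                return branch;
            }
        }
        throw new IllegalArgumentException("No branch in " + branches + " fits " + value);
    }

    // Copies the record, wrapping every non-null value of a union field as
    // {"<branch>": value}. Nulls and non-union fields are copied unchanged.
    public static Map<String, Object> tagUnions(Map<String, Object> record,
                                                Map<String, List<String>> unionFields) {
        Map<String, Object> tagged = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : record.entrySet()) {
            List<String> branches = unionFields.get(entry.getKey());
            Object value = entry.getValue();
            if (branches == null || value == null) {
                tagged.put(entry.getKey(), value);
            } else {
                Map<String, Object> wrapper = new LinkedHashMap<>();
                wrapper.put(branchFor(value, branches), value);
                tagged.put(entry.getKey(), wrapper);
            }
        }
        return tagged;
    }

    public static void main(String[] args) {
        // Union fields of user.avsc, written out by hand for this sketch.
        Map<String, List<String>> unionFields = new LinkedHashMap<>();
        unionFields.put("favorite_number", Arrays.asList("int", "null"));
        unionFields.put("favorite_color", Arrays.asList("string", "null"));

        // The record from user.json.
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("name", "Alyssa");
        record.put("favorite_number", 256);
        record.put("favorite_color", null);

        // Prints (in Java map notation, not JSON):
        // {name=Alyssa, favorite_number={int=256}, favorite_color=null}
        System.out.println(tagUnions(record, unionFields));
    }
}
```

Serialized as JSON, that result corresponds to {"name": "Alyssa", "favorite_number": {"int": 256}, "favorite_color": null}, which is the shape fromjson expects for the original user.json.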

Answered by ppearcy

There is a new JSON encoder in the works that should address this common issue:

https://issues.apache.org/jira/browse/AVRO-1582

https://github.com/zolyfarkas/avro

Answered by Tanmay Naik

I implemented union handling and its validation as follows: create a union schema and pass its values in through Postman. The registry URL is the one you specify in your Kafka properties. You can also pass dynamic values to your schema.

// registryUrl, topic, version and jsonResult (the JSON string to validate)
// are assumed to be defined by the caller.
RestTemplate template = new RestTemplate();
HttpHeaders headers = new HttpHeaders();
headers.setContentType(MediaType.APPLICATION_JSON);
HttpEntity<String> entity = new HttpEntity<>(headers);

// Fetch the registered schema for this subject and version from the schema registry.
ResponseEntity<String> response = template.exchange(
        registryUrl + "/subjects/" + topic + "/versions/" + version,
        HttpMethod.GET, entity, String.class);
JSONObject jsonObject = new JSONObject(response.getBody());
JSONObject jsonObjectResult = new JSONObject(jsonResult);

// The registry response carries the Avro schema as a string in its "schema" field.
Schema schema = new Schema.Parser().parse(jsonObject.get("schema").toString());

// Copy each schema field from the incoming JSON into a generic record.
GenericRecord genericRecord = new GenericData.Record(schema);
schema.getFields().forEach(field ->
        genericRecord.put(field.name(), jsonObjectResult.get(field.name())));

// Check that the populated record conforms to the schema.
GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
boolean isValid = reader.getData().validate(schema, genericRecord);

Answered by Abhinandan Dubey

As @Emre-Sevinc has pointed out, the issue is with the encoding of your Avro record.

To be more specific:

Don't do this:

    val jsonRecord = avroGenericRecord.toString

Instead, do this:

    val writer = new GenericDatumWriter[GenericRecord](avroSchema)
    val baos = new ByteArrayOutputStream
    val jsonEncoder = EncoderFactory.get.jsonEncoder(avroSchema, baos)
    writer.write(avroGenericRecord, jsonEncoder)
    jsonEncoder.flush

    val jsonRecord = baos.toString("UTF-8")

You'll also need the following imports:

import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

After you do this, jsonRecord will contain non-null union values tagged with their intended type.

Hope this helps!
