Disclaimer: this page is an English rendering of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/35480155/

Date: 2020-11-03 00:10:53  Source: igfitidea

Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found long, expecting union in hive

Tags: java, hadoop, hive

Asked by Sijoy Joseph

Need help!!!

I am streaming Twitter feeds into HDFS using Flume and loading them into Hive for analysis.

The steps are as follows:

Data in HDFS:

I have described the Avro schema in an .avsc file and put it in Hadoop:

 {"type":"record",
 "name":"Doc",
 "doc":"adoc",
 "fields":[{"name":"id","type":"string"},
       {"name":"user_friends_count","type":["int","null"]},
       {"name":"user_location","type":["string","null"]},
       {"name":"user_description","type":["string","null"]},
       {"name":"user_statuses_count","type":["int","null"]},
       {"name":"user_followers_count","type":["int","null"]},
       {"name":"user_name","type":["string","null"]},
       {"name":"user_screen_name","type":["string","null"]},
       {"name":"created_at","type":["string","null"]},
       {"name":"text","type":["string","null"]},
       {"name":"retweet_count","type":["boolean","null"]},
       {"name":"retweeted","type":["boolean","null"]},
       {"name":"in_reply_to_user_id","type":["long","null"]},
       {"name":"source","type":["string","null"]},
       {"name":"in_reply_to_status_id","type":["long","null"]},
       {"name":"media_url_https","type":["string","null"]},
       {"name":"expanded_url","type":["string","null"]}]}
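Since the AvroSerDe derives each Hive column type from the non-null branch of the corresponding union, the schema can be sanity-checked programmatically. A minimal Python sketch (the Avro-to-Hive mapping table below is my own summary of the standard AvroSerDe rules, and the schema is abridged to a few representative fields):

```python
import json

# Abridged copy of the schema from the question (a few representative fields)
schema = json.loads("""
{"type": "record", "name": "Doc", "doc": "adoc",
 "fields": [{"name": "id", "type": "string"},
            {"name": "user_friends_count", "type": ["int", "null"]},
            {"name": "retweet_count", "type": ["boolean", "null"]},
            {"name": "in_reply_to_user_id", "type": ["long", "null"]}]}
""")

# How Hive's AvroSerDe maps primitive Avro types to Hive column types
AVRO_TO_HIVE = {"string": "string", "int": "int",
                "long": "bigint", "boolean": "boolean"}

for field in schema["fields"]:
    t = field["type"]
    # For a ["type", "null"] union, the non-null branch decides the Hive type
    branch = t if isinstance(t, str) else next(x for x in t if x != "null")
    print(f"{field['name']:22s} avro={branch:8s} hive={AVRO_TO_HIVE[branch]}")
```

Worth noting: `retweet_count` is declared as `["boolean","null"]` in the schema above, while a writer may well emit a numeric value for it; a `long` arriving at a union that only allows `boolean` and `null` is exactly the kind of mismatch the "Found long, expecting union" message describes.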

I have written an .hql file that creates the table and loads data into it:

 create table tweetsavro
    row format serde
        'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    stored as inputformat
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    outputformat
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    tblproperties ('avro.schema.url'='hdfs:///avro_schema/AvroSchemaFile.avsc');

    load data inpath '/test/twitter_data/FlumeData.*' overwrite into table tweetsavro;

I have successfully run the .hql file, but when I run the `select * from <tablename>` command in Hive it shows the following error:

(error screenshot: Failed with exception java.io.IOException: org.apache.avro.AvroTypeException: Found long, expecting union)

The output of tweetsavro is:

hive> desc tweetsavro;
OK
id                      string                                      
user_friends_count      int                                         
user_location           string                                      
user_description        string                                      
user_statuses_count     int                                         
user_followers_count    int                                         
user_name               string                                      
user_screen_name        string                                      
created_at              string                                      
text                    string                                      
retweet_count           boolean                                     
retweeted               boolean                                     
in_reply_to_user_id     bigint                                      
source                  string                                      
in_reply_to_status_id   bigint                                      
media_url_https         string                                      
expanded_url            string                                      
Time taken: 0.697 seconds, Fetched: 17 row(s)

Answered by Dhirendra Khanka

I was facing the exact same issue. The problem was in the timestamp field (the "created_at" column in your case), which I was trying to insert as a string into my new table. My assumption was that this data would be in ["null","string"] format in my source. I analyzed the source Avro schema generated by the sqoop import --as-avrodatafile process; the generated schema had the following signature for the timestamp column:
{ "name" : "order_date", "type" : [ "null", "long" ], "default" : null, "columnName" : "order_date", "sqlType" : "93" },
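For reference, the `sqlType` attribute Sqoop embeds in the generated schema is the JDBC type code from `java.sql.Types`; a hand-copied subset of those constants (illustrative, not exhaustive):

```python
# Subset of java.sql.Types constants, the source of Sqoop's "sqlType" values
JDBC_TYPES = {4: "INTEGER", -5: "BIGINT", 12: "VARCHAR",
              91: "DATE", 92: "TIME", 93: "TIMESTAMP"}

# The order_date field as Sqoop generated it in the source Avro schema
field = {"name": "order_date", "type": ["null", "long"],
         "default": None, "columnName": "order_date", "sqlType": "93"}
print(JDBC_TYPES[int(field["sqlType"])])  # TIMESTAMP
```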

SqlType 93 stands for the Timestamp datatype. So in my target table's Avro schema file I changed the data type to "long", and this solved the issue. My guess is that there is a similar datatype mismatch in one of your columns.
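The fix described amounts to editing the non-null branch of the affected union in the target table's .avsc. A hedged sketch of that rewrite in Python (field name taken from the question; in practice you would edit the schema file on HDFS and re-create or re-point the table):

```python
import json

# Target-table schema where the timestamp column was wrongly declared as string
schema = {"type": "record", "name": "Doc",
          "fields": [{"name": "created_at", "type": ["string", "null"]}]}

# Replace the non-null branch with "long" to match what the writer emits
for f in schema["fields"]:
    if f["name"] == "created_at":
        f["type"] = ["long", "null"]

print(json.dumps(schema["fields"][0]))
```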
