Disclaimer: this page is an English rendering of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/35480155/

Date: 2020-11-03 00:10:53  Source: igfitidea

Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found long, expecting union in hive

Tags: java, hadoop, hive

Asked by Sijoy Joseph

Need help!!!

I am streaming Twitter feeds into HDFS using Flume and loading them into Hive for analysis.

The steps are as follows:

Data in HDFS:

I have described the Avro schema in an .avsc file and put it in Hadoop:

 {"type":"record",
 "name":"Doc",
 "doc":"adoc",
 "fields":[{"name":"id","type":"string"},
       {"name":"user_friends_count","type":["int","null"]},
       {"name":"user_location","type":["string","null"]},
       {"name":"user_description","type":["string","null"]},
       {"name":"user_statuses_count","type":["int","null"]},
       {"name":"user_followers_count","type":["int","null"]},
       {"name":"user_name","type":["string","null"]},
       {"name":"user_screen_name","type":["string","null"]},
       {"name":"created_at","type":["string","null"]},
       {"name":"text","type":["string","null"]},
       {"name":"retweet_count","type":["boolean","null"]},
       {"name":"retweeted","type":["boolean","null"]},
       {"name":"in_reply_to_user_id","type":["long","null"]},
       {"name":"source","type":["string","null"]},
       {"name":"in_reply_to_status_id","type":["long","null"]},
       {"name":"media_url_https","type":["string","null"]},
       {"name":"expanded_url","type":["string","null"]}]}
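Since the AvroSerDe derives each Hive column type from the non-null branch of the corresponding union, the schema can be sanity-checked programmatically. A minimal Python sketch (the Avro-to-Hive mapping table below is my own summary of the standard AvroSerDe rules, and the schema is abridged to a few representative fields):

```python
import json

# Abridged copy of the schema from the question (a few representative fields)
schema = json.loads("""
{"type": "record", "name": "Doc", "doc": "adoc",
 "fields": [{"name": "id", "type": "string"},
            {"name": "user_friends_count", "type": ["int", "null"]},
            {"name": "retweet_count", "type": ["boolean", "null"]},
            {"name": "in_reply_to_user_id", "type": ["long", "null"]}]}
""")

# How Hive's AvroSerDe maps primitive Avro types to Hive column types
AVRO_TO_HIVE = {"string": "string", "int": "int",
                "long": "bigint", "boolean": "boolean"}

for field in schema["fields"]:
    t = field["type"]
    # For a ["type", "null"] union, the non-null branch decides the Hive type
    branch = t if isinstance(t, str) else next(x for x in t if x != "null")
    print(f"{field['name']:22s} avro={branch:8s} hive={AVRO_TO_HIVE[branch]}")
```

Worth noting: `retweet_count` is declared as `["boolean","null"]` in the schema above, while a writer may well emit a numeric value for it; a `long` arriving at a union that only allows `boolean` and `null` is exactly the kind of mismatch the "Found long, expecting union" message describes.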

I have written an .hql file that creates the table and loads data into it:

 create table tweetsavro
    row format serde
        'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    stored as inputformat
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    outputformat
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    tblproperties ('avro.schema.url'='hdfs:///avro_schema/AvroSchemaFile.avsc');

    load data inpath '/test/twitter_data/FlumeData.*' overwrite into table tweetsavro;

I have successfully run the .hql file, but when I run the `select * from <tablename>` command in Hive it shows the following error:

(error screenshot: Failed with exception java.io.IOException: org.apache.avro.AvroTypeException: Found long, expecting union)

The output of tweetsavro is:

hive> desc tweetsavro;
OK
id                      string                                      
user_friends_count      int                                         
user_location           string                                      
user_description        string                                      
user_statuses_count     int                                         
user_followers_count    int                                         
user_name               string                                      
user_screen_name        string                                      
created_at              string                                      
text                    string                                      
retweet_count           boolean                                     
retweeted               boolean                                     
in_reply_to_user_id     bigint                                      
source                  string                                      
in_reply_to_status_id   bigint                                      
media_url_https         string                                      
expanded_url            string                                      
Time taken: 0.697 seconds, Fetched: 17 row(s)

Answered by Dhirendra Khanka

I was facing the exact same issue. The problem was in the timestamp field (the "created_at" column in your case), which I was trying to insert as a string into my new table. My assumption was that this data would be in ["null","string"] format in my source. I analyzed the source Avro schema generated by the sqoop import --as-avrodatafile process; the generated schema had the following signature for the timestamp column:
{ "name" : "order_date", "type" : [ "null", "long" ], "default" : null, "columnName" : "order_date", "sqlType" : "93" },
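For reference, the `sqlType` attribute Sqoop embeds in the generated schema is the JDBC type code from `java.sql.Types`; a hand-copied subset of those constants (illustrative, not exhaustive):

```python
# Subset of java.sql.Types constants, the source of Sqoop's "sqlType" values
JDBC_TYPES = {4: "INTEGER", -5: "BIGINT", 12: "VARCHAR",
              91: "DATE", 92: "TIME", 93: "TIMESTAMP"}

# The order_date field as Sqoop generated it in the source Avro schema
field = {"name": "order_date", "type": ["null", "long"],
         "default": None, "columnName": "order_date", "sqlType": "93"}
print(JDBC_TYPES[int(field["sqlType"])])  # TIMESTAMP
```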

SqlType 93 stands for the Timestamp datatype. So in my target table's Avro schema file I changed the data type to "long", and this solved the issue. My guess is that there is a similar datatype mismatch in one of your columns.
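The fix described amounts to editing the non-null branch of the affected union in the target table's .avsc. A hedged sketch of that rewrite in Python (field name taken from the question; in practice you would edit the schema file on HDFS and re-create or re-point the table):

```python
import json

# Target-table schema where the timestamp column was wrongly declared as string
schema = {"type": "record", "name": "Doc",
          "fields": [{"name": "created_at", "type": ["string", "null"]}]}

# Replace the non-null branch with "long" to match what the writer emits
for f in schema["fields"]:
    if f["name"] == "created_at":
        f["type"] = ["long", "null"]

print(json.dumps(schema["fields"][0]))
```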
