How to query struct array with Hive (get_json_object) or json serde
Original question: http://stackoverflow.com/questions/45020211/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by DatWunGuy102
I am trying to query the following JSON example file stored on my HDFS
{
  "tag1": "1.0",
  "tag2": "blah",
  "tag3": "blahblah",
  "tag4": {
    "tag4_1": [{
      "tag4_1_1": [{
        "tag4_1_1_1": {
          "Addr": {
            "Addr1": "blah",
            "City": "City",
            "StateProvCd": "NY",
            "PostalCode": "99999"
          }
        }
      }, {
        "tag4_1_1_1": {
          "Addr": {
            "Addr1": "blah2",
            "City": "City2",
            "StateProvCd": "NY",
            "PostalCode": "99999"
          }
        }
      }]
    }]
  }
}
I used the following to create an external table over the data
CREATE EXTERNAL TABLE DB.hv_table
(
  tag1 string
, tag2 string
, tag3 string
, tag4 struct<tag4_1:ARRAY<struct<tag4_1_1:ARRAY<struct<tag4_1_1_1:struct<Addr:struct<
    Addr1:string
  , City:string
  , StateProvCd:string
  , PostalCode:string>>>>>>>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 'HDFS/location';
Ideally, I want to query the data such that it would return to me as such:
select tag1, tag2, tag3, tag4(all data) from DB.hv_table;
Can someone provide me an example of how I can query without writing it in the following manner:
select tag1, tag2, tag3
, tag4.tag4_1[0].tag4_1_1[0].tag4_1_1_1.Addr.Addr1 as Addr1
, tag4.tag4_1[0].tag4_1_1[0].tag4_1_1_1.Addr.City as City
, tag4.tag4_1[0].tag4_1_1[0].tag4_1_1_1.Addr.StateProvCd as StateProvCd
, tag4.tag4_1[0].tag4_1_1[0].tag4_1_1_1.Addr.PostalCode as PostalCode
from DB.hv_table
Most importantly, I would like to avoid hard-coding the array element index. In my example, I am only able to target the first element of my array (tag4_1_1_1). I would like to target every element if possible.
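One way to avoid hard-coded indexes on the SerDe-backed table is LATERAL VIEW with explode, which emits one row per array element. The following is a minimal sketch, assuming the DB.hv_table definition above with Addr declared as a nested struct; the aliases t, t4_1, t4_1_1, outer_item and inner_item are made up for illustration.

```sql
-- Sketch only: flatten both array levels so every Addr is returned,
-- not just element [0]. Alias names are hypothetical.
SELECT t.tag1
     , t.tag2
     , t.tag3
     , inner_item.tag4_1_1_1.Addr.Addr1       AS Addr1
     , inner_item.tag4_1_1_1.Addr.City        AS City
     , inner_item.tag4_1_1_1.Addr.StateProvCd AS StateProvCd
     , inner_item.tag4_1_1_1.Addr.PostalCode  AS PostalCode
FROM DB.hv_table t
-- one row per element of the outer array tag4.tag4_1
LATERAL VIEW explode(t.tag4.tag4_1) t4_1 AS outer_item
-- one row per element of the inner array tag4_1_1
LATERAL VIEW explode(outer_item.tag4_1_1) t4_1_1 AS inner_item;
```

Note that explode drops source rows whose array is empty or NULL; LATERAL VIEW OUTER keeps those rows and fills the exploded columns with NULL.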
Answered by DatWunGuy102
Found a really good blog at: ThornyDev
CREATE EXTERNAL TABLE IF NOT EXISTS DB.dummyTable (jsonBlob STRING)
LOCATION 'pathOfYourFiles';
SELECT
get_json_object(jsonBlob, '$.tag1') AS tag1
,get_json_object(jsonBlob, '$.tag2') AS tag2
,get_json_object(jsonBlob, '$.tag3') AS tag3
,get_json_object(jsonBlob, '$.tag4.tag4_1.tag4_1_1.tag4_1_1_1.Addr.Addr1') AS Addr1
,get_json_object(jsonBlob, '$.tag4.tag4_1.tag4_1_1.tag4_1_1_1.Addr.City') AS City
,get_json_object(jsonBlob, '$.tag4.tag4_1.tag4_1_1.tag4_1_1_1.Addr.StateProvCd') AS StateProvCd
,get_json_object(jsonBlob, '$.tag4.tag4_1.tag4_1_1.tag4_1_1_1.Addr.PostalCode') AS PostalCode
FROM DB.dummyTable
I'm very satisfied with this, but I still want to try json_tuple and see how it performs compared with the get_json_object function.
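For comparison, here is a rough sketch of the json_tuple variant over the same DB.dummyTable. It assumes each JSON document sits on a single line of the file, since a plain text table treats every line as one row; the aliases d and j are made up for illustration. Unlike get_json_object, json_tuple takes top-level key names rather than '$.' paths, parses the blob once for all requested keys, and returns every value as a string.

```sql
-- Sketch only: pull the four top-level keys in a single pass.
-- tag4 comes back as the raw JSON string of the nested object.
SELECT j.tag1
     , j.tag2
     , j.tag3
     , j.tag4
FROM DB.dummyTable d
LATERAL VIEW json_tuple(d.jsonBlob, 'tag1', 'tag2', 'tag3', 'tag4') j
    AS tag1, tag2, tag3, tag4;
```

Because tag4 is returned as a string, reaching the nested Addr fields would still need get_json_object on j.tag4 or a further json_tuple call on it.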

