How to load an XML file into Hive
Original question: http://stackoverflow.com/questions/20852166/
License: this content is provided under CC BY-SA 4.0. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by backtrack
I'm working with Hive tables and have the following problem. I have more than 1 billion XML files in HDFS. Each XML file has 4 different sections, and I want to split every file and load each section into its own table.
Example:
<?xml version='1.0' encoding='iso-8859-1'?>
<section1>
<id> 1233222 </id>
// many more XML tags
</section1>
<section2>
// many more XML tags
</section2>
<section3>
// many more XML tags
</section3>
<section4>
// many more XML tags
</section4>
</xml>
And I have four tables:
section1Table
id section1 // fields
section2Table
id section2
section3Table
id section3
section4Table
id section4
Now I want to split the data and load it into each table.
How can I achieve this? Can anyone help me?
Thanks
UPDATE
I have tried the following:
CREATE EXTERNAL TABLE test(name STRING) LOCATION '/user/sornalingam/zipped/output/Tagged/t1';
SELECT xpath(name, '//section1') FROM test LIMIT 1;
but I got the following error:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"name":"<?xml version='1.0' encoding='iso-8859-1'?>"}
Answered by Vidya
You have several options:
- Load the XML into a Hive table with a string column, one document per row (e.g. CREATE TABLE xmlfiles (id int, xmlfile string)). Then use an XPath UDF to work on the XML.
- Since you know the XPaths of what you want (e.g. //section1), follow the instructions in the second half of this tutorial to ingest directly into Hive via XPath.
- Map your XML to Avro as described here, because a SerDe exists for seamless Avro-to-Hive mapping.
- Use XPath to store your data in a regular text file in HDFS and then ingest that into Hive.
It depends on your level of experience and comfort with these approaches.
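The last option (extract with XPath outside Hive, then ingest plain text) could be sketched roughly as follows in Python. This is only an illustration under several assumptions that are not in the original answer: each document is well-formed with a single root (the `<doc>` wrapper in the sample is hypothetical), the id lives in `section1/id` as in the question's example, and the output rows are tab-separated so a delimited text table could read them. The function name `split_sections` is made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Section names taken from the question's example.
SECTIONS = ("section1", "section2", "section3", "section4")

def split_sections(xml_text):
    """Return {section_name: "id<TAB>section_xml"} for one XML document."""
    root = ET.fromstring(xml_text)
    # In the question's example the id sits in section1 and is padded with spaces.
    doc_id = root.findtext("section1/id", default="").strip()
    rows = {}
    for name in SECTIONS:
        node = root.find(name)
        if node is None:
            continue  # tolerate documents that lack a section
        # Serialize the section, then collapse all whitespace runs (including
        # newlines and tabs) to single spaces so the row stays on one line.
        fragment = " ".join(ET.tostring(node, encoding="unicode").split())
        rows[name] = f"{doc_id}\t{fragment}"
    return rows

# Hypothetical sample document; <doc> is an assumed root element.
sample = """<doc>
<section1><id> 1233222 </id><a>x</a></section1>
<section2><b>y</b></section2>
<section3><c>z</c></section3>
<section4><d>w</d></section4>
</doc>"""

rows = split_sections(sample)
for name, row in rows.items():
    print(name, row, sep=": ")
```

Each value in `rows` would be appended to the text file backing the matching table (section1Table, section2Table, and so on). Note that collapsing whitespace also changes whitespace inside element text, which may or may not be acceptable for your data.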
Answered by Sweety
Use this:
CREATE EXTERNAL TABLE test(name STRING) LOCATION '/user/sornalingam/zipped/output/Tagged/t1'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1");
Then use the xpath function.
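A related point: Hive's text format reads one row per line, and the error in the question shows a row that holds only the XML declaration, which is likely why xpath failed there. As an alternative to the table properties above, the same header and footer lines could be dropped in a preprocessing step outside Hive, with the rest joined into one line per document. The Python sketch below is an assumption-laden illustration, not part of the original answer: `to_single_row` and the `doc` wrapper element are hypothetical names, and it assumes the file layout from the question (declaration on the first line, a stray `</xml>` on the last).

```python
def to_single_row(file_text, root_tag="doc"):
    """Turn one multi-line XML file into a single-line, single-root document."""
    # Trim each line and discard blank ones.
    lines = [line.strip() for line in file_text.splitlines() if line.strip()]
    # Drop the XML declaration (header) and the unmatched closing tag (footer),
    # similar in spirit to skip.header.line.count=1 / skip.footer.line.count=1.
    if lines and lines[0].startswith("<?xml"):
        lines = lines[1:]
    if lines and lines[-1] == "</xml>":
        lines = lines[:-1]
    # root_tag is a hypothetical wrapper so that all sections share one root.
    return f"<{root_tag}>" + "".join(lines) + f"</{root_tag}>"

# Shortened version of the layout shown in the question.
sample = """<?xml version='1.0' encoding='iso-8859-1'?>
<section1>
<id> 1233222 </id>
</section1>
<section2>
</section2>
</xml>"""

row = to_single_row(sample)
print(row)
```

With one complete document per line, a query such as xpath_string(name, '//section1/id') has a whole document to evaluate against for each row.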

