Original source: http://stackoverflow.com/questions/20847024/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to see contents of Hive orc files in linux
Asked by viper
Is there a way to see the contents of the ORC files that Hive 0.11 and above use? I usually cat gz files and decompress them to see the contents, e.g.:
cat part-0000.gz | pigz -d | more
Note: pigz is a parallel gz program.
I would like to know if there is something similar to this for orc files.
Accepted answer by geekyj
Updated answer (2020):
Per @Owen's answer, ORC has grown up and matured as its own Apache project. The complete list of ORC adopters shows how widely it is now supported across Big Data technologies.
Credit to @Owen and the Apache ORC project team: the ORC project site has fully maintained, up-to-date documentation on using either the Java or the C++ standalone tool on ORC files stored on a local Linux file system. It carries on the torch from the original Hive+ORC Apache wiki page.
Original answer dated: May 30 '14 at 16:27
The ORC file dump utility comes with hive (0.11 or higher):
hive --orcfiledump <hdfs-location-of-orc-file>
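To mirror the cat | pigz -d | more workflow from the question, the dump can be piped into a pager or trimmed with head. This is a minimal sketch, assuming Hive 1.1.0 or later, where the utility is documented to accept a -d flag that dumps the row data rather than the metadata. The flag is not mentioned in the answer above, and orc_peek is a hypothetical helper name:

```shell
# Hypothetical helper: print the first N rows of an ORC file.
# Assumes Hive 1.1.0+, where `-d` dumps row data instead of metadata.
orc_peek() {
  orc_path=$1          # HDFS location of the ORC file
  max_rows=${2:-20}    # number of lines to show (defaults to 20)
  hive --orcfiledump -d "$orc_path" | head -n "$max_rows"
}
```

Usage would look like orc_peek <hdfs-location-of-orc-file> 50, or replace head with more to page through the whole file.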
Answer by Owen O'Malley
There is now also a native executable for Linux and MacOS that prints the contents of the orc file in JSON. See the ORC project (http://orc.apache.org/) and build the C++ tools.
% orc-contents examples/TestOrcFile.test1.orc
There is also a native metadata tool:
% orc-metadata ../examples/TestOrcFile.test1.orc
The ORC project also has a standalone uber jar that can do the same from Java.
% java -jar orc-tools-1.2.3-uber.jar data myfile.orc
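Since the Java and C++ tools do the same job, a small wrapper can use whichever one is installed. This is only a sketch: orc_cat and the ORC_TOOLS_JAR environment variable are hypothetical names introduced here for illustration, not part of the ORC project.

```shell
# Hypothetical wrapper: print an ORC file's rows with whichever tool is available.
# ORC_TOOLS_JAR is an assumed variable holding the path to the orc-tools uber jar.
orc_cat() {
  orc_file=$1
  if [ -n "${ORC_TOOLS_JAR:-}" ]; then
    # Java uber jar: the `data` subcommand prints the file contents
    java -jar "$ORC_TOOLS_JAR" data "$orc_file"
  elif command -v orc-contents >/dev/null 2>&1; then
    # Native C++ tool built from the ORC project
    orc-contents "$orc_file"
  else
    echo "orc_cat: set ORC_TOOLS_JAR or put orc-contents on PATH" >&2
    return 1
  fi
}
```

For example, after setting ORC_TOOLS_JAR to the location of the uber jar, orc_cat myfile.orc | more pages through the output much like the gz pipeline in the question.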
Answer by Eugene
You can also view the contents of an ORC file with a desktop application running on Linux.
There is a desktop application for viewing Parquet and other binary formats such as ORC and Avro. It is a pure Java application, so it runs on Linux, Mac, and Windows. See Bigdata File Viewer for details.
It supports complex data types such as array, map, and struct.