Access Hive Data Using Python
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17722372/
Asked by subina mohanan
I have some data in HDFS that I need to access using Python. Can anyone tell me how to access Hive data from Python?
Accepted answer by Sreejith
You can use the hive library to access Hive from Python. To do that, import the ThriftHive class with: from hive import ThriftHive
An example is below:
import sys
from hive import ThriftHive
from hive.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

try:
    # Open a buffered Thrift connection to the Hive server
    transport = TSocket.TSocket('localhost', 10000)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = ThriftHive.Client(protocol)
    transport.open()

    client.execute("CREATE TABLE r(a STRING, b INT, c DOUBLE)")
    # HiveQL loads files with LOAD DATA (not LOAD TABLE)
    client.execute("LOAD DATA LOCAL INPATH '/path' INTO TABLE r")

    # Fetch the result one row at a time
    client.execute("SELECT * FROM r")
    while True:
        row = client.fetchOne()
        if row is None:
            break
        print(row)

    # Or fetch all rows at once
    client.execute("SELECT * FROM r")
    print(client.fetchAll())

    transport.close()
except Thrift.TException as tx:
    print('%s' % (tx.message))
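With this older Thrift client, each row usually comes back as a single string whose columns are separated by tabs. That is an assumption about the default output format, so check what your server actually returns. A minimal sketch of splitting such rows into fields follows; split_row, split_rows, and the sample data are hypothetical names made up for illustration.

```python
def split_row(row, delimiter="\t"):
    """Split one delimited row string into a list of column values."""
    return row.split(delimiter)


def split_rows(rows, delimiter="\t"):
    """Split every row in a list such as the one returned by client.fetchAll()."""
    return [split_row(row, delimiter) for row in rows]


# Made-up sample data standing in for the result of client.fetchAll()
sample = ["x\t1\t2.5", "y\t2\t3.0"]
for fields in split_rows(sample):
    print(fields)
```

All values stay strings after splitting, so numeric columns still need an explicit int() or float() conversion.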
Answer by Tristan Reid
To install, you'll need these libraries:
pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive
If you're on Linux, you may need to install SASL separately before running the above. Install the package libsasl2-dev using apt-get or yum or whatever package manager your distribution uses. For Windows there are some options on GNU.org. On a Mac, SASL should be available if you've installed the Xcode developer tools (xcode-select --install).
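Before connecting, it can help to confirm that the packages actually installed. The sketch below checks whether each module can be found without importing it. The import names in REQUIRED (sasl, thrift, thrift_sasl, pyhive) are assumed to correspond to the pip packages listed above, and missing_modules is a hypothetical helper, not part of any of those libraries.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names assumed for the pip packages sasl, thrift, thrift-sasl, PyHive
REQUIRED = ["sasl", "thrift", "thrift_sasl", "pyhive"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing modules:", missing if missing else "none")
```

If sasl shows up as missing after pip reported a build error, the system SASL development package is the likely culprit.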
After installation, you can execute a Hive query like this:
from pyhive import hive
conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")
Now that you have the Hive connection, you have options for how to use it. You can query it directly:
cursor = conn.cursor()
cursor.execute("SELECT cool_stuff FROM hive_table")
for result in cursor.fetchall():
    use_result(result)
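Calling fetchall() pulls the whole result set into memory, which may be a problem for large Hive tables. A possible alternative, sketched here, is to read rows in batches with the standard DB-API fetchmany() method and to close the cursor and connection with contextlib.closing. The helper iter_rows is a hypothetical name, and the demo uses an in-memory SQLite database as a stand-in for Hive so that it runs anywhere; with PyHive you would pass the cursor from conn.cursor() instead.

```python
import sqlite3
from contextlib import closing


def iter_rows(cursor, batch_size=1000):
    """Yield rows one at a time, fetching them from the cursor in batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


# Demo with an in-memory SQLite database standing in for Hive
with closing(sqlite3.connect(":memory:")) as demo_conn:
    with closing(demo_conn.cursor()) as demo_cursor:
        demo_cursor.execute("CREATE TABLE hive_table (cool_stuff INTEGER)")
        demo_cursor.executemany(
            "INSERT INTO hive_table VALUES (?)", [(i,) for i in range(5)]
        )
        demo_cursor.execute(
            "SELECT cool_stuff FROM hive_table ORDER BY cool_stuff"
        )
        demo_rows = list(iter_rows(demo_cursor, batch_size=2))

print(demo_rows)
```

The batch size trades memory for round trips; 1000 is an arbitrary default here, not a recommendation from the original answer.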
...or use the connection to build a Pandas dataframe:
import pandas as pd
df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)
Answer by Jared Wilber
A much simpler solution, if you're on Windows, uses pyodbc:
import pyodbc
import pandas as pd
# connect odbc to data source name
conn = pyodbc.connect("DSN=<your_dsn>", autocommit=True)
# read data into dataframe
hive_df = pd.read_sql("SELECT * FROM <table_name>", conn)
As long as you have an ODBC driver and a DSN, that's all you need.