使用 Teradata 模块将 Python 与 Teradata 连接起来

Question

提问by anurag

I have installed python 2.7.0 and Teradata module on Windows 7. I am not able to connect and query TD from python.

我已经在 Windows 7 上安装了 python 2.7.0 和 Teradata 模块。我无法从 python 连接和查询 TD。

pip install Teradata

Now I want to import teradata module in my source code and perform operations like -

现在我想在我的源代码中导入 teradata 模块并执行如下操作 -

Firing queries to teradata and get result set.
Check if connection is made to teradata.

向 teradata 发出查询并获取结果集。
检查是否已连接到 Teradata。

Please help me writing code for the same as I am new to Python and there is no information available with me to connect to teradata.

请帮助我编写代码，因为我是 Python 新手，我没有可用的信息来连接到 teradata。

Answer 1

回答by Prayson W. Daniel

There are a number of ways to connect to Teradata and export table to Pandas. Here are four+:

有多种方法可以连接到 Teradata 并将表导出到 Pandas。这里有四个+：

Using teradata module

使用teradata模块

# You can install teradata via PIP: pip install teradata
# to get a list of your odbc drivers names, you could do: teradata.tdodbc.drivers
# You don't need to install teradata odbc driver if using method='rest'.     
# See sending data from df to teradata for connection example 

import teradata
import pandas as pd

host,username,password = 'HOST','UID', 'PWD'
#Make a connection
udaExec = teradata.UdaExec (appName="test", version="1.0", logConsole=False)


with udaExec.connect(method="odbc",system=host, username=username,
                            password=password, driver="DRIVERNAME") as connect:

    query = "SELECT * FROM DATABASEX.TABLENAMEX;"

    #Reading query to df
    df = pd.read_sql(query,connect)
    # do something with df,e.g.
    print(df.head()) #to see the first 5 rows

Using TeradataSQL

使用 TeradataSQL

from @ymzkala : This package doesn't require you to install Teradata drivers (other than this package).

来自@ymzkala ：此软件包不需要您安装 Teradata 驱动程序（此软件包除外）。

# Installing python -m pip install teradatasql

import teradatasql

with teradatasql.connect(host='host', user='username', password='password') as connect:
    df = pd.read_sql(query, connect)

Using pyodbc module

使用 pyodbc 模块

import pyodbc

 #You can install teradata via PIP: pip install pyodbc
 #to get a list of your odbc drivers names, you could do: pyodbc.drivers()

#Make a connection
link = 'DRIVER={DRIVERNAME};DBCNAME={hostname};UID={uid};PWD={pwd}'.format(
                      DRIVERNAME=DRIVERNAME,hostname=hostname,  
                      uid=username, pwd=password)
with pyodbc.connect(link,autocommit=True) as connect:

    #Reading query to df
    df = pd.read_sql(query,connect)

Using sqlalchemy Module

使用sqlalchemy 模块

 #You can install sqlalchemy via PIP: pip install sqlalchemy-teradata
 #Note: It is not pip install sqlalchemy. If you already have sqlalchemy, you still need sqlalchemy-teradata to get teradata dialects

from sqlalchemy import create_engine

#Make a connection

link = 'teradata://{username}:{password}@{hostname}/?driver={DRIVERNAME}'.format(
               username=username,hostname=hostname,DRIVERNAME=DRIVERNAME)

with create_engine(link) as connect:

    #Reading query to df
    df = pd.read_sql(query,connect)

There is a fifth way, using giraffez module. I enjoy using this module as it come with MLOAD, FASTLOAD, BULKEXPORT etc. The only issue for beginners is its requirements (e.g C/C++ compiler ,Teradata CLIv2 and TPT API headers/lib files).

还有第五种方式，使用giraffez 模块。我喜欢使用这个模块，因为它带有 MLOAD、FASTLOAD、BULKEXPORT 等。初学者的唯一问题是它的要求（例如 C/C++ 编译器、Teradata CLIv2 和 TPT API 头文件/lib 文件）。

Note: Updated 13-07-2018, using of context manager to ensure closing of sessions

注意：2018 年 7 月 13 日更新，使用上下文管理器确保关闭会话

Update: 31-10-2018: Using teradata to send data from df to teradata

更新：31-10-2018：使用 teradata 将数据从 df 发送到 teradata

We can send data from df to Teradata. Avoiding 'odbc' 1 MB limit and odbc driver dependency, we can use 'rest' method. We need host ip_address, instead of driver argument. NB:The order of columns in df should match the order of columns in Teradata table.

我们可以将数据从 df 发送到 Teradata。避免 'odbc' 1 MB 限制和 odbc 驱动程序依赖性，我们可以使用 'rest' 方法。我们需要主机 ip_address，而不是驱动程序参数。注意：df 中的列顺序应与 Teradata 表中的列顺序匹配。

import teradata
import pandas as pd

# HOST_IP can be found by executing *>>nslookup viewpoint* or *ping  viewpoint* 
udaExec = teradata.UdaExec (appName="test", version="1.0", logConsole=False) 
with udaExec.connect(method="rest",system="DBName", username="UserName",
                      password="Password", host="HOST_IP_ADDRESS") as connect:

    data = [tuple(x) for x in df.to_records(index=False)]

    connect.executemany("INSERT INTO DATABASE.TABLEWITH5COL values(?,?,?,?,?)",data,batch=True)

Using 'odbc', you have to chunk your data to less than 1MB chunks to avoid "[HY001][Teradata][ODBC Teradata Driver] Memory allocation error" error: E.g.

使用 'odbc'，您必须将数据分块到小于 1MB 的块以避免“[HY001][Teradata][ODBC Teradata Driver] 内存分配错误”错误：例如

import teradata
import pandas as pd
import numpy as np

udaExec = teradata.UdaExec (appName="test", version="1.0", logConsole=False)

with udaExec.connect(method="odbc",system="DBName", username="UserName",
                      password="Password", driver="DriverName") as connect:

    #We can divide our huge_df to small chuncks. E.g. 100 churchs
    chunks_df = np.array_split(huge_df, 100)

    #Import chuncks to Teradata
    for i,_ in enumerate(chunks_df):

        data = [tuple(x) for x in chuncks_df[i].to_records(index=False)]
        connect.executemany("INSERT INTO DATABASE.TABLEWITH5COL values(?,?,?,?,?)",data,batch=True)

Answer 2

回答by ymzkala

To add on to Prayson'sanswer, you can use the teradatasql package (found on pypi). This package doesn't require you to install Teradata drivers (other than this package). Use it like so:

要添加Prayson 的答案，您可以使用teradatasql包（在 pypi 上找到）。此软件包不需要您安装 Teradata 驱动程序（此软件包除外）。像这样使用它：

import teradatasql
import pandas as pd

with teradatasql.connect(host='host', user='username', password='password') as connect:
    data = pd.read_sql('select top 5 * from table_name;', connect)

Answer 3

回答by Chaitanya Pavan Kumar Kothamas

Download the Teradata Python module and python pyodbc.pyd from internet. Install using cmd install setup.py.

从 Internet 下载 Teradata Python 模块和 python pyodbc.pyd。使用 cmd install setup.py 安装。

Here is the sample script for connecting to teradata and extracting data:

以下是用于连接到 teradata 并提取数据的示例脚本：

import teradata
import pyodbc
import sys



udaExec = teradata.UdaExec (appName="HelloWorld", version="1.0",
        logConsole=False)

session = udaExec.connect(method="odbc", dsn="prod32",
        username="PRODRUN", password="PRODRUN");

i = 0
REJECTED = 'R';

f = file("output.txt","w");sys.stdout=f

cursor =  session.cursor();

ff_remaining = 0;

cnt = cursor.execute("SELECT  SEQ_NO,FRQFBKDC,PNR_RELOC FROM ttemp.ffremaining ORDER BY 1,2,3 ").rowcount;
rows = cursor.execute("SELECT  SEQ_NO,FRQFBKDC,PNR_RELOC FROM ttemp.ffremaining ORDER BY 1,2,3 ").fetchall();


for i in range(cnt):
    ff_remaining = cursor.execute("select count(*) as coun from  ttemp.ffretroq_paxoff where seq_no=? and status <> ?",(rows[i].seq_no,REJECTED)).fetchall();
    print ff_remaining[0].coun, rows[i].seq_no, REJECTED;