pandas 带有 to_sql() 、SQLAlchemy 和 exasol 模式的 python 熊猫

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/25116158/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-13 22:19:36  来源:igfitidea点击:

python pandas with to_sql() , SQLAlchemy and schema in exasol

pythonsqlpandassqlalchemy

提问by Tecaz

I'm trying to upload a pandas data frame to an SQL table. It seemed to me that pandas to_sql function is the best solution for larger data frames, but I can't get it to work. I can easily extract data, but get an error message when trying to write it to a new table:

我正在尝试将 Pandas 数据框上传到 SQL 表。在我看来,pandas to_sql 函数是较大数据框的最佳解决方案,但我无法让它工作。我可以轻松提取数据,但在尝试将其写入新表时收到错误消息:

# connect to Exasol DB
exaString='DSN=exa'
conDB = pyodbc.connect(exaString)   

# get some data from somewhere, works without error
sqlString = "SELECT * FROM SOMETABLE"
data = pd.read_sql(sqlString, conDB)

# now upload this data to a new table
data.to_sql('MYTABLENAME', conDB, flavor='mysql')

conDB.close()

The error message I get is

我得到的错误信息是

pyodbc.ProgrammingError: ('42000', "[42000] [EXASOL][EXASolution driver]syntax error, unexpected identifier_chain2, expecting assignment_operator or ':' [line 1, column 6] (-1) (SQLExecDirectW)")

pyodbc.ProgrammingError: ('42000', "[42000] [EXASOL][EXASolution driver] 语法错误,意外的 identifier_chain2,期待 assignment_operator 或 ':' [第 1 行,第 6 列] (-1) (SQLExecDirectW)")

Unfortunately I have no idea how the query that caused this syntax error looks like or what else is wrong. Can someone please point me in the right direction?

不幸的是,我不知道导致此语法错误的查询是什么样的,或者还有什么问题。有人可以指出我正确的方向吗?

(Second) EDIT:

(第二)编辑:

Following Humayuns and Joris suggestions, I now use Pandas version 0.14 and SQLAlchemy in combination with the Exasol dialect (?). Since I am connecting to a defined schema, I am using the meta data option, but the programm crashes with "Bus error (core dumped)".

按照 Humayuns 和 Joris 的建议,我现在将 Pandas 0.14 版和 SQLAlchemy 与 Exasol 方言 (?) 结合使用。由于我连接到定义的模式,因此我使用元数据选项,但程序因“总线错误(核心转储)”而崩溃。

engine = create_engine('exa+pyodbc://uid:passwd@exa/mySchemaName', echo=True)    

# get some data
sqlString = "SELECT * FROM SOMETABLE"    # SOMETABLE is a view in mySchemaName 
df = pd.read_sql(sqlString, con=engine)  # works

print engine.has_table('MYTABLENAME')    # MYTABLENAME is a view in mySchemaName
# prints "True"

# upload it to a new table
meta = sqlalchemy.MetaData(engine, schema='mySchemaName')
meta.reflect(engine, schema='mySchemaName')
pdsql = sql.PandasSQLAlchemy(engine, meta=meta)
pdsql.to_sql(df, 'MYTABLENAME')

I am not sure about setting "mySchemaName" in create_engine(..), but the outcome is the same.

我不确定在 create_engine(..) 中设置“mySchemaName”,但结果是一样的。

回答by OK2

Pandas does not support the EXASOL syntax out of the box, so it need to be changed a bit, here is a working example of your code without SQLAlchemy:

Pandas 不支持开箱即用的 EXASOL 语法,因此需要对其进行一些更改,以下是没有 SQLAlchemy 的代码的工作示例:

import pyodbc
import pandas as pd

con = pyodbc.connect('DSN=EXA')
con.execute('OPEN SCHEMA TEST2')

# configure pandas to understand EXASOL as mysql flavor
pd.io.sql._SQL_TYPES['int']['mysql'] = 'INT'
pd.io.sql._SQL_SYMB['mysql']['br_l'] = ''
pd.io.sql._SQL_SYMB['mysql']['br_r'] = ''
pd.io.sql._SQL_SYMB['mysql']['wld'] = '?'
pd.io.sql.PandasSQLLegacy.has_table = \
    lambda self, name: name.upper() in [t[0].upper() for t in con.execute('SELECT table_name FROM cat').fetchall()]

data = pd.read_sql('SELECT * FROM services', con)
data.to_sql('SERVICES2', con, flavor = 'mysql', index = False)

If you use the EXASolution Python package, then the code would look like follows:

如果您使用 EXASolution Python 包,则代码如下所示:

import exasol
con = exasol.connect(dsn='EXA') # normal pyodbc connection with additional functions
con.execute('OPEN SCHEMA TEST2')

data = con.readData('SELECT * FROM services') # pandas data frame per default
con.writeData(data, table = 'services2')

回答by Joerg

The problem is that also in pandas 0.14 the read_sql and to_sql functions cannot deal with schemas, but using exasol without schemas makes no sense. This will be fixed in 0.15. If you want to use it now look at this pull request https://github.com/pydata/pandas/pull/7952

问题是在 Pandas 0.14 中, read_sql 和 to_sql 函数也不能处理模式,但是使用没有模式的 exasol 是没有意义的。这将在 0.15 中修复。如果您现在想使用它,请查看此拉取请求https://github.com/pydata/pandas/pull/7952