How can pandas.read_sql_query() query a TEMP table?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/26286615/

Date: 2020-09-13 22:33:49  Source: igfitidea

Tags: sql, python-2.7, pandas, sqlalchemy, netezza

Asked by DavidJ

I'm in the process of converting Python code over to the new SQLAlchemy-based Pandas 0.14.1.

A common pattern we use is (generically):

connection = db.connect()  # open connection/session

sql = 'CREATE TEMP TABLE table1 AS SELECT ...'
connection.execute(sql)

# ... other SQL that creates TEMP tables from various joins of previous TEMP tables ...

sql = 'CREATE TEMP TABLE tableN AS SELECT ...'
connection.execute(sql)

result = connection.query('SELECT * FROM tableN WHERE ...')

connection.close()

Now, once the connection is closed the TEMP tables are purged by the DB server. However, as the final select query is using the same connection/session, it can access the tables.

How can I achieve something similar using SQLAlchemy and pd.read_sql_query()?

For example:

engine = sqlalchemy.create_engine('netezza://@mydsn')
connection = engine.connect()

sql = 'CREATE TEMP TABLE tmptable AS SELECT ...'
connection.execute(sql)

result = pd.read_sql_query('SELECT * FROM tmptable WHERE ...', engine)

This yields a DB error saying that the TEMP table tmptable doesn't exist. Presumably this is because passing the engine to read_sql_query() requires it to open a new connection, which has an independent session scope and hence can't see the TEMP table. Is that a reasonable assumption?

Is there a way to work around that? (passing the connection to read_sql_query() isn't supported)

(I know that I can concatenate the SQL into a single string with ; separating the statements, but this is a simplification of the actual situation, where the TEMP tables are created by a multitude of functions which call others, nesting 3-4 deep. So, to achieve that would require implementing a layer that can coalesce the SQL across multiple calls before issuing it, which I'd rather avoid implementing if there is a nicer way.)

Using -
Pandas: 0.14.1
sqlalchemy: 0.9.7
pyodbc: 3.0.6
Win7 x86_64 Canopy Python distribution (Python 2.7.6)
Josh Kuhn's Netezza SQLAlchemy dialect from https://github.com/deontologician/netezza_sqlalchemy

Accepted answer by ssharma

You can now pass a SQLAlchemy connectable to pandas.read_sql. From the docs:

pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)

...

con : SQLAlchemy connectable (engine/connection) or database string URI or DBAPI2 connection (fallback mode)

Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported.

So, this should work:

engine = sqlalchemy.create_engine('netezza://@mydsn')
connection = engine.connect()

sql = 'CREATE TEMP TABLE tmptable AS SELECT ...'
connection.execute(sql)

result = pd.read_sql('SELECT * FROM tmptable WHERE ...', con=connection)
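
The session scoping described above can be tried on a small scale without a Netezza server. The following is only a sketch under stated assumptions: it swaps in a throwaway SQLite file and a plain sqlite3 DBAPI connection (the fallback mode mentioned in the docs) instead of a SQLAlchemy connection, and the names db_path, source, owner and other are made up for illustration. It suggests that reading through the connection that created the TEMP table works, while a second connection cannot see the table.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Hypothetical stand-in database: a throwaway SQLite file instead of Netezza.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Connection (session) that creates and owns the TEMP table.
owner = sqlite3.connect(db_path)
owner.execute("CREATE TABLE source (id INTEGER, value TEXT)")
owner.executemany(
    "INSERT INTO source VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
)
owner.commit()
owner.execute("CREATE TEMP TABLE tmptable AS SELECT * FROM source WHERE id > 1")

# Same connection: the TEMP table is visible to pandas.
result = pd.read_sql_query("SELECT * FROM tmptable ORDER BY id", owner)

# A second connection is a separate session and cannot see the TEMP table.
other = sqlite3.connect(db_path)
try:
    pd.read_sql_query("SELECT * FROM tmptable", other)
    visible_elsewhere = True
except Exception:  # pandas re-raises the driver's "no such table" error
    visible_elsewhere = False

owner.close()
other.close()
```

Other databases and drivers may scope TEMP tables differently, so this should be read as an illustration of the idea rather than a statement about Netezza's behavior.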

Answer by Luis A.G.

As @ssharma says, you can now pass a SQLAlchemy connectable to pandas.read_sql. If you create the session with the session maker, you need the session's connection object.

To read the uncommitted changes, you have to use the same connection, like this:

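from sqlalchemy.orm import sessionmaker

engine = sqlalchemy.create_engine('netezza://@mydsn')
session = sessionmaker(bind=engine)()  # bind to the engine created above

sql = 'CREATE TEMP TABLE tmptable AS SELECT ...'
session.execute(sql)

# session.connection() returns the connection the session is using, so the TEMP table is visible
result = pd.read_sql('SELECT * FROM tmptable WHERE ...', con=session.connection())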

Answer by Carlos Chaccon

All you need to do is add 'SET NOCOUNT ON' at the beginning of your query; that way pandas read_sql will read everything as one statement.

sSQL = '''SET NOCOUNT ON
CREATE TABLE ...... '''

Answer by Andrew

You are using Python and Netezza; I was using R and SQL Server, so this might be different. In my script, I ran into a similar issue: sp_execute_external_script in T-SQL, which allows external code to run on the database, only allows SELECT statements. This was burdensome for me because I wanted to run a stored procedure to create a temp table to select from. Alternatively, I could use common table expressions, unions, etc. It might be worth further investigation.
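
The common-table-expression alternative mentioned above could look roughly like this. It is a sketch only, run against an in-memory SQLite database with a made-up orders table; each CTE stands in for one intermediate TEMP table, so the whole pipeline becomes a single SELECT statement that needs no session-scoped state.

```python
import sqlite3

# Hypothetical data: an in-memory SQLite database with a made-up "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [("ann", 10), ("bob", 5), ("ann", 7)]
)

# Each CTE plays the role of one intermediate TEMP table.
sql = """
WITH totals AS (
    SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer
),
big_spenders AS (
    SELECT customer, total FROM totals WHERE total > 10
)
SELECT customer, total FROM big_spenders ORDER BY customer
"""
rows = conn.execute(sql).fetchall()
conn.close()
```

Whether this is practical depends on how the SQL is assembled; when the intermediate tables are built by many nested functions, composing one large WITH clause runs into the same coalescing problem the question describes.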

Answer by melihozbek

I understand the issue, but is creating regular tables not working? You could come up with a convention such as CREATE TABLE TEMP_t1 etc., and DROP them at the end of your session.
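
One way to keep such a convention from leaking tables is to tie the DROP to a context manager. The sketch below is an assumption-laden illustration, not part of the original answer: the helper scratch_tables, the TEMP_ prefix handling and the source table are all hypothetical, and SQLite stands in for the real database.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def scratch_tables(conn, prefix="TEMP_"):
    """Yield a helper that creates prefixed regular tables; drop them on exit."""
    created = []

    def create_as(name, select_sql):
        # Table names are interpolated directly, so pass trusted names only.
        table = prefix + name
        conn.execute("CREATE TABLE {} AS {}".format(table, select_sql))
        created.append(table)
        return table

    try:
        yield create_as
    finally:
        # Drop in reverse creation order, even if the body raised.
        for table in reversed(created):
            conn.execute("DROP TABLE IF EXISTS {}".format(table))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER)")
conn.executemany("INSERT INTO source VALUES (?)", [(1,), (2,), (3,)])

with scratch_tables(conn) as create_as:
    t1 = create_as("t1", "SELECT id FROM source WHERE id > 1")
    rows = conn.execute("SELECT id FROM {} ORDER BY id".format(t1)).fetchall()

# After the block, only the original table should remain.
remaining = [
    r[0]
    for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
conn.close()
```

Unlike true TEMP tables, regular tables are visible to every connection, so concurrent runs would need distinct prefixes to avoid colliding.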
