How to query TEMP tables with pandas.read_sql_query()?

Question

Asked by DavidJ

I'm in the process of converting Python code over to the new SQLAlchemy-based Pandas 0.14.1.

A common pattern we use is (generically):

connection = db.connect()  # open connection/session

sql = 'CREATE TEMP TABLE table1 AS SELECT ...'
connection.execute(sql)

# ... other SQL that creates TEMP tables from various joins of previous TEMP tables ...

sql = 'CREATE TEMP TABLE tableN AS SELECT ...'
connection.execute(sql)

result = connection.query('SELECT * FROM tableN WHERE ...')

connection.close()

Now, once the connection is closed the TEMP tables are purged by the DB server. However, as the final select query is using the same connection/session, it can access the tables.

How can I achieve something similar using SQLAlchemy and pd.read_sql_query()?

For example:

engine = sqlalchemy.create_engine('netezza://@mydsn')
connection = engine.connect()

sql = 'CREATE TEMP TABLE tmptable AS SELECT ...'
connection.execute(sql)

result = pd.read_sql_query('SELECT * FROM tmptable WHERE ...', engine)

This yields a DB error saying that the TEMP table tmptable doesn't exist. Presumably this is because passing the engine to read_sql_query() makes it open a new connection, which has an independent session scope and hence can't see the TEMP table. Is that a reasonable assumption?
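The session-scoping assumption can be checked without a Netezza server. The following is a minimal sketch using Python's standard-library sqlite3 module as a stand-in (an assumption: SQLite is not Netezza, and the scratch file name demo.db is hypothetical), showing that a TEMP table created on one connection is not visible from a second connection to the same database:

```python
import os
import sqlite3
import tempfile

# Hypothetical scratch database file, used only for this demonstration.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

conn_a = sqlite3.connect(db_path)
conn_b = sqlite3.connect(db_path)  # a second, independent session

# Create the TEMP table on the first connection only.
conn_a.execute("CREATE TEMP TABLE tmptable AS SELECT 1 AS id")

# The connection that created the TEMP table can read it.
rows_same_connection = conn_a.execute("SELECT id FROM tmptable").fetchall()

# A different connection to the same database cannot see it.
try:
    conn_b.execute("SELECT id FROM tmptable")
    visible_elsewhere = True
except sqlite3.OperationalError:
    visible_elsewhere = False

conn_a.close()
conn_b.close()
```

If an engine hands out a fresh connection for each call, the second query is in the position of conn_b here, which would be consistent with the error described.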

Is there a way to work around that? (passing the connection to read_sql_query() isn't supported)

(I know that I can concatenate the SQL into a single string with ; separating the statements, but this is a simplification of the actual situation, where the TEMP tables are created by a multitude of functions that call others, nesting 3-4 deep. Achieving that would require implementing a layer that can coalesce the SQL across multiple calls before issuing it, which I'd rather avoid if there is a nicer way.)

Using -
Pandas: 0.14.1
sqlalchemy: 0.9.7
pyodbc: 3.0.6
Win7 x86_64 Canopy Python distribution (Python 2.7.6)
Josh Kuhn's Netezza SQLAlchemy dialect from https://github.com/deontologician/netezza_sqlalchemy

Answer 1

Accepted answer by ssharma

You can now pass a SQLAlchemy connectable to pandas.read_sql. From the docs:

pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)
...
con : SQLAlchemy connectable (engine/connection) or database string URI
or DBAPI2 connection (fallback mode)
Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported.

So, this should work:

engine = sqlalchemy.create_engine('netezza://@mydsn')
connection = engine.connect()

sql = 'CREATE TEMP TABLE tmptable AS SELECT ...'
connection.execute(sql)

result = pd.read_sql('SELECT * FROM tmptable WHERE ...', con=connection)
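For readers without access to Netezza, here is a self-contained sketch of the same pattern. It assumes an in-memory SQLite database in place of the 'netezza://@mydsn' DSN, and the column names id and label are made up for illustration. It also wraps the SQL in sqlalchemy.text(), which newer SQLAlchemy releases require for raw strings; the 0.9.x version in the question accepted plain strings.

```python
import pandas as pd
import sqlalchemy
from sqlalchemy import text

# Assumption: in-memory SQLite stands in for the Netezza DSN.
engine = sqlalchemy.create_engine("sqlite://")

with engine.connect() as connection:
    # Create the TEMP table on this connection (hypothetical columns).
    connection.execute(
        text("CREATE TEMP TABLE tmptable AS SELECT 1 AS id, 'a' AS label")
    )
    # Pass the same connection, not the engine, so the query runs in
    # the session that owns the TEMP table.
    result = pd.read_sql(
        text("SELECT * FROM tmptable WHERE id = 1"), con=connection
    )
```

The key point is only that con receives the connection that created the table; the rest is scaffolding to make the sketch runnable.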

Answer 2

Answered by Luis A.G.

As @ssharma says, you can now pass a SQLAlchemy connectable to pandas.read_sql. If you create the session with sessionmaker, you need the session's connection object.

To read the uncommitted changes, you have to use the same connection, like this:

engine = sqlalchemy.create_engine('netezza://@mydsn')
session = sessionmaker(bind=engine)()  # sessionmaker comes from sqlalchemy.orm

sql = 'CREATE TEMP TABLE tmptable AS SELECT ...'
session.execute(sql)

result = pd.read_sql('SELECT * FROM tmptable WHERE ...', con=session.connection())

Answer 3

Answered by Carlos Chaccon

All you need to do is add 'SET NOCOUNT ON' at the beginning of your query; that way, pandas read_sql will read everything as one statement.

sSQL = '''SET NOCOUNT ON
CREATE TABLE ...... '''

Answer 4

Answered by Andrew

You are using Python and Netezza; I was using R and SQL Server, so this might be different. In my script, I ran into a similar issue. sp_execute_external_script in T-SQL, which allows external code to run on the database, only allows SELECT statements. This was burdensome for me because I wanted to run a stored procedure to create a temp table to select from. Alternatively, I could use common table expressions, unions, etc. It might be worth further investigation.
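To illustrate the common-table-expression idea, here is a minimal sketch in which a chain of intermediate results lives inside one SELECT statement instead of a series of TEMP tables. It uses the standard-library sqlite3 module, and the orders table and its columns are hypothetical; whether this scales to the nested functions described in the question is untested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source data for the illustration.
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10), ("ann", 5), ("bob", 7)],
)

# Each WITH clause plays the role of one intermediate TEMP table,
# but everything is sent as a single SELECT statement.
sql = """
WITH totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
),
big_spenders AS (
    SELECT customer, total FROM totals WHERE total > 8
)
SELECT customer, total FROM big_spenders ORDER BY customer
"""
rows = conn.execute(sql).fetchall()
conn.close()
```

Because the whole query is one statement, it does not depend on which connection runs it, at the cost of recomputing the intermediate results on every call.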

Answer 5

Answered by melihozbek

I understand the issue, but is creating regular tables not working? You could come up with a convention such as CREATE TABLE TEMP_t1, etc., and DROP them at the end of your session.
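One way to sketch that convention is a small helper that records every prefixed table it creates and drops them all when the block exits. The helper name scratch_tables and the TEMP_ prefix handling are hypothetical, and the standard-library sqlite3 module stands in for the real database:

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def scratch_tables(conn):
    """Yield a create(name, select_sql) helper; drop its tables on exit."""
    created = []

    def create(name, select_sql):
        # Regular (non-TEMP) table, marked only by the naming convention.
        full_name = "TEMP_" + name
        conn.execute("CREATE TABLE {} AS {}".format(full_name, select_sql))
        created.append(full_name)
        return full_name

    try:
        yield create
    finally:
        # Drop in reverse creation order, even if the block raised.
        for full_name in reversed(created):
            conn.execute("DROP TABLE IF EXISTS {}".format(full_name))


conn = sqlite3.connect(":memory:")
with scratch_tables(conn) as create:
    t1 = create("t1", "SELECT 1 AS id UNION ALL SELECT 2")
    rows = conn.execute(
        "SELECT id FROM {} ORDER BY id".format(t1)
    ).fetchall()

# After the block, no tables created by the helper should remain.
leftover = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

Unlike real TEMP tables, these are visible to every connection, so concurrent sessions would need distinct names to avoid collisions.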
