Original question: http://stackoverflow.com/questions/32920127/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Output SQL as string from pandas.DataFrame.to_sql
Asked by kliron
Is there a way to make pandas (or SQLAlchemy) output the SQL that a call to to_sql() would execute, instead of actually executing it? This would be handy in the many cases where I need to update multiple databases with the same data, but Python and pandas exist on only one of my machines.
Accepted answer by Parfait
This is more a process question than a programming one. First, consider the use of multiple databases. Relational database management systems (RDBMSs) are designed as multi-user systems for many simultaneous users, apps, clients, and machines. Designed to run as ONE system, the database serves as the central repository for related applications. Some argue databases should be agnostic to apps and data-centric (PostgreSQL folks), while others believe databases should be app-centric (MySQL folks). Overall, understand that they are more involved than a flat-file spreadsheet or data frame.
Usually, RDBMSs come in two structural types:
- file-level systems like SQLite and MS Access (where the database resides in a file saved to a directory on disk); though still powerful and multi-user, these systems mostly serve smaller business applications with a relatively small number of users or small teams
- server-level systems like SQL Server, MySQL, PostgreSQL, DB2, and Oracle (where the database runs over a network without any localized file); these serve as enterprise-level systems that run full-scale business operations over LAN intranets or web networks
Meanwhile, pandas is not a database but a data analysis toolkit (much like MS Excel), though it can import and export queried result sets from RDBMSs. Therefore, it maintains no native SQL dialect for DDL/DML procedures. Moreover, pandas runs in memory on the machine that calls the Python script and cannot be shared by other clients or machines. Pandas also does not track changes in the way you intend: it does not know the different states of a data frame during the script's runtime unless you design it that way, keeping a before and after copy and identifying the column/row changes yourself.
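Because pandas leaves SQL generation to the database layer, one possible way to see the statements as text is to write the data frame into a throwaway in-memory SQLite database and dump that database as SQL. The sketch below is only an illustration under stated assumptions: the data frame contents, the table name tablename, and the variable names are made up for the example, and the dumped statements are in SQLite's dialect, so they may need adjusting before running on another RDBMS.

```python
import sqlite3

import pandas as pd

# Hypothetical example data; replace with your own data frame.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Scratch database that lives only in memory (assumption: SQLite's
# dialect is close enough to the target database for your purposes).
scratch = sqlite3.connect(":memory:")
df.to_sql(name="tablename", con=scratch, if_exists="replace", index=False)

# iterdump() yields the CREATE TABLE and INSERT statements as strings.
sql_text = "\n".join(scratch.iterdump())
scratch.close()

print(sql_text)
```

The resulting text can be saved to a file and replayed on a machine that has no Python or pandas installed, after checking it against the target database's syntax.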
With that mouthful said, why not use ONE database and have your Python script serve as just another of the many clients that connect to it to import and export data through a data frame? Then, after every data frame change, actually run to_sql(). Recall that pandas' to_sql uses the if_exists argument:
# DROPS TABLE, RECREATES IT, AND UPDATES IT
df.to_sql(name='tablename', con=conn, if_exists='replace')
# APPENDS DF DATA TO EXISTING TABLE
df.to_sql(name='tablename', con=conn, if_exists='append')
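The two calls above leave conn and df undefined. A minimal runnable sketch, assuming an in-memory SQLite connection and a small made-up data frame (neither comes from the original answer), shows the difference between the two modes:

```python
import sqlite3

import pandas as pd

# Assumed setup: an in-memory SQLite database and example data.
conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})


def count_rows(connection):
    """Return the number of rows currently in tablename."""
    return connection.execute("SELECT COUNT(*) FROM tablename").fetchone()[0]


# 'replace' drops any existing table, recreates it, and inserts the rows.
df.to_sql(name="tablename", con=conn, if_exists="replace", index=False)
rows_after_replace = count_rows(conn)

# 'append' keeps the existing table and adds the rows to it.
df.to_sql(name="tablename", con=conn, if_exists="append", index=False)
rows_after_append = count_rows(conn)

# Running 'replace' again discards the appended rows.
df.to_sql(name="tablename", con=conn, if_exists="replace", index=False)
rows_after_second_replace = count_rows(conn)

print(rows_after_replace, rows_after_append, rows_after_second_replace)
```

With a server-level database, conn would typically be a SQLAlchemy engine or connection instead of a sqlite3 connection.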
In turn, every app or machine that connects to the centralized database only needs to refresh its instance, and current data will be available in real time for its end-use needs. Of course, table locking can be an issue in multi-user environments if another user has a table record in edit mode while your script tries to update it. Transactions may help here.
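As a rough illustration of how a transaction can protect shared data, the sketch below uses Python's built-in sqlite3 module directly rather than to_sql. The table layout and the deliberately invalid data frame are assumptions made up for the example. The delete and the inserts run as one unit, so when an insert fails the earlier delete is rolled back and other clients never see a half-updated table.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO tablename VALUES (1, 'old')")
conn.commit()

# Hypothetical bad update: the duplicate id violates the primary key.
df = pd.DataFrame({"id": [1, 1], "name": ["new", "duplicate"]})
records = [(int(i), str(n)) for i, n in zip(df["id"], df["name"])]

try:
    # As a context manager, the connection commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        conn.execute("DELETE FROM tablename")
        conn.executemany(
            "INSERT INTO tablename (id, name) VALUES (?, ?)", records
        )
except sqlite3.IntegrityError:
    update_failed = True
else:
    update_failed = False

# The failed update was rolled back, so the original row is still there.
rows = conn.execute("SELECT id, name FROM tablename").fetchall()
print(update_failed, rows)
```

Server-level databases offer the same idea through their own drivers or through SQLAlchemy, though locking behavior differs between systems.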

