Python: How to write a DataFrame to a postgres table?
Disclaimer: this page is a translation of a popular StackOverflow Q&A, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/23103962/
How to write DataFrame to postgres table?
Asked by m9_psy
There is a DataFrame.to_sql method, but it works only for mysql, sqlite and oracle databases. I can't pass a postgres connection or a sqlalchemy engine to this method.
Accepted answer by joris
Starting from pandas 0.14 (released end of May 2014), postgresql is supported. The sql module now uses sqlalchemy to support different database flavors. You can pass a sqlalchemy engine for a postgresql database (see docs). E.g.:
from sqlalchemy import create_engine

# connection string format: postgresql://user:password@host:port/dbname
engine = create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')
df.to_sql('table_name', engine)
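To sanity-check the write, you can read the table straight back with the same engine (a minimal sketch using pd.read_sql; df, engine and table_name are from the example above):

import pandas as pd

# pull back the first few rows to confirm the table was written
print(pd.read_sql('SELECT * FROM table_name LIMIT 5', engine))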
You are correct that in pandas up to version 0.13.1 postgresql was not supported. If you need to use an older version of pandas, here is a patched version of pandas.io.sql: https://gist.github.com/jorisvandenbossche/10841234. I wrote this a while ago, so I cannot fully guarantee that it always works, but the basis should be there. If you put that file in your working directory and import it, then you should be able to do (where con is a postgresql connection):
import sql # the patched version (file is named sql.py)
sql.write_frame(df, 'table_name', con, flavor='postgresql')
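Here con is a plain DBAPI connection, for instance one created with psycopg2 (the credentials below are placeholders):

import psycopg2

# open a DBAPI connection to pass as `con` above
con = psycopg2.connect(host='localhost', dbname='mydatabase', user='scott', password='tiger')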
Answer by Aseem
Faster option:
The following code will copy your pandas DataFrame to a postgres DB much faster than the df.to_sql method, and you won't need any intermediate csv file to store the df.
Create an engine based on your DB specifications.
Create a table in your postgres DB that has the same number of columns as the DataFrame (df).
Data in the DF will get inserted in your postgres table.
from sqlalchemy import create_engine
import psycopg2
import io
If you want to replace the table, we can replace it with the normal to_sql method using headers from our df, and then load the entire large, time-consuming df into the DB.
engine = create_engine('postgresql+psycopg2://username:password@host:port/database')

# write just the header row to (re)create an empty table with df's columns
df.head(0).to_sql('table_name', engine, if_exists='replace', index=False)

conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
cur.copy_from(output, 'table_name', null="")  # null values become ''; default sep is '\t', matching to_csv above
conn.commit()
cur.close()
conn.close()
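The speedup comes from COPY: instead of issuing one INSERT per row, the whole buffer is streamed to the server in a single statement, so the gain grows with the number of rows. Note that copy_from expects the buffer's columns in the same order as the target table, which is why the header row is written separately via to_sql above.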
Answer by Behdad Forghani
This is how I did it.
It may be faster because it uses execute_batch:
import psycopg2.extras  # provides execute_batch

# df is the dataframe
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)
    # create VALUES('%s', '%s',...) one '%s' per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
    # create INSERT INTO table (columns) VALUES('%s',...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)
    cur = conn.cursor()
    # tolist() converts numpy scalars to plain Python types that psycopg2 can adapt
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values.tolist())
    conn.commit()
    cur.close()
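For completeness, the snippet above assumes an open connection conn and a target table name; a minimal setup might look like this (all names and credentials are placeholders):

import psycopg2

# the snippet above expects these two names to exist
conn = psycopg2.connect(host='localhost', dbname='mydatabase', user='scott', password='tiger')
table = 'table_name'  # target table, created beforehand with matching columns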
Answer by mgoldwasser
Pandas 0.24.0+ solution
In Pandas 0.24.0 a new feature was introduced, specifically designed for fast writes to Postgres. You can learn more about it here: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method
import csv
from io import StringIO

from sqlalchemy import create_engine

def psql_insert_copy(table, conn, keys, data_iter):
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)

engine = create_engine('postgresql://myusername:mypassword@myhost:5432/mydatabase')
df.to_sql('table_name', engine, method=psql_insert_copy)
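For very large frames, it may help to combine this with to_sql's chunksize parameter so the rows are streamed in batches (the batch size below is an arbitrary example):

# write in batches of 100k rows, each batch going through psql_insert_copy
df.to_sql('table_name', engine, method=psql_insert_copy, index=False, chunksize=100000)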
Answer by Mayukh Ghosh
For Python 2.7 and Pandas 0.24.2, using Psycopg2
Psycopg2 Connection Module
import psycopg2
from psycopg2.extras import RealDictCursor

def dbConnect(db_parm, username_parm, host_parm, pw_parm):
    # Parse in connection information
    credentials = {'host': host_parm, 'database': db_parm, 'user': username_parm, 'password': pw_parm}
    conn = psycopg2.connect(**credentials)
    conn.autocommit = True  # auto-commit each entry to the database
    conn.cursor_factory = RealDictCursor
    cur = conn.cursor()
    print("Connected Successfully to DB: " + str(db_parm) + "@" + str(host_parm))
    return conn, cur
Connect to the database
conn, cur = dbConnect(databaseName, dbUser, dbHost, dbPwd)
Assuming the dataframe is already present as df
import io

output = io.BytesIO()  # for Python 3, use io.StringIO()
df.to_csv(output, sep='\t', header=True, index=False)
output.seek(0)  # rewind the buffer before handing it to COPY
# Replace mem_info with your table name. COPY ... FROM must read from STDIN (not STDOUT);
# the CSV default escape (doubling the quote char) already matches pandas' to_csv output.
copy_query = "COPY mem_info FROM STDIN CSV DELIMITER '\t' NULL '' HEADER"
cur.copy_expert(copy_query, output)
conn.commit()  # a no-op here, since autocommit was enabled in dbConnect