Python: How to write a DataFrame to a postgres table?

Warning: this content is a translation of a popular StackOverflow question provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original source: http://stackoverflow.com/questions/23103962/


How to write DataFrame to postgres table?

python, postgresql, pandas, sqlalchemy

Asked by m9_psy

There is a DataFrame.to_sql method, but it works only for mysql, sqlite and oracle databases. I can't pass a postgres connection or a sqlalchemy engine to this method.

Accepted answer by joris

Starting from pandas 0.14 (released end of May 2014), postgresql is supported. The sql module now uses sqlalchemy to support different database flavors. You can pass a sqlalchemy engine for a postgresql database (see docs). E.g.:

from sqlalchemy import create_engine
engine = create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')
df.to_sql('table_name', engine)
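
Two to_sql keyword arguments (both standard pandas parameters) are often useful with this call; a minimal sketch, assuming the df and engine from above:

# if_exists controls what happens when the table already exists:
# 'fail' (default) raises, 'replace' drops and recreates, 'append' inserts.
# index=False skips writing the DataFrame index as an extra column.
df.to_sql('table_name', engine, if_exists='append', index=False)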


You are correct that in pandas up to version 0.13.1 postgresql was not supported. If you need to use an older version of pandas, here is a patched version of pandas.io.sql: https://gist.github.com/jorisvandenbossche/10841234.
(I wrote this a while ago, so I cannot fully guarantee that it always works, but the basis should be there.) If you put that file in your working directory and import it, then you should be able to do the following (where con is a postgresql connection):

import sql  # the patched version (file is named sql.py)
sql.write_frame(df, 'table_name', con, flavor='postgresql')
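
(For completeness, a minimal sketch of creating such a connection with psycopg2; the credentials and database name below are placeholders matching the engine example above:)

import psycopg2

# a plain DBAPI connection to pass as `con`
con = psycopg2.connect(host='localhost', port=5432, dbname='mydatabase',
                       user='scott', password='tiger')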

Answer by Aseem

Faster option:

The following code will copy your pandas DataFrame to a postgres database much faster than the df.to_sql method, and you won't need any intermediate CSV file to store the df.

1. Create an engine based on your DB specifications.

2. Create a table in your postgres DB that has the same number of columns as the DataFrame (df).

3. The data in the DF will get inserted into your postgres table.

from sqlalchemy import create_engine
import psycopg2 
import io

If you want to replace the table, we can first create the empty table with the normal to_sql method, using just the headers from our df, and then load the entire big, time-consuming df into the DB.

engine = create_engine('postgresql+psycopg2://username:password@host:port/database')

df.head(0).to_sql('table_name', engine, if_exists='replace', index=False)  # drops the old table and creates a new, empty one with the df's columns

conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
cur.copy_from(output, 'table_name', null="")  # empty fields in the buffer are loaded as NULL
conn.commit()
cur.close()
conn.close()
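
One caveat (my addition, not part of the original answer): copy_from with a tab separator will fail if any text field itself contains a tab or newline. A minimal pre-cleaning sketch, assuming the affected columns are plain strings:

# replace characters that would break the tab-separated COPY with spaces
for col in df.select_dtypes(include='object').columns:
    df[col] = df[col].str.replace(r'[\t\r\n]', ' ', regex=True)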

Answer by Behdad Forghani

This is how I did it.

It may be faster because it uses execute_batch:

import psycopg2
import psycopg2.extras

# df is the dataframe; conn is an open psycopg2 connection
# and table holds the target table name
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)

    # create VALUES(%s, %s, ...) -- one %s placeholder per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns])) 

    # create INSERT INTO table (columns) VALUES(%s, ...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)

    cur = conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
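
execute_batch groups the rows into pages of page_size rows (default 100), one server round trip per page; raising it can speed things up further. A hedged variant of the call above:

# larger pages mean fewer round trips, at the cost of a bit more memory
psycopg2.extras.execute_batch(cur, insert_stmt, df.values, page_size=1000)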

Answer by mgoldwasser

Pandas 0.24.0+ solution

In pandas 0.24.0 a new feature was introduced that is specifically designed for fast writes to Postgres. You can learn more about it here: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method

import csv
from io import StringIO

from sqlalchemy import create_engine

def psql_insert_copy(table, conn, keys, data_iter):
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
            table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)

engine = create_engine('postgresql://myusername:mypassword@myhost:5432/mydatabase')
df.to_sql('table_name', engine, method=psql_insert_copy)
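
For comparison, pandas also ships a built-in batching mode, method='multi', which packs many rows into each INSERT statement; it is portable across databases but is usually slower than the COPY-based callable above. A small sketch with the same engine:

# 'multi' sends several rows per INSERT; chunksize bounds the rows per statement
df.to_sql('table_name', engine, method='multi', chunksize=1000)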

Answer by Mayukh Ghosh

For Python 2.7 and pandas 0.24.2, using Psycopg2

Psycopg2 Connection Module

import psycopg2
from psycopg2.extras import RealDictCursor

def dbConnect(db_parm, username_parm, host_parm, pw_parm):
    # Parse in connection information
    credentials = {'host': host_parm, 'database': db_parm, 'user': username_parm, 'password': pw_parm}
    conn = psycopg2.connect(**credentials)
    conn.autocommit = True  # auto-commit each entry to the database
    conn.cursor_factory = RealDictCursor
    cur = conn.cursor()
    print ("Connected Successfully to DB: " + str(db_parm) + "@" + str(host_parm))
    return conn, cur

Connect to the database

conn, cur = dbConnect(databaseName, dbUser, dbHost, dbPwd)

Assuming the dataframe is already present as df:

import io

output = io.BytesIO()  # for Python 3 use io.StringIO()
df.to_csv(output, sep='\t', header=True, index=False)
output.seek(0)  # required to rewind the buffer before reading
copy_query = "COPY mem_info FROM STDIN csv DELIMITER '\t' NULL '' ESCAPE '\\' HEADER"  # replace mem_info with your table name
cur.copy_expert(copy_query, output)
conn.commit()
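
A quick sanity check after the load (my addition; mem_info is the same placeholder table name as above):

# RealDictCursor returns each row as a dict keyed by column name
cur.execute("SELECT COUNT(*) FROM mem_info")
print(cur.fetchone()['count'])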