Fast insertion of pandas DataFrame into Postgres DB using psycopg2
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/9826431/
Asked by Arthur G
I am trying to insert a pandas DataFrame into a PostgreSQL DB (9.1) in the most efficient way (using Python 2.7).
Using "cursor.executemany" is really slow, and so is "DataFrame.to_csv(buffer, ...)" together with "copy_from".
I found a much faster solution on the web (http://eatthedots.blogspot.de/2008/08/faking-read-support-for-psycopgs.html), which I adapted to work with pandas.
My code can be found below.
My question is whether the method from this related question (using "copy from stdin with binary") can easily be transferred to work with DataFrames, and whether it would be much faster:
Use binary COPY table FROM with psycopg2
Unfortunately my Python skills aren't sufficient to understand the implementation of that approach.
This is my approach:
import psycopg2
import connectDB  # this is simply a module that returns a connection to the db
from datetime import datetime

class ReadFaker:
    """
    This could be extended to include the index column optionally. Right now the index
    is not inserted.
    """
    def __init__(self, data):
        self.iter = data.itertuples()

    def readline(self, size=None):
        try:
            line = self.iter.next()[1:]  # element 0 is the index
            row = '\t'.join(x.encode('utf8') if isinstance(x, unicode) else str(x) for x in line) + '\n'
            # in my case all strings in line are unicode objects.
        except StopIteration:
            return ''
        else:
            return row

    read = readline

def insert(df, table, con=None, columns=None):
    time1 = datetime.now()
    close_con = False
    if not con:
        try:
            con = connectDB.getCon()  # dbLoader returns a connection with my settings
            close_con = True
        except psycopg2.Error, e:
            print e.pgerror
            print e.pgcode
            return "failed"

    inserted_rows = df.shape[0]
    data = ReadFaker(df)

    try:
        curs = con.cursor()
        print 'inserting %s entries into %s ...' % (inserted_rows, table)
        if columns is not None:
            curs.copy_from(data, table, null='nan', columns=[col for col in columns])
        else:
            curs.copy_from(data, table, null='nan')
        con.commit()
        curs.close()
        if close_con:
            con.close()
    except psycopg2.Error, e:
        print e.pgerror
        print e.pgcode
        con.rollback()
        if close_con:
            con.close()
        return "failed"

    time2 = datetime.now()
    print time2 - time1
    return inserted_rows
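For readers on Python 3, the `unicode`/`.next()` parts of the code above no longer apply. Below is a minimal, untested Python 3 sketch of the same idea; the class name is illustrative. psycopg2's copy_from only requires a file-like object exposing read()/readline(), so wrapping any row iterator works without materialising a CSV first:

```python
class RowStream:
    """File-like wrapper over an iterator of row tuples.

    psycopg2's copy_from() only calls read()/readline(), so any object
    providing those can feed COPY FROM STDIN row by row.
    """
    def __init__(self, rows):
        self._rows = iter(rows)

    def readline(self, size=-1):
        try:
            row = next(self._rows)
        except StopIteration:
            return ''  # empty string signals end-of-stream to copy_from
        # tab-separated, newline-terminated, matching COPY's default text format
        return '\t'.join(str(x) for x in row) + '\n'

    read = readline


# Usage with psycopg2 (table name and cursor are placeholders):
#   curs.copy_from(RowStream(df.itertuples(index=False)), 'my_table')
```

Returning one row per read() call is slower than returning larger chunks, but it keeps the sketch simple and matches the structure of the ReadFaker class above.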
Answered by foobarbecue
Answered by lbolla
I have not tested the performance, but maybe you can use something like this:
- Iterate through the rows of the DataFrame, yielding a string representing each row (see below)
- Convert this iterable into a stream, using for example Python: Convert an iterable to a stream?
- Finally use psycopg's copy_from on this stream.
To yield the rows of a DataFrame efficiently, use something like:
def r(df):
    for idx, row in df.iterrows():
        yield ','.join(map(str, row))
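The second step (converting the iterable into a stream) can be sketched as a tiny file-like adapter. This is an untested illustration; IterStream is a hypothetical helper, not a psycopg2 or pandas API:

```python
class IterStream:
    """Wrap an iterable of lines in a minimal readable stream.

    copy_from() calls read() repeatedly; returning '' signals end-of-data.
    """
    def __init__(self, lines):
        self._it = iter(lines)

    def read(self, size=-1):
        try:
            # one newline-terminated line per read() call
            return next(self._it) + '\n'
        except StopIteration:
            return ''

    readline = read


# Combined with the r(df) generator above, using commas as the separator
# (cursor and table name are placeholders):
#   curs.copy_from(IterStream(r(df)), 'my_table', sep=',')
```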

