SQLAlchemy、Psycopg2 和 Postgresql 复制

Question

提问by EoghanM

It looks like Psycopg has a custom command for executing a COPY:

看起来 Psycopg 有一个用于执行COPY的自定义命令：

psycopg2 COPY using cursor.copy_from() freezes with large inputs

使用 cursor.copy_from() 的 psycopg2 COPY 因大输入而冻结

Is there a way to access this functionality from with SQLAlchemy?

有没有办法使用 SQLAlchemy 访问此功能？

Answer 1

采纳答案by swasheck

It doesn't look like it.

它看起来不像。

You may have to just use psycopg2 to expose this functionality and forego the ORM capabilities. I guess I don't really see the benefit of ORM in such an operation anyway since it's a straight bulk insert and dealing with individual objects a la an ORM would not really make a whole lot of sense.

您可能只需要使用 psycopg2 来公开此功能并放弃 ORM 功能。我想我并没有真正看到 ORM 在这样的操作中的好处，因为它是直接批量插入并且处理单个对象，因为 ORM 并没有真正的意义。

Answer 2

回答by dnfehren

accepted answer is correct but if you want more than just the EoghanM's comment to go on the following worked for me in COPYing a table out to CSV...

接受的答案是正确的，但如果您想要的不仅仅是 EoghanM 的评论，那么以下对我来说在将表格复制到 CSV 时对我有用...

from sqlalchemy import sessionmaker, create_engine

eng = create_engine("postgresql://user:pwd@host:5432/db")
ses = sessionmaker(bind=engine)

dbcopy_f = open('/tmp/some_table_copy.csv','wb')

copy_sql = 'COPY some_table TO STDOUT WITH CSV HEADER'

fake_conn = eng.raw_connection()
fake_cur = fake_conn.cursor()
fake_cur.copy_expert(copy_sql, dbcopy_f)

The sessionmakerisn't necessary but if you're in the habit of creating the engine and the session at the same time to use raw_connectionyou'll need separate them (unless there is some way to access the engine through the session object that I don't know). The sql string provided to copy_expertis also not the only way to it, there is a basic copy_tofunction that you can use with subset of the parameters that you could past to a normal COPYTO query. Overall performance of the command seems fast for me, copying out a table of ~20000 rows.

这sessionmaker不是必需的，但如果您习惯于同时创建引擎和会话来使用，raw_connection您将需要将它们分开（除非有某种方法可以通过我不知道的会话对象访问引擎）不知道）。提供给的 sql 字符串copy_expert也不是唯一的方法，有一个基本copy_to函数，您可以将其与可以传递给普通COPYTO 查询的参数子集一起使用。命令的整体性能对我来说似乎很快，复制了一个大约 20000 行的表。

http://initd.org/psycopg/docs/cursor.html#cursor.copy_to http://docs.sqlalchemy.org/en/latest/core/connections.html#sqlalchemy.engine.Engine.raw_connection

Answer 3

回答by dtheodor

If your engine is configured with a psycopg2 connection string (which is the default, so either "postgresql://..."or "postgresql+psycopg2://..."), you can create a psycopg2 cursor from an SQL Alchemy session using

如果您的引擎配置了 psycopg2 连接字符串（这是默认值，因此可以使用"postgresql://..."或"postgresql+psycopg2://..."），您可以使用以下命令从 SQL Alchemy 会话创建 psycopg2 游标

cursor = session.connection().connection.cursor()

which you can use to execute

您可以使用它来执行

cursor.copy_from(...)

The cursor will be active in the same transaction as your session currently is. If a commitor rollbackhappens, any further use of the cursor with throw a psycopg2.InterfaceError, you would have to create a new one.

光标将在与您当前的会话相同的事务中处于活动状态。如果 acommit或rollback发生，任何进一步使用带有 throw a 的游标psycopg2.InterfaceError，您将不得不创建一个新游标。

Answer 4

回答by Fabien Vauchelles

You can use:

您可以使用：

def to_sql(engine, df, table, if_exists='fail', sep='\t', encoding='utf8'):
    # Create Table
    df[:0].to_sql(table, engine, if_exists=if_exists)

    # Prepare data
    output = cStringIO.StringIO()
    df.to_csv(output, sep=sep, header=False, encoding=encoding)
    output.seek(0)

    # Insert data
    connection = engine.raw_connection()
    cursor = connection.cursor()
    cursor.copy_from(output, table, sep=sep, null='')
    connection.commit()
    cursor.close()

I insert 200000 lines in 5 seconds instead of 4 minutes

我在 5 秒而不是 4 分钟内插入 200000 行

Answer 5

回答by Michael Ekoka

If you can get to the engine you have all you need to do this:

如果您可以到达引擎，您就拥有了执行此操作所需的一切：

engine = create_engine('postgresql+psycopg2://myuser:password@localhost/mydb')
# or 
engine = session.engine
# or any other way you know to get to the engine

Now you can work.

现在你可以工作了。

# isolate a connection
connection = engine.connect().connection

# get the cursor
cursor = connection.cursor()

Here are some templates for the COPY statement to use with cursor.copy_expert(), a more complete and flexible option than copy_from()or copy_to()as it is indicated here: http://initd.org/psycopg/docs/cursor.html#cursor.copy_expert.

下面是针对该份声明与使用一些模板cursor.copy_expert()，一个更完整，更灵活的选择比copy_from()或copy_to()因为它是在此间表示：http://initd.org/psycopg/docs/cursor.html#cursor.copy_expert。

# to dump to a file
dump_to = """
COPY mytable 
TO STDOUT
WITH (
    FORMAT CSV,
    DELIMITER ',',
    HEADER
);
"""

# to copy from a file:
copy_from = """
COPY mytable 
FROM STDIN
WITH (
    FORMAT CSV,
    DELIMITER ',',
    HEADER
);
"""

Check out what the options above mean and others that may be of interest to your specific situation https://www.postgresql.org/docs/current/static/sql-copy.html.

查看上述选项的含义以及您的特定情况可能感兴趣的其他选项https://www.postgresql.org/docs/current/static/sql-copy.html。

IMPORTANT NOTE: The link to the documentation of cursor.copy_expert()indicates to use STDOUT to write out to a file and STDIN to copy from a file. But if you look at the syntax on the PostgreSQL manual, you'll notice that you can also specify the file to write to or from directly in the COPY statement. Don't do that, you're likely just wasting your time if you're not running as root (who runs Python as root during development?) Just do what's indicated in the psycopg2's docs and specify STDIN or STDOUT in your statement with cursor.copy_expert(), it should be fine.

重要说明：指向文档的链接cursor.copy_expert()指示使用 STDOUT 写出到文件和使用 STDIN 从文件复制。但是，如果您查看 PostgreSQL 手册上的语法，您会注意到您还可以直接在 COPY 语句中指定要写入的文件或直接写入的文件。不要这样做，如果您不是以 root 身份运行，那么您可能只是在浪费时间（在开发过程中谁以 root 身份运行 Python？）只需按照 psycopg2 的文档中的说明进行操作cursor.copy_expert()，并在您的语句中指定 STDIN 或 STDOUT ，应该没问题。

# running the copy statement
with open('/path/to/your/data/file.csv') as f:
     cursor.copy_expert(copy_from, file=f)

# don't forget to commit the changes.
connection.commit()

Answer 6

回答by berdario

You don't need to drop down to psycopg2, use raw_connection nor a cursor.

您不需要下拉到 psycopg2，使用 raw_connection 或游标。

Just execute the sql as usual, you can even use bind parameters with text():

只需像往常一样执行 sql，您甚至可以使用绑定参数text()：

engine.execute(text('''copy some_table from :csv
                       delimiter ',' csv'''
                   ).execution_options(autocommit=True),
               csv='/tmp/a.csv')

You can drop the execution_options(autocommit=True)if this PRwill be accepted

execution_options(autocommit=True)如果此 PR将被接受，您可以删除

SQLAlchemy、Psycopg2 和 Postgresql 复制

提问by EoghanM

采纳答案by swasheck

回答by dnfehren

回答by dtheodor

回答by Fabien Vauchelles

回答by Michael Ekoka

回答by berdario

相关推荐

最近更新

标签

SQLAlchemy、Psycopg2 和 Postgresql 复制

提问by EoghanM

采纳答案by swasheck

回答by dnfehren

回答by dtheodor

回答by Fabien Vauchelles

回答by Michael Ekoka

回答by berdario

相关推荐

如何在 PostgreSQL 中获取数组的最后一个元素？

postgresql Django - order_by 如何工作？

postgresql PG::Error 服务器意外关闭连接

postgresql 使用带有 IN 的 ILIKE 查询

相关推荐

最近更新

标签