SQLAlchemy、Psycopg2 和 Postgresql 复制
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/13125236/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
SQLAlchemy, Psycopg2 and Postgresql COPY
提问by EoghanM
It looks like Psycopg has a custom command for executing a COPY:
看起来 Psycopg 有一个用于执行COPY的自定义命令:
psycopg2 COPY using cursor.copy_from() freezes with large inputs
使用 cursor.copy_from() 的 psycopg2 COPY 因大输入而冻结
Is there a way to access this functionality from with SQLAlchemy?
有没有办法使用 SQLAlchemy 访问此功能?
采纳答案by swasheck
You may have to just use psycopg2 to expose this functionality and forego the ORM capabilities. I guess I don't really see the benefit of ORM in such an operation anyway since it's a straight bulk insert and dealing with individual objects a la an ORM would not really make a whole lot of sense.
您可能只需要使用 psycopg2 来公开此功能并放弃 ORM 功能。我想我并没有真正看到 ORM 在这样的操作中的好处,因为它是直接批量插入并且处理单个对象,因为 ORM 并没有真正的意义。
回答by dnfehren
accepted answer is correct but if you want more than just the EoghanM's comment to go on the following worked for me in COPYing a table out to CSV...
接受的答案是正确的,但如果您想要的不仅仅是 EoghanM 的评论,那么以下对我来说在将表格复制到 CSV 时对我有用...
from sqlalchemy import sessionmaker, create_engine
eng = create_engine("postgresql://user:pwd@host:5432/db")
ses = sessionmaker(bind=engine)
dbcopy_f = open('/tmp/some_table_copy.csv','wb')
copy_sql = 'COPY some_table TO STDOUT WITH CSV HEADER'
fake_conn = eng.raw_connection()
fake_cur = fake_conn.cursor()
fake_cur.copy_expert(copy_sql, dbcopy_f)
The sessionmaker
isn't necessary but if you're in the habit of creating the engine and the session at the same time to use raw_connection
you'll need separate them (unless there is some way to access the engine through the session object that I don't know). The sql string provided to copy_expert
is also not the only way to it, there is a basic copy_to
function that you can use with subset of the parameters that you could past to a normal COPY
TO query. Overall performance of the command seems fast for me, copying out a table of ~20000 rows.
这sessionmaker
不是必需的,但如果您习惯于同时创建引擎和会话来使用,raw_connection
您将需要将它们分开(除非有某种方法可以通过我不知道的会话对象访问引擎)不知道)。提供给的 sql 字符串copy_expert
也不是唯一的方法,有一个基本copy_to
函数,您可以将其与可以传递给普通COPY
TO 查询的参数子集一起使用。命令的整体性能对我来说似乎很快,复制了一个大约 20000 行的表。
http://initd.org/psycopg/docs/cursor.html#cursor.copy_tohttp://docs.sqlalchemy.org/en/latest/core/connections.html#sqlalchemy.engine.Engine.raw_connection
http://initd.org/psycopg/docs/cursor.html#cursor.copy_to http://docs.sqlalchemy.org/en/latest/core/connections.html#sqlalchemy.engine.Engine.raw_connection
回答by dtheodor
If your engine is configured with a psycopg2 connection string (which is the default, so either "postgresql://..."
or "postgresql+psycopg2://..."
), you can create a psycopg2 cursor from an SQL Alchemy session using
如果您的引擎配置了 psycopg2 连接字符串(这是默认值,因此可以使用"postgresql://..."
或"postgresql+psycopg2://..."
),您可以使用以下命令从 SQL Alchemy 会话创建 psycopg2 游标
cursor = session.connection().connection.cursor()
which you can use to execute
您可以使用它来执行
cursor.copy_from(...)
The cursor will be active in the same transaction as your session currently is. If a commit
or rollback
happens, any further use of the cursor with throw a psycopg2.InterfaceError
, you would have to create a new one.
光标将在与您当前的会话相同的事务中处于活动状态。如果 acommit
或rollback
发生,任何进一步使用带有 throw a 的游标psycopg2.InterfaceError
,您将不得不创建一个新游标。
回答by Fabien Vauchelles
You can use:
您可以使用:
def to_sql(engine, df, table, if_exists='fail', sep='\t', encoding='utf8'):
# Create Table
df[:0].to_sql(table, engine, if_exists=if_exists)
# Prepare data
output = cStringIO.StringIO()
df.to_csv(output, sep=sep, header=False, encoding=encoding)
output.seek(0)
# Insert data
connection = engine.raw_connection()
cursor = connection.cursor()
cursor.copy_from(output, table, sep=sep, null='')
connection.commit()
cursor.close()
I insert 200000 lines in 5 seconds instead of 4 minutes
我在 5 秒而不是 4 分钟内插入 200000 行
回答by Michael Ekoka
If you can get to the engine you have all you need to do this:
如果您可以到达引擎,您就拥有了执行此操作所需的一切:
engine = create_engine('postgresql+psycopg2://myuser:password@localhost/mydb')
# or
engine = session.engine
# or any other way you know to get to the engine
Now you can work.
现在你可以工作了。
# isolate a connection
connection = engine.connect().connection
# get the cursor
cursor = connection.cursor()
Here are some templates for the COPY statement to use with cursor.copy_expert()
, a more complete and flexible option than copy_from()
or copy_to()
as it is indicated here: http://initd.org/psycopg/docs/cursor.html#cursor.copy_expert.
下面是针对该份声明与使用一些模板cursor.copy_expert()
,一个更完整,更灵活的选择比copy_from()
或copy_to()
因为它是在此间表示:http://initd.org/psycopg/docs/cursor.html#cursor.copy_expert。
# to dump to a file
dump_to = """
COPY mytable
TO STDOUT
WITH (
FORMAT CSV,
DELIMITER ',',
HEADER
);
"""
# to copy from a file:
copy_from = """
COPY mytable
FROM STDIN
WITH (
FORMAT CSV,
DELIMITER ',',
HEADER
);
"""
Check out what the options above mean and others that may be of interest to your specific situation https://www.postgresql.org/docs/current/static/sql-copy.html.
查看上述选项的含义以及您的特定情况可能感兴趣的其他选项https://www.postgresql.org/docs/current/static/sql-copy.html。
IMPORTANT NOTE: The link to the documentation of cursor.copy_expert()
indicates to use STDOUT to write out to a file and STDIN to copy from a file. But if you look at the syntax on the PostgreSQL manual, you'll notice that you can also specify the file to write to or from directly in the COPY statement. Don't do that, you're likely just wasting your time if you're not running as root (who runs Python as root during development?) Just do what's indicated in the psycopg2's docs and specify STDIN or STDOUT in your statement with cursor.copy_expert()
, it should be fine.
重要说明:指向文档的链接cursor.copy_expert()
指示使用 STDOUT 写出到文件和使用 STDIN 从文件复制。但是,如果您查看 PostgreSQL 手册上的语法,您会注意到您还可以直接在 COPY 语句中指定要写入的文件或直接写入的文件。不要这样做,如果您不是以 root 身份运行,那么您可能只是在浪费时间(在开发过程中谁以 root 身份运行 Python?)只需按照 psycopg2 的文档中的说明进行操作cursor.copy_expert()
,并在您的语句中指定 STDIN 或 STDOUT ,应该没问题。
# running the copy statement
with open('/path/to/your/data/file.csv') as f:
cursor.copy_expert(copy_from, file=f)
# don't forget to commit the changes.
connection.commit()
回答by berdario
You don't need to drop down to psycopg2, use raw_connection nor a cursor.
您不需要下拉到 psycopg2,使用 raw_connection 或游标。
Just execute the sql as usual, you can even use bind parameters with text()
:
只需像往常一样执行 sql,您甚至可以使用绑定参数text()
:
engine.execute(text('''copy some_table from :csv
delimiter ',' csv'''
).execution_options(autocommit=True),
csv='/tmp/a.csv')
You can drop the execution_options(autocommit=True)
if this PRwill be accepted
execution_options(autocommit=True)
如果此 PR将被接受,您可以删除