Notice: this page is based on a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/14904033/

Time: 2020-08-18 12:48:18  Source: igfitidea

How can I do a batch insert into an Oracle database using Python?

Tags: python, oracle, python-2.7, cx-oracle, batch-insert

Asked by James Adams

I have some monthly weather data that I want to insert into an Oracle database table but I want to insert the corresponding records in a batch in order to be more efficient. Can anyone advise as to how I'd go about doing this in Python?

For example let's say my table has four fields: a station ID, a date, and two value fields. The records are uniquely identified by the station ID and date fields (composite key). The values I'll have to insert for each station will be kept in a list with X number of full years worth of data, so for example if there are two years of values then the value lists will contain 24 values.

I assume that below is the way I'd do this if I wanted to insert the records one at a time:

import datetime
import cx_Oracle

connection_string = "scott/tiger@testdb"
connection = cx_Oracle.Connection(connection_string)
cursor = cx_Oracle.Cursor(connection)
station_id = 'STATION_1'
start_year = 2000

temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
number_of_years = len(temps) // 12
# bind variables (:1 to :4) are used instead of %s string formatting
sql_insert = 'insert into my_table (id, date_column, temp, precip) values (:1, :2, :3, :4)'
for i in range(number_of_years):
    for j in range(12):
        # make a date for the first day of the month
        date_value = datetime.date(start_year + i, j + 1, 1)
        index = (i * 12) + j
        cursor.execute(sql_insert, (station_id, date_value, temps[index], precips[index]))
connection.commit()

Is there a way to do what I'm doing above but in a way that performs a batch insert in order to increase efficiency? BTW my experience is with Java/JDBC/Hibernate so if someone can give an explanation/example which compares to the Java approach then it'd be especially helpful.

EDIT: Perhaps I need to use cursor.executemany() as described here?

Thanks in advance for any suggestions, comments, etc.

Accepted answer by James Adams

Here's what I've come up with which appears to work well (but please comment if there's a way to improve this):

import datetime
import cx_Oracle

# stationId, startYear, endYear, temps, precips and database_table_name
# are assumed to be defined earlier

# build rows for each date and add to a list of rows we'll use to insert as a batch
rows = []
numberOfYears = endYear - startYear + 1
for i in range(numberOfYears):
    for j in range(12):
        # make a date for the first day of the month
        dateValue = datetime.date(startYear + i, j + 1, 1)
        index = (i * 12) + j
        row = (stationId, dateValue, temps[index], precips[index])
        rows.append(row)

# insert all of the rows as a batch and commit
ip = '192.1.2.3'
port = 1521
SID = 'my_sid'
dsn = cx_Oracle.makedsn(ip, port, SID)
connection = cx_Oracle.connect('username', 'password', dsn)
cursor = cx_Oracle.Cursor(connection)
cursor.prepare('insert into ' + database_table_name + ' (id, record_date, temp, precip) values (:1, :2, :3, :4)')
cursor.executemany(None, rows)
connection.commit()
cursor.close()
connection.close()
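
The row-building loop above can also be pulled into a small helper, which makes it easy to check without a database. The following is only a sketch: the names build_monthly_rows and insert_in_batches and the batch_size parameter are hypothetical and not part of cx_Oracle, and the cursor argument is assumed to be any object with an executemany(statement, parameters) method such as a cx_Oracle cursor. Coming from JDBC, executemany() plays roughly the role of calling addBatch() per row and then executeBatch() on a PreparedStatement.

```python
import datetime


def build_monthly_rows(station_id, start_year, temps, precips):
    """Return one (station_id, first_of_month, temp, precip) tuple per month."""
    if len(temps) != len(precips) or len(temps) % 12 != 0:
        raise ValueError("temps and precips must have equal length, a multiple of 12")
    rows = []
    for index, (temp, precip) in enumerate(zip(temps, precips)):
        # index 0..11 is the first year, 12..23 the second, and so on
        year_offset, month_index = divmod(index, 12)
        first_of_month = datetime.date(start_year + year_offset, month_index + 1, 1)
        rows.append((station_id, first_of_month, temp, precip))
    return rows


def insert_in_batches(cursor, sql, rows, batch_size=1000):
    """Send rows through cursor.executemany() in slices of at most batch_size.

    batch_size is an arbitrary illustrative default, not a tuned value.
    """
    for start in range(0, len(rows), batch_size):
        cursor.executemany(sql, rows[start:start + batch_size])
```

With the connection set up as in the answer, a call might look like insert_in_batches(cursor, 'insert into my_table (id, record_date, temp, precip) values (:1, :2, :3, :4)', rows), followed by connection.commit(). Slicing into batches is a design choice for very large row lists; for a few dozen rows a single executemany() call is simpler.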

Answer by alldayremix

Use Cursor.prepare() and Cursor.executemany().

From the cx_Oracle documentation:

Cursor.prepare(statement[, tag])

This can be used before a call to execute() to define the statement that will be executed. When this is done, the prepare phase will not be performed when the call to execute() is made with None or the same string object as the statement. [...]

Cursor.executemany(statement, parameters)

Prepare a statement for execution against a database and then execute it against all parameter mappings or sequences found in the sequence parameters. The statement is managed in the same way as the execute() method manages it.

Thus, using the above two functions, your code becomes:

import datetime
import cx_Oracle

connection_string = "scott/tiger@testdb"
connection = cx_Oracle.Connection(connection_string)
cursor = cx_Oracle.Cursor(connection)
station_id = 'STATION_1'
start_year = 2000

temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
number_of_years = len(temps) // 12

# list comprehension of dates for the first day of the month
date_values = [datetime.date(start_year + i, j + 1, 1) for i in range(number_of_years) for j in range(12)]

# second argument to executemany() is a sequence of rows; each row here is a
# tuple whose items bind by position to :1, :2, :3, :4
rows = [(station_id, date_values[i], temps[i], precips[i]) for i in range(len(temps))]

sql_insert = 'insert into my_table (id, date_column, temp, precip) values (:1, :2, :3, :4)'
cursor.prepare(sql_insert)
cursor.executemany(None, rows)
connection.commit()

Also see Oracle's Mastering Oracle+Python series of articles.

Answer by Derrick

I would create a large SQL insert statement using union:

insert into mytable(col1, col2, col3)
select a, b, c from dual union
select d, e, f from dual union
select g, h, i from dual

You can build the string in Python and give it to Oracle as one statement to execute.
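
As a rough sketch of building that string in Python: the helper below is hypothetical (build_union_insert is not a cx_Oracle function) and returns the SQL text together with a flat list of bind values, so the data itself is passed as bind variables rather than pasted into the string. Two assumptions to note: it uses union all instead of union, because union removes duplicate rows, and the table and column names are interpolated directly, so they must come from trusted code.

```python
def build_union_insert(table, columns, rows):
    """Return (sql, params) for one INSERT ... SELECT ... FROM dual UNION ALL ... statement."""
    if not rows:
        raise ValueError("need at least one row")
    selects = []
    params = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row length does not match column count")
        # number the bind variables consecutively across all rows: :1, :2, ...
        first = len(params) + 1
        placeholders = ", ".join(":%d" % n for n in range(first, first + len(row)))
        selects.append("select %s from dual" % placeholders)
        params.extend(row)
    sql = "insert into %s (%s)\n%s" % (
        table,
        ", ".join(columns),
        "\nunion all\n".join(selects),
    )
    return sql, params
```

The result would then be passed on as cursor.execute(sql, params). The statement text grows with every row, so this is best suited to small batches.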

Answer by ragerdl

As one of the comments says, consider using INSERT ALL. Supposedly it'll be significantly faster than using executemany().

For example:

INSERT ALL
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
SELECT * FROM dual;

http://www.techonthenet.com/oracle/questions/insert_rows.php
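
A statement of that shape could be generated from Python along these lines. This is a sketch only: build_insert_all is a hypothetical helper, the values are passed as numbered bind variables instead of literal expressions, and the trailing semicolon from the example is left off on the assumption that the text is handed to cursor.execute() rather than typed into SQL*Plus. Table and column names are interpolated as-is and must be trusted.

```python
def build_insert_all(table, columns, rows):
    """Return (sql, params) for an INSERT ALL statement with one INTO clause per row."""
    if not rows:
        raise ValueError("need at least one row")
    column_list = ", ".join(columns)
    into_clauses = []
    params = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row length does not match column count")
        # bind variables are numbered consecutively across all INTO clauses
        first = len(params) + 1
        placeholders = ", ".join(":%d" % n for n in range(first, first + len(row)))
        into_clauses.append("  into %s (%s) values (%s)" % (table, column_list, placeholders))
        params.extend(row)
    sql = "insert all\n%s\nselect * from dual" % "\n".join(into_clauses)
    return sql, params
```

Usage would be cursor.execute(sql, params). Whether this is actually faster than executemany() depends on the environment, so it is worth measuring before adopting it.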

Answer by zhihuifan

FYI, my test results:

I inserted 5000 rows, with 3 columns per row.


  1. Running a single-row insert 5000 times took 1.24 minutes.
  2. Running with executemany took 0.125 seconds.
  3. Running a single insert all statement took 4.08 minutes.

The Python code for the third case builds SQL like insert all into t(a,b,c) select :1, :2, :3 from dual union all select :4, :5, :6 from dual...

Building this long SQL string in Python took 0.145329 seconds.

I tested my code on a very old Sun machine. CPU: 1415 MHz.


In the third case, I checked the database side: the wait event was "SQL*Net more data from client", which means the server was waiting for more data from the client.

I would not have believed the result of the third method without running the test.

So my short suggestion is simply to use executemany.
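
To run a comparison of this kind yourself, a timing harness could be shaped like the sketch below. Important assumptions: it uses Python 3 and the standard-library sqlite3 module with an in-memory database as a stand-in, because it needs no Oracle server. SQLite uses ? placeholders instead of :1, and there are no network round trips, so the absolute numbers will not resemble the Oracle figures above. The function name time_inserts and the table names are hypothetical; for a real measurement the connection and SQL would be swapped for the cx_Oracle equivalents.

```python
import sqlite3
import time


def time_inserts(row_count=5000):
    """Time a per-row execute() loop against a single executemany() call."""
    rows = [(i, i * 2, i * 3) for i in range(row_count)]
    connection = sqlite3.connect(":memory:")
    cursor = connection.cursor()
    cursor.execute("create table t_loop (a integer, b integer, c integer)")
    cursor.execute("create table t_many (a integer, b integer, c integer)")

    # case 1: one execute() call per row
    start = time.perf_counter()
    for row in rows:
        cursor.execute("insert into t_loop (a, b, c) values (?, ?, ?)", row)
    connection.commit()
    loop_seconds = time.perf_counter() - start

    # case 2: a single executemany() call for all rows
    start = time.perf_counter()
    cursor.executemany("insert into t_many (a, b, c) values (?, ?, ?)", rows)
    connection.commit()
    many_seconds = time.perf_counter() - start

    # confirm both approaches stored the same number of rows
    loop_count = cursor.execute("select count(*) from t_loop").fetchone()[0]
    many_count = cursor.execute("select count(*) from t_many").fetchone()[0]
    connection.close()
    return loop_seconds, many_seconds, loop_count, many_count
```

Calling time_inserts() returns the two elapsed times in seconds plus the row counts of both tables.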
