以批处理方式运行 Oracle 更新语句

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/860214/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-18 18:14:34  来源:igfitidea点击:

Run Oracle update statements in a batch mode

sqloracleplsql

提问by Dylan

I need to run a couple of relatively simple SQL update statements to update a single column in an Oracle table with 14.4 million rows. One statement runs a function written in Java and the JVM runs out of memory as I'm doing the update on all 14.4 million rows.

我需要运行几个相对简单的 SQL 更新语句来更新具有 1440 万行的 Oracle 表中的单个列。一个语句运行一个用 Java 编写的函数,JVM 内存不足,因为我正在对所有 1440 万行进行更新。

Have you written a kind of batch PL/SQL routine that can break this simple update into sets of, say, 10K records per batch? I know that if I can commit my updates after a bunch of records, it'll go a lot faster and I won't run out of memory. I'm sure there's a simple way to do this using a FOR loopand row_numbut I'm not making much progress.

您是否编写了一种批处理 PL/SQL 例程,可以将这个简单的更新分解为每批 10K 条记录的集合?我知道如果我能在一堆记录之后提交我的更新,它会变得更快,而且我不会耗尽内存。我确信有一种使用 a 的简单方法可以做到这一点FOR looprow_num但我没有取得太大进展。

Here are the two statements I need to run for each batch of nrecords:

这是我需要为每批n条记录运行的两个语句:

first one:

第一:

 update vr_location l set l.usps_address=(
   select mylib.string_utils.remove_duplicate_whitespace(
   house_number || ' ' || pre_street_direction || ' ' || street_name || ' ' || 
   street_description || ' ' || post_street_direction)
 from vr_address a where a.address_pk=l.address_pk);

second:

第二:

update vr_location set usps_address = mylib.usaddress_utils.parse_address(usps_address);

回答by stili

Do an initial select to retrieve some kind of grouping attribute, so that you end up with groups that have the desired number of rows. Experiment with the grouping clause, for instance the last three digits of a zip-code or something semi random.

执行初始选择以检索某种分组属性,以便最终获得具有所需行数的组。试验分组子句,例如邮政编码的最后三位数字或半随机的东西。

Loop over the grouping clause, using the clause as parameter to limit the rows targeted by each update statement. commit at the end of each iteration.

循环分组子句,使用子句作为参数来限制每个更新语句所针对的行。在每次迭代结束时提交。

回答by dpbradley

You (or your DBA) should size the UNDO properly and do this as a single SQL transaction

您(或您的 DBA)应该正确调整 UNDO 的大小,并将其作为单个 SQL 事务执行

The advantages are:

优点是:

  • read consistency on table while this is happening
  • you retain the ability to rollback the transaction in case something fails
  • 发生这种情况时读取表上的一致性
  • 如果出现故障,您保留回滚事务的能力

If you're in some sort of loading environment where you don't care about either of these, then use CTAS (create table as select) to make a new table with the modified value, build the indexes, constraints, etc and then swap the table names. 14 million rows isn't that big these days.

如果您处于某种您不关心其中任何一个的加载环境中,则使用 CTAS(创建表作为选择)使用修改后的值创建一个新表,构建索引、约束等,然后交换表名。如今,1400 万行并不是那么大。

回答by Dylan

Well, I had to get stuff done so I took your recommendations then did a little Python to do it. I ended up using cx_Oracle to give me good control over the transactions. Obviously PL/SQL would have been better but I don't know it. Python is my new hammer, and everything is a nail!

好吧,我必须把事情做完,所以我接受了你的建议,然后用了一点 Python 来完成它。我最终使用 cx_Oracle 来很好地控制事务。显然 PL/SQL 会更好,但我不知道。Python是我的新锤子,一切都是钉子!

#!/usr/bin/env python
import csv
import time
import cx_Oracle

# Parses USPS addresses from voter addresses
# and inserts them into VR_LOCATION table ready
# for geocoding. Does batches by zipcode
def LoadZips():
    zipcodes = []
    zips = open('OH_ZIP_CODES.txt','r')
    for line in zips:
        zip = line[0:5]
        if zip not in zipcodes:
            zipcodes.append(zip)
    zips.close()
    return zipcodes

def UpdateAddresses(ziplist):
    counter = 1
    total = len(ziplist)

    for zipcode in ziplist:
        orcl = cx_Oracle.connect('voter/voter@oracle')
        curs = orcl.cursor()
        countsql = "select count(*) from vr_location where zip_co = '%s'" % zipcode
        concatsql = """update vr_location l set l.usps_address=(
                    select mizar.string_utils.remove_duplicate_whitespace(
                        house_number
                        ||' '||pre_street_direction
                        ||' '||street_name
                        ||' '||street_description
                        ||' '||post_street_direction)
                    from vr_address a where a.address_pk = l.address_pk)
                where zip_co = '%s'""" % zipcode
        parsesql = """update vr_location set usps_address =  mizar.usaddress_utils.parse_address(usps_address)
                where zip_co = '%s'""" % zipcode
        curs.execute(countsql)

        records_affected = curs.fetchone()[0]
    if records_affected == 0:
        print "No records for zipcode %s" % zipcode
        counter += 1 
        continue

    print "[%s] %s of %s: %s addresses" % (zipcode, counter, total, records_affected)
    curs.execute(concatsql)
    orcl.commit()
    curs.execute(parsesql)
    orcl.commit()
    curs.close()
    counter += 1 

    # Uncomment this to debug - just steps through X zipcodes      
    #if counter == 3:
    #    print "Cleaning up..."
    #    break


if __name__ == "__main__":
    start = time.clock()
    zipcodes = LoadZips()
    print "Processing addresses in %s zip codes" % len(zipcodes)
    UpdateAddresses(zipcodes)

Blog post on the topic

关于该主题的博客文章