How to loop through all rows in an Oracle table?

Disclaimer: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/1581052/

Date: 2020-09-18 19:14:14 · Source: igfitidea

How to loop through all rows in an Oracle table?

Tags: sql, oracle, loops, rows

Asked by JimDaniel

I have a table with ~30,000,000 rows that I need to iterate through, manipulate the data for each row individually, then save the data from the row to file on a local drive.

What is the most efficient way to loop through all the rows in the table using SQL for Oracle? I've been googling but can see no straightforward way of doing this. Please help. Keep in mind I do not know the exact number of rows, only an estimate.

EDIT FOR CLARIFICATION:

We are using Oracle 10g I believe. The row data contains blob data (zipped text files and xml files) that will be read into memory and loaded into a custom object, where it will then be updated/converted using .Net DOM access classes, rezipped, and stored onto a local drive.

I do not have much database experience whatsoever - I planned to use straight SQL statements with ADO.Net + OracleCommands. No performance restrictions really. This is for internal use. I just want to do it the best way possible.

Answered by Cade Roux

You need to read 30m rows from an Oracle DB and write out 30m files from the BLOBs (one zipped XML/text file in one BLOB column per row?) in each row to the file system on the local computer?

The obvious solution is to open an ADO.NET DataReader on SELECT * FROM tbl WHERE <range> so you can do batches. Read the BLOB from the reader into your API, do your stuff and write out the file. I would probably try to write the program so that it can run from many computers, each doing its own range - your bottleneck is most likely going to be the unzipping, manipulation and rezipping, since many consumers can probably stream data from that table without a noticeable effect on server performance.

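A minimal sketch of one such range query, assuming the table has a numeric id column to partition on (my_table, id and payload are placeholder names, not from the question):

```sql
-- Fetch one batch; :low_id and :high_id are bind variables
-- supplied by the client program for the current range.
SELECT id, payload
  FROM my_table
 WHERE id >= :low_id
   AND id <  :high_id
 ORDER BY id;
```

Each worker (or each pass of a single program) binds its own range, reads the BLOBs off the DataReader, then advances to the next range.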
I doubt you'll be able to do this with set-based operations internal to the Oracle database, and I would also be thinking about the file system and how you are going to organize so many files (and whether you have space - remember the size taken up by a file on the file system is always a whole multiple of the file-system block size).

Answered by Jon Quarfoth

You could try using rownum queries to grab chunks until you request a chunk that returns no rows.

This is a good article on rownum queries: http://www.oracle.com/technetwork/issue-archive/2006/06-sep/o56asktom-086197.html

If you don't feel like reading, jump directly to the "Pagination with ROWNUM" section at the end for an example query.

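For reference, the pagination pattern that article describes looks roughly like this (table and column names here are placeholders):

```sql
-- Rows :min_row+1 .. :max_row of the ordered result set.
-- The inner ROWNUM filter lets Oracle stop fetching early;
-- the outer one discards the rows before the start of the page.
SELECT *
  FROM (SELECT a.*, ROWNUM rnum
          FROM (SELECT id, payload
                  FROM my_table
                 ORDER BY id) a
         WHERE ROWNUM <= :max_row)
 WHERE rnum > :min_row;
```

You would advance :min_row and :max_row on each pass until the query returns no rows. Note that, unlike a range scan on a key, later pages get progressively more expensive, because Oracle still has to count past all the skipped rows.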
Answered by JimDaniel

My initial solution was to do something like this, as I have access to an id number (pseudocode):

            int num_rows = 100;
            int floor = 0;          // renamed from 'base', which is a C# keyword
            int ceiling = num_rows;

            loop:
                select * from MY_TABLE where id >= floor and id < ceiling;
                if no rows came back, stop;
                iterate through retrieved rows, do work;
                floor = ceiling;
                ceiling += num_rows;
                go to loop;

But I feel that this might not be the most efficient or best way to do it...

Answered by mcauthorn

It sounds like you need the entire dataset before you can do any data manipulation, since it is a BLOB. I would just use a DataAdapter.Fill and then hand the dataset over to the custom object to iterate through, do its manipulation, then write the end object to disk and zip it.

Answered by Jason

It's always preferable to use set-based operations when working with a large number of rows.

You would then enjoy a performance benefit. After processing the data, you should be able to dump the data from the table into a file in one go.

The viability of this depends on the processing you need to perform on the rows, although it is possible in most cases to avoid using a loop. Is there some specific requirement which prevents you from processing all rows at once?

If iterating through the rows is unavoidable, using bulk binding can be beneficial: FORALL for bulk DML operations, or BULK COLLECT for select-into queries.

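A rough sketch of the BULK COLLECT ... LIMIT pattern, in case part of the per-row processing can live inside PL/SQL (table and column names are placeholders):

```sql
DECLARE
  CURSOR c IS
    SELECT id, payload FROM my_table;
  TYPE t_rows IS TABLE OF c%ROWTYPE;
  l_rows t_rows;
BEGIN
  OPEN c;
  LOOP
    -- Fetch up to 500 rows per round trip instead of one at a time.
    FETCH c BULK COLLECT INTO l_rows LIMIT 500;
    EXIT WHEN l_rows.COUNT = 0;
    FOR i IN 1 .. l_rows.COUNT LOOP
      NULL;  -- process l_rows(i) here
    END LOOP;
  END LOOP;
  CLOSE c;
END;
/
```

In this particular case the heavy lifting (the .Net DOM manipulation and writing files to a local drive) happens outside the database, so bulk binding mainly pays off if some of the per-row transformation can be pushed into PL/SQL.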