Oracle CLOB 性能

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1525780/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-18 19:08:57  来源:igfitidea点击:

Oracle CLOB performance

performanceoraclespringjdbclob

提问by yawn

I am running queries against an Oracle 10g with JDBC (using the latest drivers and UCP as DataSource) in order to retrieve CLOBs (avg. 20k characters). However the performance seems to be pretty bad: the batch retrieval of 100 LOBs takes 4s in average. The operation is also neither I/O nor CPU nor network bound judging from my observations.

我正在使用 JDBC(使用最新的驱动程序和 UCP 作为数据源)对 Oracle 10g 运行查询,以便检索 CLOB(平均 20k 个字符)。但是性能似乎很差:100 个 LOB 的批量检索平均需要 4 秒。从我的观察来看,该操作也既不是 I/O 也不是 CPU 也不是网络限制。

My test setup looks like this:

我的测试设置如下所示:

PoolDataSource dataSource = PoolDataSourceFactory.getPoolDataSource();
dataSource.setConnectionFactoryClassName("...");
dataSource.setConnectionPoolName("...");
dataSource.setURL("...");
dataSource.setUser("...");
dataSource.setPassword("...");

dataSource.setConnectionProperty("defaultRowPrefetch", "1000");
dataSource.setConnectionProperty("defaultLobPrefetchSize", "500000");

final LobHandler handler = new OracleLobHandler();
JdbcTemplate j = new JdbcTemplate(dataSource);

j.query("SELECT bigClob FROM ...",

        new RowCallbackHandler() {

            public void processRow(final ResultSet rs) throws SQLException {

                String result = handler.getClobAsString(rs, "bigClob");

            }

        });

}

I experimented with the fetch sizes but to no avail. Am I doing something wrong? Is there a way to speed up CLOB retrieval when using JDBC?

我尝试了获取大小,但无济于事。难道我做错了什么?有没有办法在使用 JDBC 时加快 CLOB 检索?

采纳答案by yawn

Thanks for all the helpful suggestions. Despite being flagged as answer to the problem my answer is that there seems to be no good solution. I tried using parallel statements, different storage characteristics, presorted temp. tables and other things. The operation seems not to be bound to any characteristic visible through traces or explain plans. Even query parallelism seems to be sketchy when CLOBs are involved.

感谢所有有用的建议。尽管被标记为问题的答案,但我的答案是似乎没有好的解决方案。我尝试使用并行语句、不同的存储特性、预先排序的温度。桌子和其他东西。该操作似乎不受任何通过痕迹或解释计划可见的特征的约束。当涉及到 CLOB 时,甚至查询并行性似乎也很粗略。

Undoubtedly there would be better options to deal with with large CLOBs (especially compression) in an 11g environment but atm. I am stuck with 10g.

毫无疑问,在 11g 环境中处理大型 CLOB(尤其是压缩)会有更好的选择,但 atm。我被 10g 困住了。

I have opted now for an additional roundtrip to the database in which I'll preprocess the CLOBs into a size optimized binary RAW. In previous deployments this has always been a very fast option and will likely be worth the trouble of maintaining an offline computed cache. The cache will be invalided and update using a persistent process and AQ until someone comes up with a better idea.

我现在选择了对数据库的额外往返,我将在其中将 CLOB 预处理为大小优化的二进制 RAW。在以前的部署中,这一直是一个非常快速的选择,并且可能值得维护离线计算缓存的麻烦。缓存将失效并使用持久进程和 AQ 进行更新,直到有人提出更好的主意。

回答by Stephen ODonnell

The total size of the result set is in the ten thousands - measured over the span of the whole retrieval the initial costs

结果集的总大小以万计 - 在整个检索的跨度上衡量初始成本

Is there an Order By in the query? 10K rows is quite a lot if it has to be sorted.

查询中是否有 Order By?如果必须排序,10K 行是相当多的。

Also, retrieving the PK is not a fair test versus retrieving the entire CLOB. Oracle stores the table rows with probably many in a block, but each of the CLOBs (if they are > 4K) will be stored out of line, each in a series of blocks. Scanning the list of PK's is therefore going to be fast. Also, there is probably an index on the PK, so Oracle can just quickly scan the index blocks and not even access the table.

此外,与检索整个 CLOB 相比,检索 PK 并不是一个公平的测试。Oracle 在一个块中存储可能有很多行的表行,但是每个 CLOB(如果它们大于 4K)将被存储在一系列的块中。因此,扫描 PK 列表会很快。此外,PK 上可能有一个索引,因此 Oracle 可以快速扫描索引块,甚至不访问表。

4 seconds does seem a little high, but it is 2MB that needs to be possible read from disk and transported over the network to your Java program. Network could be an issue. If you perform an SQL trace of the session it will point you at exactly where the time is being spent (disk reads or network).

4 秒看起来确实有点高,但是从磁盘读取并通过网络传输到您的 Java 程序需要 2MB。网络可能是一个问题。如果您对会话执行 SQL 跟踪,它将准确地指向您花费的时间(磁盘读取或网络)。

回答by Oscar Chan

My past experience of using oracle LOB type data to store large data has not been good. It is fine when it is under 4k since it store it locally like varchar2. Once it is over 4k, you start seeing performance degrade. Perhaps, things may have improved since I last tried it a couple of years ago, but here are the things I found in the past for your information:

我过去使用oracle LOB类型数据存储大数据的经验并不好。当它低于 4k 时很好,因为它像 varchar2 一样在本地存储它。一旦超过 4k,您就会开始看到性能下降。也许,自几年前我上次尝试以来,情况可能有所改善,但以下是我过去发现的信息供您参考:

As clients need to get LOBs via oracle server, you may consider the following interesting situation.

由于客户端需要通过 oracle 服务器获取 LOB,您可以考虑以下有趣的情况。

  • lob data will compete limited SGA cache with other data type if oracle decide to cache it. As clob data are general big, so it may push other data
  • lob data get poor disk read if oracle decide not to cache it, and stream the data to the client.
  • fragmentation is probably something that you haven't encountered yet. You will see if your applications delete lobs, and oracle tries to reuse the lob. I don't know if oracle support online defragmenting the disk for lob (they have for indexes, but it takes long time when we tried it previous).
  • 如果 oracle 决定缓存它,lob 数据将与其他数据类型竞争有限的 SGA 缓存。由于clob数据一般比较大,所以可能会推送其他数据
  • 如果 oracle 决定不缓存数据,lob 数据将无法读取磁盘,并将数据流式传输到客户端。
  • 碎片化可能是您还没有遇到过的。您将看到您的应用程序是否删除了 lob,并且 oracle 会尝试重用 lob。我不知道 oracle 是否支持对 lob 的磁盘进行在线碎片整理(它们有用于索引,但是我们之前尝试过它需要很长时间)。

You mentioned 4s for 100 lobs of avg 20k, so it's 40ms per lobs. Remember each lob needs to have to retrieved via separate Lob locater (it is not in the result set by default). That is an additional round trip for each lob, I assume (I am not 100% sure on this since it was a while ago) If that is the case, I assume that will be at least 5ms extra time per round trip in serial order, right? If so, your performance is already first limited by sequential lob fetches. You should be able to verify this by tracking the time spent in sql execution vs lob content fetching. Or you can verify this by excluding the lob column as suggested by the previous answer in the post, which should tell you if it is lob related.

你提到了 4s 表示 100 lob 的 avg 20k,所以每个 lob 是 40ms。请记住,每个 lob 都需要通过单独的 Lob 定位器进行检索(默认情况下它不在结果集中)。这是每个 lob 的额外往返,我假设(我不是 100% 确定,因为它是前一段时间)如果是这种情况,我假设每次往返至少有 5ms 的额外时间以串行顺序, 对?如果是这样,您的性能首先已经受到连续 lob 提取的限制。您应该能够通过跟踪 sql 执行与 lob 内容获取所花费的时间来验证这一点。或者您可以通过排除帖子中上一个答案所建议的 lob 列来验证这一点,这应该告诉您它是否与 lob 相关。

Good luck

祝你好运

回答by Robin Morris

I had a similar issue and found the JDBC Lobs making a network call when accessin the lobs.

我有一个类似的问题,发现 JDBC Lobs 在访问 lob 时进行网络调用。

As of Oracle 11.2g JDBC Driver you can use a prefetch. This speeded up access by 10 times...

从 Oracle 11.2g JDBC Driver 开始,您可以使用预取。这将访问速度提高了 10 倍......

statement1.setFetchSize(1000);
if (statement1 instanceof OracleStatement) {
    ((OracleStatement) statement1).setLobPrefetchSize(250000);
}