Best way to select from millions of rows in an Oracle DB

Question

Asked by chris

G'day!

I have one million different words which I'd like to look up in a table with 15 million rows. After each query, the resulting synonyms are processed together with the word.

The table looks like this:

    synonym      word
    ---------------------
    ancient      old
    anile        old
    centenarian  old
    darkened     old
    distant      far
    remote       far
    calm         gentle
    quiet        gentle

This is how it is done in Java currently:

    ....
    PreparedStatement stmt;
    ResultSet wordList;
    ResultSet syns;
    ...

    stmt = conn.prepareStatement("select distinct word from table");
    wordList = stmt.executeQuery();

    while (wordList.next()) {
        stmt = conn.prepareStatement("select synonym from table where word=?");
        stmt.setString(1, wordList.getString(1));
        syns = stmt.executeQuery();

        process(syns, wordList.getString(1));
    }
    ...

This is incredibly slow. What's the fastest way to do something like this?

Cheers, Chris

Answer 1

Answered by JeeBee

Ensure that there is an index on the 'word' column.
Move the second prepareStatement call outside the word loop. Each time you create a new statement, the database has to parse and optimize the query again, but here the query text is always the same, so that work is unnecessary.
Combine the two statements into one query, as sblundy's answer does.
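
To illustrate the second point, here is a minimal sketch of preparing the statement once and only rebinding the parameter inside the loop. The class and method names (SynonymLookup, synonymsFor) are made up for this example, "table" is the placeholder name from the question, and collecting everything into a map is just an assumption to keep the sketch short:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original post.
final class SynonymLookup {

    // "table" is the placeholder table name used in the question.
    private static final String SYNONYM_SQL =
            "select synonym from table where word = ?";

    /** Looks up the synonyms of every word using a single prepared statement. */
    static Map<String, List<String>> synonymsFor(Connection conn, List<String> words)
            throws SQLException {
        Map<String, List<String>> result = new LinkedHashMap<>();
        // The statement is prepared once, outside the loop.
        try (PreparedStatement stmt = conn.prepareStatement(SYNONYM_SQL)) {
            for (String word : words) {
                stmt.setString(1, word); // only the bind value changes per word
                List<String> synonyms = new ArrayList<>();
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        synonyms.add(rs.getString(1));
                    }
                }
                result.put(word, synonyms);
            }
        }
        return result;
    }
}
```

This still issues one query per word, so it only removes the repeated prepare step; the single-query approaches in the other answers avoid the per-word round trips altogether.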

Answer 2

Answered by sblundy

Two ideas:

a) How about making it one query:

    select synonym from table where word in (select distinct word from table)

b) Or, if your process method needs to deal with them as a set of synonyms of one word, why not sort them by word and start process anew each time word changes? That query would be:

    select word, synonym
    from table
    order by word

Answer 3

Answered by configurator

Why are you querying the synonyms inside the loop if you're querying all of them anyway? You should use a single "select word, synonym from table order by word" query, and then split the rows by word in the Java code.
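
The "split by word in the Java code" step might look something like the sketch below. It works on in-memory {word, synonym} pairs rather than a live ResultSet so that the grouping logic can be seen on its own; SortedRowGrouper and forEachGroup are names invented for this example, and the rows are assumed to arrive already sorted by word:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of the original post.
final class SortedRowGrouper {

    /**
     * Walks rows that are already sorted by word and calls process once per
     * word with all of its synonyms. Each row is a {word, synonym} pair.
     */
    static void forEachGroup(Iterable<String[]> sortedRows,
                             BiConsumer<String, List<String>> process) {
        String currentWord = null;
        List<String> synonyms = new ArrayList<>();
        for (String[] row : sortedRows) {
            // A change in the word column means the previous group is complete.
            if (currentWord != null && !row[0].equals(currentWord)) {
                process.accept(currentWord, synonyms);
                synonyms = new ArrayList<>();
            }
            currentWord = row[0];
            synonyms.add(row[1]);
        }
        // The last group is never followed by a word change, so flush it here.
        if (currentWord != null) {
            process.accept(currentWord, synonyms);
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"far", "distant"},
                new String[] {"far", "remote"},
                new String[] {"old", "ancient"},
                new String[] {"old", "anile"});
        forEachGroup(rows, (word, syns) -> System.out.println(word + " -> " + syns));
    }
}
```

Only one group is held in memory at a time, which matters with 15 million rows.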

Answer 4

Answered by JosephStyons

    PreparedStatement stmt;
    ResultSet syns;
    ...

    // Self-join: both table aliases are listed, separated by a comma,
    // and joined on the word column.
    stmt = conn.prepareStatement("select distinct " +
                                 "  sy.synonym " +
                                 "from " +
                                 "  table sy, " +
                                 "  table wd " +
                                 "where sy.word = wd.word");
    syns = stmt.executeQuery();
    process(syns);

Answer 5

Answered by John Gardner

related but unrelated:

Answer 6

Answered by user55904

You should also consider using the statement object's setFetchSize method to reduce the number of round trips between your application and the database. If you know you are going to process a million records, call setFetchSize(someRelativelyHighNumberLike1000). This tells the JDBC driver to fetch up to 1000 rows each time it needs more from Oracle [instead of fetching them a few at a time, which is the worst case for this kind of batch processing]. This will improve the speed of your program. You should also consider refactoring to process your words/synonyms in batches, as

    fetch 1
    process 1
    repeat

is slower than

    fetch 50/100/1000
    process 50/100/1000
    repeat

Just hold the 50/100/1000 rows [or however many you retrieve at once] in an array or list until you process them.
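
As a rough sketch of both suggestions together: the code below sets a larger fetch size on the statement and buffers rows into fixed-size batches before handing them to a callback. BatchedReader, readInBatches and readAll are names invented for this example, the query and the placeholder table name come from the question, and using the same number for fetch size and batch size is just a simplifying assumption:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of the original post.
final class BatchedReader {

    /**
     * Reads {word, synonym} rows from rs and hands them to processBatch in
     * lists of at most batchSize rows. batchSize must be positive.
     */
    static void readInBatches(ResultSet rs, int batchSize,
                              Consumer<List<String[]>> processBatch) throws SQLException {
        List<String[]> batch = new ArrayList<>(batchSize);
        while (rs.next()) {
            batch.add(new String[] {rs.getString(1), rs.getString(2)});
            if (batch.size() == batchSize) {
                processBatch.accept(batch);
                batch = new ArrayList<>(batchSize);
            }
        }
        // Hand over whatever is left when the row count is not a multiple of batchSize.
        if (!batch.isEmpty()) {
            processBatch.accept(batch);
        }
    }

    /** Runs the ordered query with a larger fetch size and processes rows in batches. */
    static void readAll(Connection conn, int fetchSize,
                        Consumer<List<String[]>> processBatch) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // Hint to the driver: fetch this many rows per round trip.
            stmt.setFetchSize(fetchSize);
            try (ResultSet rs = stmt.executeQuery(
                    "select word, synonym from table order by word")) {
                readInBatches(rs, fetchSize, processBatch);
            }
        }
    }
}
```

Note that a batch boundary can fall in the middle of a word's synonyms, so grouping by word still has to be handled by the code that consumes the batches.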

Answer 7

Answered by chris

The problem is solved. The important point is that the table can be sorted by word, so I can simply iterate through the whole table once. Like this:

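    ....
    Statement stmt;
    ResultSet rs;
    String currentWord;
    HashSet<String> syns = new HashSet<String>();
    ...

    stmt = conn.createStatement();
    rs = stmt.executeQuery("select word, synonym from table order by word");

    // Guard against an empty table before reading the first row.
    if (rs.next()) {
        currentWord = rs.getString(1);
        syns.add(rs.getString(2));

        while (rs.next()) {
            // Compare string contents with equals(); != only compares references.
            if (!rs.getString(1).equals(currentWord)) {
                process(syns, currentWord);
                syns.clear();
                currentWord = rs.getString(1);
            }
            syns.add(rs.getString(2));
        }
        // The loop never flushes the final group, so process it here.
        process(syns, currentWord);
    }
    ...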
