Original question: http://stackoverflow.com/questions/284382/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Best way to select out of millions of rows in an Oracle DB
Asked by chris
G'day!
I have one million different words that I'd like to look up in a table with 15 million rows. After each query, the resulting synonyms are processed together with the word.
The table looks like this:
synonym       word
---------------------
ancient       old
anile         old
centenarian   old
darkened      old
distant       far
remote        far
calm          gentle
quite         gentle
This is how it is done in Java currently:
....
PreparedStatement stmt;
ResultSet wordList;
ResultSet syns;
...
stmt = conn.prepareStatement("select distinct word from table");
wordList = stmt.executeQuery();
while (wordList.next()) {
    stmt = conn.prepareStatement("select synonym from table where word=?");
    stmt.setString(1, wordList.getString(1));
    syns = stmt.executeQuery();
    process(syns, wordList.getString(1));
}
...
This is incredibly slow. What's the fastest way to do something like this?
Cheers, Chris
Answered by JeeBee
Ensure that there is an index on the 'word' column.
Move the second prepareStatement outside the word loop. Each time you create a new statement, the database compiles and optimizes the query - but in this case the query is the same, so this is unnecessary.
Combine the statements into a single query, as sblundy's answer does.
Answered by sblundy
Two ideas:
a) How about making it one query:
select synonym from table where word in (select distinct word from table)
b) Or, if your process method needs to deal with them as a set of synonyms of one word, why not sort them by word and start process anew each time word is different? That query would be:
select word, synonym
from table
order by word
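As a minimal sketch of idea b), assuming the rows arrive already sorted by word, the grouping could look something like the following. The names Row, GroupHandler, and groupByWord are hypothetical and used only for illustration; GroupHandler stands in for the asker's process method, and an in-memory list stands in for the ResultSet so the example runs without a database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SynonymGrouper {
    /** Hypothetical stand-in for one (word, synonym) row of the ordered query. */
    static final class Row {
        final String word;
        final String synonym;

        Row(String word, String synonym) {
            this.word = word;
            this.synonym = synonym;
        }
    }

    /** Hypothetical callback standing in for the question's process method. */
    interface GroupHandler {
        void process(List<String> synonyms, String word);
    }

    /** Calls the handler once per word; rows must already be sorted by word. */
    static void groupByWord(Iterable<Row> rowsSortedByWord, GroupHandler handler) {
        String currentWord = null;
        List<String> synonyms = new ArrayList<>();
        for (Row row : rowsSortedByWord) {
            // A change of word closes the previous group.
            if (currentWord != null && !row.word.equals(currentWord)) {
                handler.process(synonyms, currentWord);
                synonyms = new ArrayList<>();
            }
            currentWord = row.word;
            synonyms.add(row.synonym);
        }
        // The last group is never closed inside the loop, so flush it here.
        if (currentWord != null) {
            handler.process(synonyms, currentWord);
        }
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("far", "distant"), new Row("far", "remote"),
            new Row("old", "ancient"), new Row("old", "anile"));
        // Prints "far -> [distant, remote]" and then "old -> [ancient, anile]".
        groupByWord(rows, (syns, word) -> System.out.println(word + " -> " + syns));
    }
}
```

With a real ResultSet the loop body would read rs.getString(1) and rs.getString(2) instead of the Row fields; the flush after the loop is the part that is easiest to forget.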
Answered by configurator
Why are you querying the synonyms inside the loop if you're querying all of them anyway? You should use a single query, "select word, synonym from table order by word", and then split the rows by word in the Java code.
Answered by JosephStyons
PreparedStatement stmt;
ResultSet syns;
...
stmt = conn.prepareStatement("select distinct " +
                             "  sy.synonym " +
                             "from " +
                             "  table sy, " +
                             "  table wd " +
                             "where sy.word = wd.word");
syns = stmt.executeQuery();
process(syns);
Answered by John Gardner
related but unrelated:
while (wordList.next()) {
    stmt = conn.prepareStatement("select synonym from table where word=?");
    stmt.setString(1, wordList.getString(1));
    syns = stmt.executeQuery();
    process(syns, wordList.getString(1));
}
You should move that prepareStatement call outside the loop:
stmt = conn.prepareStatement("select synonym from table where word=?");
while (wordList.next()) {
    stmt.setString(1, wordList.getString(1));
    syns = stmt.executeQuery();
    process(syns, wordList.getString(1));
}
The whole point of preparing a statement is for the db to compile/cache/etc because you're going to use the statement repeatedly. You also may need to clean up your result sets explicitly if you're going to do that many queries, to ensure that you don't run out of cursors.
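One way to clean up result sets explicitly is try-with-resources, which closes each ResultSet (and with it the cursor) as soon as its block ends. The sketch below is only an illustration under assumptions: the class SynonymLookup, the method synonymsFor, and the constant name are hypothetical, and "table" is the question's placeholder table name rather than a real one.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class SynonymLookup {
    // "table" is the placeholder name used in the question, not a real table.
    static final String SYNONYMS_BY_WORD = "select synonym from table where word = ?";

    /** Returns one synonym list per input word, in the same order. */
    static List<List<String>> synonymsFor(Connection conn, List<String> words)
            throws SQLException {
        List<List<String>> result = new ArrayList<>();
        // Prepared once, outside the loop, and closed when the block exits.
        try (PreparedStatement stmt = conn.prepareStatement(SYNONYMS_BY_WORD)) {
            for (String word : words) {
                stmt.setString(1, word);
                // Each ResultSet is closed before the next query runs,
                // so open cursors do not pile up across iterations.
                try (ResultSet syns = stmt.executeQuery()) {
                    List<String> found = new ArrayList<>();
                    while (syns.next()) {
                        found.add(syns.getString(1));
                    }
                    result.add(found);
                }
            }
        }
        return result;
    }
}
```

The same pattern works if the rows are handed to a process method instead of being collected into lists, as long as that call happens inside the inner try block while the ResultSet is still open.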
Answered by user55904
You should also consider using the Statement object's setFetchSize method to reduce the number of round trips between your application and the database. If you know you are going to process a million records, you should use setFetchSize(someRelativelyHighNumberLike1000). This tells the JDBC driver to grab up to 1000 records each time it needs more from Oracle [instead of fetching only a few rows per round trip (the Oracle driver defaults to 10), which is close to a worst-case scenario for this kind of batch processing operation]. This should improve the speed of your program. You should also consider refactoring to process your words/synonyms in batches, as
- fetch 1
- process 1
- repeat
is slower than
- fetch 50/100/1000
- process 50/100/1000
- repeat
Just hold the 50/100/1000 [or however many you retrieve at once] in some array structure until you process them.
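The two suggestions above could be combined roughly as follows. This is a sketch under assumptions, not a tuned implementation: FetchInBatches, Batcher, and readAll are hypothetical names, the handler stands in for whatever batch processing the program does, and "table" is again the question's placeholder. The only JDBC-specific call added here is Statement.setFetchSize.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class FetchInBatches {
    /** Collects items and hands them to the handler in groups of batchSize. */
    static final class Batcher<T> {
        private final int batchSize;
        private final Consumer<List<T>> handler;
        private List<T> pending = new ArrayList<>();

        Batcher(int batchSize, Consumer<List<T>> handler) {
            this.batchSize = batchSize;
            this.handler = handler;
        }

        void add(T item) {
            pending.add(item);
            if (pending.size() >= batchSize) {
                flush();
            }
        }

        /** Hands over whatever is pending; call once more after the last add. */
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            List<T> full = pending;
            pending = new ArrayList<>();
            handler.accept(full);
        }
    }

    /** Reads every {word, synonym} row, fetching and processing fetchSize rows at a time. */
    static void readAll(Connection conn, int fetchSize, Consumer<List<String[]>> handler)
            throws SQLException {
        Batcher<String[]> batcher = new Batcher<>(fetchSize, handler);
        try (Statement stmt = conn.createStatement()) {
            // Hint to the driver: fetch this many rows per round trip.
            stmt.setFetchSize(fetchSize);
            try (ResultSet rs = stmt.executeQuery(
                    "select word, synonym from table order by word")) {
                while (rs.next()) {
                    batcher.add(new String[] {rs.getString(1), rs.getString(2)});
                }
            }
        }
        batcher.flush(); // process the final, possibly smaller, batch
    }
}
```

Using the same number for the fetch size and the batch size is a simplification; the two can be chosen independently, and the best values depend on row size and available memory.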
Answered by chris
The problem is solved. The important point is that the table can be sorted by word, so I can simply iterate through the whole table once. Like this:
....
Statement stmt;
ResultSet rs;
String currentWord;
HashSet<String> syns = new HashSet<String>();
...
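stmt = conn.createStatement();
rs = stmt.executeQuery("select word, synonym from table order by word");
if (rs.next()) {
    currentWord = rs.getString(1);
    syns.add(rs.getString(2));
    while (rs.next()) {
        // Compare strings with equals(); != only compares object references.
        if (!rs.getString(1).equals(currentWord)) {
            process(syns, currentWord);
            syns.clear();
            currentWord = rs.getString(1);
        }
        syns.add(rs.getString(2));
    }
    // The loop never flushes the last word's synonyms, so do it here.
    process(syns, currentWord);
}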
...