Java: optimizing batch inserts, SQLite
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/12095176/
Optimizing batch inserts, SQLite
Asked by James Raitsev
I am playing with different buffer sizes to be inserted into the local SQLite DB and have found that it takes nearly 8 minutes to insert 10,000,000 rows of data when the buffer size is 10,000. In other words, it takes 1,000 writes to store everything.
8 minutes to store 10,000,000 rows seems a bit too long (or is it?)
Can any of the below be optimized to increase the speed? Please note that the data being inserted is a random collection of characters.
public int flush() throws SQLException {
    String sql = "insert into datastore values(?,?,?,?);";
    PreparedStatement prep = con.prepareStatement(sql);

    for (DatastoreElement e : content) { // content is 10,000 elements long
        _KVPair kvp = e.getKvp();
        prep.setInt(1, e.getMetaHash());
        prep.setInt(2, kvp.hashCode());
        prep.setString(3, kvp.getKey());
        prep.setString(4, kvp.getValue());
        prep.addBatch();
    }

    int[] updateCounts = prep.executeBatch();
    con.commit();
    return errorsWhileInserting(updateCounts);
}
The table is created via
statement.executeUpdate("create table datastore " +
    "(meta_hash INTEGER," +
    "kv_hash INTEGER," +
    "key TEXT," +
    "value TEXT);");
Can any of the above be further optimized?
Answered by Joe
I'm a bit hazy on the Java API, but I think you should start a transaction first, otherwise calling commit() is pointless. Do it with conn.setAutoCommit(false). Otherwise SQLite will be journaling for each individual insert/update, which requires syncing the file and will contribute to the slowness.
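A minimal sketch of that suggestion, applied to the flush() method from the question; the names (con, prep, content) are reused from the question's code, and this is only an illustration of the transaction wrapping, not the asker's actual implementation:

// Sketch only: wrap the whole batch in one explicit transaction so SQLite
// journals and syncs once per commit instead of once per insert.
con.setAutoCommit(false);              // start a transaction
try {
    for (DatastoreElement e : content) {
        _KVPair kvp = e.getKvp();
        prep.setInt(1, e.getMetaHash());
        prep.setInt(2, kvp.hashCode());
        prep.setString(3, kvp.getKey());
        prep.setString(4, kvp.getValue());
        prep.addBatch();
    }
    prep.executeBatch();
    con.commit();                      // one file sync for the whole batch
} catch (SQLException ex) {
    con.rollback();                    // discard the partial batch on failure
    throw ex;
}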
EDIT: The questioner has updated to say that this is already set. In that case:
That is a lot of data, and that length of time doesn't sound out of this world. The best you can do is to run tests with different buffer sizes: there is a balance between buffer jitter when they are too small and virtual memory kicking in when they are too large. For this reason, you shouldn't try to put it all into one buffer at once. Split the inserts up into your own batches.
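As a rough illustration of how such a comparison might be run (this is not from the answer): the insertRows helper below is hypothetical and simply stands in for whatever code performs the inserts with a given batch size.

// Hypothetical timing loop for comparing batch sizes; insertRows(int) is a
// placeholder for the actual insert code, not something from the question.
for (int batchSize : new int[] {1_000, 10_000, 100_000}) {
    long start = System.nanoTime();
    insertRows(batchSize);
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println(batchSize + " rows per batch: " + elapsedMs + " ms");
}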
Answered by keiki
You are only executing executeBatch once, which means that all 10 million statements are sent to the database in that single executeBatch call. This is way too much for a database to handle. You should additionally execute int[] updateCounts = prep.executeBatch(); inside your loop, say every 1,000 rows. Just add an if statement that tests counter % 1000 == 0. Then the database can already work asynchronously on the data you have sent.
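A sketch of that change applied to the question's flush() loop; the counter variable is introduced here for illustration, and everything else reuses the question's names:

// Sketch: send the batch to the database every 1,000 rows instead of once at the end.
int counter = 0;
for (DatastoreElement e : content) {
    _KVPair kvp = e.getKvp();
    prep.setInt(1, e.getMetaHash());
    prep.setInt(2, kvp.hashCode());
    prep.setString(3, kvp.getKey());
    prep.setString(4, kvp.getValue());
    prep.addBatch();
    counter++;
    if (counter % 1000 == 0) {
        prep.executeBatch();   // flush this chunk so the database can start working on it
    }
}
prep.executeBatch();           // send whatever rows are left over
con.commit();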