Java: optimizing batch inserts, SQLite

Note: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/12095176/

Optimizing batch inserts, SQLite

java, sqlite

Asked by James Raitsev

I am playing with different buffer sizes to be inserted into the local SQLite DB and have found that it takes nearly 8 minutes to insert 10,000,000 rows of data when the buffer size is 10,000. In other words, it takes 1,000 writes to store everything.

8 minutes to store 10,000,000 seems a bit too long (or is it?)

Can any of the below be optimized to increase the speed? Please note that the data being inserted is a random collection of characters.

public int flush() throws SQLException {
    String sql = "insert into datastore values(?,?,?,?);";

    PreparedStatement prep = con.prepareStatement(sql);

    for (DatastoreElement e : content) { // content is 10,000 elements long
        _KVPair kvp = e.getKvp();

        prep.setInt(1, e.getMetaHash());
        prep.setInt(2, kvp.hashCode());
        prep.setString(3, kvp.getKey());
        prep.setString(4, kvp.getValue());

        prep.addBatch();
    }

    int[] updateCounts = prep.executeBatch();

    con.commit();

    return errorsWhileInserting(updateCounts);
}

When the table is created, it is done via

    statement.executeUpdate("create table datastore " +
               "(meta_hash INTEGER," +
               "kv_hash   INTEGER," +
               "key TEXT," +
               "value TEXT);");

Can any of the above be optimized further?

Answered by Joe

I'm a bit hazy on the Java API, but I think you should start a transaction first, otherwise calling commit() is pointless. Do it with conn.setAutoCommit(false). Otherwise SQLite will be journaling for each individual insert/update, which requires syncing the file and will contribute to the slowness.

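Roughly, that amounts to switching auto-commit off before the batch runs and committing once at the end. A minimal sketch, reusing con and flush() from the question (the rollback handling is an assumed addition, not part of the original answer):

con.setAutoCommit(false);   // one explicit transaction instead of a journal sync per statement
try {
    flush();                // flush() already calls con.commit() after executeBatch()
} catch (SQLException ex) {
    con.rollback();         // discard the partially written batch on failure
    throw ex;
}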

EDIT: The questioner has updated to say that this is already being done. In that case:

That is a lot of data. That length of time doesn't sound out of this world. The best you can do is to test with different buffer sizes. There is a balance between jitter from buffers that are too small and virtual memory kicking in when they are too large. For this reason, you shouldn't try to put everything into one buffer at once. Split the inserts up into your own batches.

Answered by keiki

You are only executing executeBatch once, which means that all 10 million statements are sent to the database in that single executeBatch call. That is far too much for a database to handle. You should additionally execute int[] updateCounts = prep.executeBatch(); inside your loop, say every 1,000 rows. Just add an if statement that tests counter % 1000 == 0. Then the database can already start working asynchronously on the data you have sent.

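A rough sketch of that approach, reusing con, content, DatastoreElement and _KVPair from the question (the counter variable and the final executeBatch for the leftover rows are assumptions beyond the answer's text):

String sql = "insert into datastore values(?,?,?,?);";
try (PreparedStatement prep = con.prepareStatement(sql)) {
    int counter = 0;
    for (DatastoreElement e : content) {
        _KVPair kvp = e.getKvp();

        prep.setInt(1, e.getMetaHash());
        prep.setInt(2, kvp.hashCode());
        prep.setString(3, kvp.getKey());
        prep.setString(4, kvp.getValue());
        prep.addBatch();

        if (++counter % 1000 == 0) {
            prep.executeBatch();   // hand the current chunk of 1,000 rows to the database
        }
    }
    prep.executeBatch();           // send any remaining rows
    con.commit();
}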