Faster bulk inserts in sqlite3 (C++)?

Disclaimer: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute the original authors (not me). Original question: http://stackoverflow.com/questions/364017/

Faster bulk inserts in sqlite3?

Tags: c++, sqlite, insert, bulk

Asked by scubabbl

I have a file of about 30000 lines of data that I want to load into a sqlite3 database. Is there a faster way than generating insert statements for each line of data?

The data is space-delimited and maps directly to an sqlite3 table. Is there any sort of bulk insert method for adding volume data to a database?

Has anyone devised some deviously wonderful way of doing this if it's not built in?

I should preface this by asking, is there a C++ way to do it from the API?

Accepted answer by ramanujan

You can also try tweaking a few parameters to get extra speed out of it. Specifically, you probably want PRAGMA synchronous = OFF.

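For example, from the C/C++ API the pragma can be issued like any ordinary statement (a minimal sketch; the db handle is assumed to be open already and error handling is omitted):

// Trade durability for speed: SQLite stops waiting for the OS to flush writes
// to disk. A crash during the import can corrupt the database file, so this is
// best reserved for bulk loads that can simply be re-run from scratch.
sqlite3_exec(db, "PRAGMA synchronous = OFF;", NULL, NULL, NULL);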

Answer by Javier

  • Wrap all INSERTs in a single transaction; even with a single user, it's far faster.
  • Use prepared statements (see the sketch after this list).
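
A minimal C++ sketch that combines both points (the bulk_insert function, its rows container, and the reduced error handling are illustrative assumptions, not the original poster's code; the table and column names match the example table used further down):

#include <sqlite3.h>
#include <utility>
#include <vector>

// Insert every row inside one transaction, reusing a single prepared statement.
bool bulk_insert(sqlite3 *db, const std::vector<std::pair<int, int>> &rows)
{
    sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);

    sqlite3_stmt *stmt = NULL;
    if (sqlite3_prepare_v2(db, "INSERT INTO mytable (col1, col2) VALUES (?, ?)",
                           -1, &stmt, NULL) != SQLITE_OK) {
        sqlite3_exec(db, "ROLLBACK", NULL, NULL, NULL);
        return false;
    }

    for (const auto &row : rows) {
        sqlite3_bind_int(stmt, 1, row.first);   // bind parameters are 1-based
        sqlite3_bind_int(stmt, 2, row.second);
        sqlite3_step(stmt);                     // run the INSERT for this row
        sqlite3_reset(stmt);                    // reuse the statement for the next row
    }

    sqlite3_finalize(stmt);
    sqlite3_exec(db, "COMMIT TRANSACTION", NULL, NULL, NULL);
    return true;
}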

Answer by ramanujan

You want to use the .import command. For example:

$ cat demotab.txt
44      92
35      94
43      94
195     49
66      28
135     93
135     91
67      84
135     94

$ echo "create table mytable (col1 int, col2 int);" | sqlite3 foo.sqlite
$ echo ".import demotab.txt mytable"  | sqlite3 foo.sqlite

$ sqlite3 foo.sqlite
-- Loading resources from /Users/ramanujan/.sqliterc
SQLite version 3.6.6.2
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select * from mytable;
col1    col2
44      92
35      94
43      94
195     49
66      28
135     93
135     91
67      84
135     94

Note that this bulk loading command is not SQL but rather a custom feature of SQLite. As such it has a weird syntax because we're passing it via echo to the interactive command-line interpreter, sqlite3.

In PostgreSQL the equivalent is COPY FROM: http://www.postgresql.org/docs/8.1/static/sql-copy.html

In MySQL it is LOAD DATA LOCAL INFILE: http://dev.mysql.com/doc/refman/5.1/en/load-data.html

One last thing: remember to be careful with the value of .separator. That is a very common gotcha when doing bulk inserts.

sqlite> .show
     echo: off
  explain: off
  headers: on
     mode: list
nullvalue: ""
   output: stdout
separator: "\t"
    width:

You should explicitly set the separator to be a space, tab, or comma before doing .import.

Answer by paxos1977

  • Increase PRAGMA default_cache_size to a much larger number. This will increase the number of pages cached in memory.

  • Wrap all inserts into a single transaction rather than one transaction per row.

  • Use compiled SQL statements to do the inserts.
  • Finally, as already mentioned, if you are willing to forgo full ACID compliance, set PRAGMA synchronous = OFF.

Answer by Hannes de Jager

RE: "Is there a faster way than generating insert statements for each line of data?"

First: cut it down to two SQL statements by making use of SQLite3's virtual table API, e.g.:

create virtual table vtYourDataset using yourModule;
-- Bulk insert
insert into yourTargetTable (x, y, z)
select x, y, z from vtYourDataset;

The idea here is that you implement a C interface that reads your source data set and presents it to SQLite as a virtual table, and then you do a SQL copy from the source to the target table in one go. It sounds harder than it really is, and I've measured huge speed improvements this way.

Second: make use of the other advice provided here, i.e. the pragma settings and wrapping the inserts in a transaction.

Third: perhaps see if you can do away with some of the indexes on the target table. That way SQLite will have fewer indexes to update for each row inserted.

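For example, a minimal sketch of that last idea (not from the answer itself; the index name idx_mytable_col1 and its column are hypothetical, so adapt them to your schema):

// Drop a secondary index before the bulk load...
sqlite3_exec(db, "DROP INDEX IF EXISTS idx_mytable_col1;", NULL, NULL, NULL);

// ... perform the bulk INSERTs here ...

// ...then rebuild it once at the end, which is cheaper than updating it row by row.
sqlite3_exec(db, "CREATE INDEX idx_mytable_col1 ON mytable(col1);", NULL, NULL, NULL);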

Answer by scott

There is no way to bulk insert, but there is a way to write large chunks to memory, then commit them to the database. For the C/C++ API, just do:

sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);

...(INSERT statements)

sqlite3_exec(db, "COMMIT TRANSACTION", NULL, NULL, NULL);

Assuming db is your database pointer.

Answer by Flavien Volken

A good compromise is to wrap your INSERTs between the BEGIN; and END; keywords, i.e.:

BEGIN;
INSERT INTO table VALUES ();
INSERT INTO table VALUES ();
...
END;

Answer by astef

I've tested some pragmas proposed in the answers here:

  • synchronous = OFF
  • journal_mode = WAL
  • journal_mode = OFF
  • locking_mode = EXCLUSIVE
  • synchronous = OFF + locking_mode = EXCLUSIVE + journal_mode = OFF

Here are my numbers for different numbers of inserts per transaction (see the chart in the original answer):

Increasing the batch size can give you a real performance boost, while turning off the journal, disabling synchronization, or acquiring an exclusive lock gives only an insignificant gain. The points around ~110k show how random background load can affect your database performance.

Also, it is worth mentioning that journal_mode = WAL is a good alternative to the defaults. It gives some gain but does not reduce reliability.

C# Code.

Answer by pestophagous

Depending on the size of the data and the amount of RAM available, one of the best performance gains will occur by setting sqlite to use an all-in-memory database rather than writing to disk.

For in-memory databases, pass NULL as the filename argument to sqlite3_open and make sure that TEMP_STORE is defined appropriately.

(All of the above text is excerpted from my own answer to a separate sqlite-related question)

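A minimal sketch of the in-memory approach (not from the quoted answer; the special filename ":memory:" is another common way to get a purely in-memory database):

sqlite3 *db = NULL;

// ":memory:" gives this connection a private database that lives entirely in RAM.
// (Passing an empty/NULL filename instead creates a temporary database whose
// storage is governed by the TEMP_STORE settings, as the answer above notes.)
sqlite3_open(":memory:", &db);

// ... create tables and run the bulk INSERTs entirely in memory ...

sqlite3_close(db);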

Answer by maazza

I found this to be a good mix for a one-shot long import.

.echo ON
.read create_table_without_pk.sql
PRAGMA cache_size = 400000;
PRAGMA synchronous = OFF;
PRAGMA journal_mode = OFF;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA count_changes = OFF;
PRAGMA temp_store = MEMORY;
PRAGMA auto_vacuum = NONE;
.separator "\t"
.import a_tab_seprated_table.txt mytable
BEGIN;
.read add_indexes.sql
COMMIT;
.exit

source: http://erictheturtle.blogspot.be/2009/05/fastest-bulk-import-into-sqlite.html

some additional info: http://blog.quibb.org/2010/08/fast-bulk-inserts-into-sqlite/
