SQLite 在处理非常大的数据库文件时有哪些性能特点?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/784173/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
What are the performance characteristics of sqlite with very large database files?
提问 by Snazzer
I know that sqlite doesn't perform well with extremely large database files even when they are supported (there used to be a comment on the sqlite website stating that if you need file sizes above 1GB you may want to consider using an enterprise rdbms. Can't find it anymore, might be related to an older version of sqlite).
我知道即使在支持超大数据库文件的情况下,sqlite 的表现也不好(sqlite 网站上曾经有一条说明,指出如果你需要大于 1GB 的文件,可能应该考虑使用企业级 RDBMS。现在已经找不到这条说明了,可能只适用于旧版本的 sqlite)。
However, for my purposes I'd like to get an idea of how bad it really is before I consider other solutions.
但是,出于我的目的,我想在考虑其他解决方案之前先了解它到底有多糟糕。
I'm talking about sqlite data files in the multi-gigabyte range, from 2GB onwards. Anyone have any experience with this? Any tips/ideas?
我说的是多 GB 级别、从 2GB 起的 sqlite 数据文件。有人有这方面的经验吗?有什么提示或想法吗?
采纳答案 by Snazzer
So I did some tests with sqlite for very large files, and came to some conclusions (at least for my specific application).
所以我用 sqlite 对非常大的文件做了一些测试,并得出了一些结论(至少对于我的特定应用程序)。
The tests involve a single sqlite file with either a single table, or multiple tables. Each table had about 8 columns, almost all integers, and 4 indices.
测试涉及单个 sqlite 文件,其中包含一个表或多个表。每个表大约有 8 列(几乎全是整数列)和 4 个索引。
The idea was to insert enough data until sqlite files were about 50GB.
这个想法是插入足够的数据,直到 sqlite 文件大约为 50GB。
Single Table
单表
I tried to insert multiple rows into a sqlite file with just one table. When the file was about 7GB (sorry I can't be specific about row counts) insertions were taking far too long. I had estimated that my test to insert all my data would take 24 hours or so, but it did not complete even after 48 hours.
我试着向只有一张表的 sqlite 文件中插入多行数据。当文件达到大约 7GB 时(抱歉,我没法给出具体行数),插入变得非常慢。我原本估计插入全部数据的测试需要 24 小时左右,但即使过了 48 小时也没有完成。
This leads me to conclude that a single, very large sqlite table will have issues with insertions, and probably other operations as well.
这让我得出结论:单个非常大的 sqlite 表在插入时会出现问题,其他操作可能也是如此。
I guess this is no surprise, as the table gets larger, inserting and updating all the indices take longer.
我想这并不奇怪,随着表变大,插入和更新所有索引需要更长的时间。
Multiple Tables
多表
I then tried splitting the data by time over several tables, one table per day. The data for the original 1 table was split to ~700 tables.
然后我尝试按时间把数据拆分到多个表中,每天一张表。原来那一张表的数据被拆分到了大约 700 个表中。
This setup had no problems with the insertion, it did not take longer as time progressed, since a new table was created for every day.
这种设置在插入时没有问题,插入耗时也不会随时间推移而变长,因为每天都会新建一个表。
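A minimal sketch of what such a per-day split might look like (the table name events_20240101 and its columns are made up for illustration; the original post does not give its schema):

下面是按天分表的一个最简示意(表名 events_20240101 及其列均为示意,原帖并未给出具体的 schema):

```sql
-- Hypothetical one-table-per-day layout (names and columns are illustrative).
CREATE TABLE IF NOT EXISTS events_20240101 (
    ts      INTEGER,   -- timestamp
    sensor  INTEGER,
    value   INTEGER
);
CREATE INDEX IF NOT EXISTS idx_events_20240101_ts ON events_20240101(ts);

-- New inserts always target the current day's table, so the table carrying
-- the indexes stays small and insert speed stays roughly constant.
INSERT INTO events_20240101 (ts, sensor, value) VALUES (1704067200, 1, 42);
```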
Vacuum Issues
VACUUM 问题
As pointed out by i_like_caffeine, the VACUUM command becomes more of a problem the larger the sqlite file is. As more inserts/deletes are done, the fragmentation of the file on disk gets worse, so the goal is to periodically VACUUM to optimize the file and recover file space.
正如 i_like_caffeine 所指出的,sqlite 文件越大,VACUUM 命令就越成问题。随着插入/删除操作越来越多,磁盘上文件的碎片化会越来越严重,因此需要定期执行 VACUUM 来优化文件并回收文件空间。
However, as pointed out by the documentation, doing a vacuum makes a full copy of the database, which takes a very long time to complete. So, the smaller the database, the faster this operation will finish.
但是,正如文档所指出的,执行 VACUUM 时会对数据库做一次完整拷贝,需要很长时间才能完成。因此,数据库越小,这个操作完成得越快。
Conclusions
结论
For my specific application, I'll probably be splitting out data over several db files, one per day, to get the best of both vacuum performance and insertion/delete speed.
对于我的特定应用,我可能会把数据拆分到多个 db 文件中,每天一个,以便同时获得较好的 VACUUM 性能和插入/删除速度。
This complicates queries, but for me, it's a worthwhile tradeoff to be able to index this much data. An additional advantage is that I can just delete a whole db file to drop a day's worth of data (a common operation for my application).
这会让查询变得复杂,但对我来说,为了能给这么多数据建立索引,这种取舍是值得的。另一个好处是,我可以直接删除整个 db 文件来丢弃一天的数据(这是我的应用中的常见操作)。
I'd probably have to monitor table size per file as well to see when the speed will become a problem.
我可能还必须监视每个文件的表大小,以查看速度何时会成为问题。
It's too bad that there doesn't seem to be an incremental vacuum method other than auto vacuum. I can't use it because my goal for vacuum is to defragment the file (file space isn't a big deal), which auto vacuum does not do. In fact, documentation states it may make fragmentation worse, so I have to resort to periodically doing a full vacuum on the file.
可惜的是,除了 auto vacuum 之外,似乎没有其他增量式的 VACUUM 方法。我没法使用 auto vacuum,因为我执行 VACUUM 的目的是给文件做碎片整理(回收文件空间倒不是大问题),而 auto vacuum 做不到这一点。事实上,文档指出它可能会让碎片化更严重,所以我只能定期对整个文件做一次完整的 VACUUM。
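For reference, a sketch of the commands involved, assuming standard SQLite PRAGMAs; as noted above, incremental auto-vacuum only reclaims free pages and does not defragment:

作为参考,下面示意了相关命令(使用标准的 SQLite PRAGMA;如上所述,增量式 auto vacuum 只回收空闲页,并不会做碎片整理):

```sql
-- A full VACUUM rebuilds the database into a new file, defragmenting it,
-- but it needs roughly a complete copy and can take a long time on big files.
VACUUM;

-- auto_vacuum must be set before the first table is created (or be followed
-- by a VACUUM to take effect). INCREMENTAL mode only releases free pages on
-- request; it does not defragment the file.
PRAGMA auto_vacuum = INCREMENTAL;
PRAGMA incremental_vacuum(1000);   -- reclaim up to 1000 free pages
```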
回答 by Alex
We are using DBs of 50 GB+ on our platform. No complaints, works great. Make sure you are doing everything right! Are you using prepared statements? *SQLITE 3.7.3
我们在自己的平台上使用 50 GB 以上的数据库,没有任何问题,运行得很好。确保你把每件事都做对了!你用预编译语句(prepared statements)了吗?*SQLITE 3.7.3
- Transactions
- Prepared statements
Apply these settings (right after you create the DB):
PRAGMA main.page_size = 4096;
PRAGMA main.cache_size = 10000;
PRAGMA main.locking_mode = EXCLUSIVE;
PRAGMA main.synchronous = NORMAL;
PRAGMA main.journal_mode = WAL;
PRAGMA main.cache_size = 5000;
- 事务
- 预编译语句(prepared statements)
应用这些设置(在创建数据库之后立即应用):
PRAGMA main.page_size = 4096;
PRAGMA main.cache_size = 10000;
PRAGMA main.locking_mode = EXCLUSIVE;
PRAGMA main.synchronous = NORMAL;
PRAGMA main.journal_mode = WAL;
PRAGMA main.cache_size = 5000;
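A sketch of how these pieces fit together for bulk loading; the table name samples is hypothetical, and the ? placeholders are bound through whatever host API is in use (e.g. sqlite3_prepare_v2/sqlite3_bind_* in C):

下面示意这几点在批量写入时如何配合使用;表名 samples 只是假设,? 占位符要通过宿主语言的 API(例如 C 里的 sqlite3_prepare_v2/sqlite3_bind_*)绑定:

```sql
-- Connection-level settings, applied right after opening the database
-- (mirroring the PRAGMAs quoted above).
PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA cache_size = 10000;

-- Wrap many inserts in one explicit transaction instead of paying for a
-- transaction per INSERT.
BEGIN;
-- Prepare this statement once and re-execute it with new bindings per row.
INSERT INTO samples (ts, sensor, value) VALUES (?, ?, ?);
-- ... repeat with different bindings for each row ...
COMMIT;
```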
Hope this will help others, works great here
希望这会帮助其他人,在这里效果很好
回答 by Paul Lefebvre
I've created SQLite databases up to 3.5GB in size with no noticeable performance issues. If I remember correctly, I think SQLite2 might have had some lower limits, but I don't think SQLite3 has any such issues.
我已经创建了最大 3.5GB 的 SQLite 数据库,没有明显的性能问题。如果我没记错的话,我认为 SQLite2 可能有一些较低的限制,但我认为 SQLite3 没有任何此类问题。
According to the SQLite Limits page, the maximum size of each database page is 32K. And the maximum pages in a database is 1024^3. So by my math that comes out to 32 terabytes as the maximum size. I think you'll hit your file system's limits before hitting SQLite's!
根据 SQLite Limits 页面,每个数据库页面的最大大小为 32K,数据库中的最大页数为 1024^3。因此,根据我的计算,最大大小为 32 TB。我认为在达到 SQLite 的限制之前,你会先达到文件系统的限制!
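The arithmetic behind that figure, using the limits quoted above (32 KB pages, 1024^3 pages), can be checked directly:

用上面引用的上限(32 KB 页大小、1024^3 页)可以直接验算这个数字:

```sql
-- 32 KB per page * 1024^3 pages = 2^15 * 2^30 bytes = 2^45 bytes = 32 TiB.
SELECT (32 * 1024) * (1024 * 1024 * 1024)                                 AS max_bytes,
       (32 * 1024) * (1024 * 1024 * 1024) / (1024.0 * 1024 * 1024 * 1024) AS max_tebibytes;
-- max_bytes = 35184372088832, max_tebibytes = 32.0
```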
回答 by user352992
Much of the reason that it took > 48 hours to do your inserts is because of your indexes. It is incredibly faster to:
你的插入操作花了超过 48 小时,很大程度上是因为你的索引。下面这样做会快得多:
1 - Drop all indexes
2 - Do all inserts
3 - Create indexes again
1 - 删除所有索引
2 - 执行所有插入
3 - 重新创建索引
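A sketch of that pattern (the table and index names are made up for illustration):

该模式的一个示意(表名和索引名均为举例):

```sql
-- 1 - Drop the indexes before the bulk load.
DROP INDEX IF EXISTS idx_events_ts;
DROP INDEX IF EXISTS idx_events_sensor;

-- 2 - Do all inserts, ideally inside one big transaction.
BEGIN;
INSERT INTO events (ts, sensor, value) VALUES (1704067200, 1, 42);
-- ... many more rows ...
COMMIT;

-- 3 - Recreate the indexes once, after the data is in place.
CREATE INDEX idx_events_ts     ON events(ts);
CREATE INDEX idx_events_sensor ON events(sensor);
```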
回答 by Lester Cheung
Besides the usual recommendations:
除了那些常见的建议之外:
- Drop index for bulk insert.
- Batch inserts/updates in large transactions.
- Tune your buffer cache/disable journal /w PRAGMAs.
- Use a 64bit machine (to be able to use lots of cache?).
- [added July 2014] Use a common table expression (CTE) instead of running multiple SQL queries! Requires SQLite release 3.8.3.
- 删除批量插入的索引。
- 在大事务中批量插入/更新。
- 用 PRAGMA 调整缓冲区缓存/禁用日志。
- 使用 64 位机器(以便能使用大量缓存?)。
- [2014 年 7 月添加] 使用公用表表达式 (CTE) 而不是运行多个 SQL 查询!需要 SQLite 3.8.3 版。
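To illustrate the CTE point above (requires SQLite 3.8.3+; the events table and its columns are hypothetical), intermediate results can be named inside a single statement instead of running several separate queries:

为说明上面关于 CTE 的那一点(需要 SQLite 3.8.3+;events 表及其列均为假设),可以在一条语句里给中间结果命名,而不用跑多条独立查询:

```sql
-- One statement with a common table expression instead of several round-trips.
WITH daily AS (
    SELECT date(ts, 'unixepoch') AS day,
           COUNT(*)              AS n,
           AVG(value)            AS avg_value
    FROM events
    GROUP BY day
)
SELECT day, n, avg_value
FROM daily
WHERE n > 1000
ORDER BY day;
```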
I have learnt the following from my experience with SQLite3:
我从 SQLite3 的经验中学到了以下几点:
- For maximum insert speed, don't use a schema with any column constraints. (Alter the table later as needed; note that you can't add constraints with ALTER TABLE.)
- Optimize your schema to store what you need. Sometimes this means breaking down tables and/or even compressing/transforming your data before inserting it into the database. A great example is storing IP addresses as (long) integers.
- One table per db file, to minimize lock contention. (Use ATTACH DATABASE if you want to have a single connection object.)
- SQLite can store different types of data in the same column (dynamic typing), use that to your advantage.
- 为获得最大插入速度,不要在 schema 中使用任何列约束。(之后可以按需修改表;注意你无法用 ALTER TABLE 添加约束。)
- 优化你的 schema,只存储你需要的内容。有时这意味着在插入数据库之前拆分表,甚至对数据做压缩/转换。一个很好的例子是把 IP 地址存成(长)整数。
- 每个 db 文件只放一张表,以最大限度地减少锁争用。(如果你想只使用单个连接对象,请使用 ATTACH DATABASE。)
- SQLite 可以在同一列中存储不同类型的数据(动态类型),充分利用它。
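A sketch of the one-table-per-file idea using a single connection (file and table names are illustrative):

用单个连接实现“每个文件一张表”的示意(文件名和表名仅为举例):

```sql
-- Each table lives in its own database file; ATTACH lets one connection
-- address all of them under different schema names.
ATTACH DATABASE 'events_2024_01_01.db' AS d1;
ATTACH DATABASE 'events_2024_01_02.db' AS d2;

-- Query across the attached files from the single connection.
SELECT * FROM d1.events
UNION ALL
SELECT * FROM d2.events;

DETACH DATABASE d2;
```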
Question/comment welcome. ;-)
欢迎提问/评论。;-)
回答 by Unknown
I think the main complaints about sqlite scaling are:
我认为关于 sqlite 可扩展性的主要抱怨是:
- Single process write.
- No mirroring.
- No replication.
- 单进程写。
- 没有镜像。
- 没有复制。
回答 by Mike Oxynormas
I have a 7GB SQLite database. Performing a particular query with an inner join takes 2.6s. In order to speed this up I tried adding indexes. Depending on which index(es) I added, sometimes the query went down to 0.1s and sometimes it went UP to as much as 7s. I think the problem in my case was that if a column is highly duplicated then adding an index degrades performance :(
我有一个 7GB 的 SQLite 数据库。执行一个带内连接的特定查询需要 2.6 秒。为了加快速度,我尝试添加索引。取决于我添加了哪些索引,有时查询降到 0.1 秒,有时反而升到 7 秒之多。我认为我这种情况的问题在于:如果某一列的重复值非常多,添加索引反而会降低性能 :(
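One way to investigate this kind of regression (the query and the category column below are stand-ins, not the poster's actual schema) is to ask SQLite which plan it picks and to give the planner statistics:

排查这类性能回退的一个办法(下面的查询和 category 列只是示意,并非原帖的真实 schema)是让 SQLite 说明它选择的查询计划,并为优化器提供统计信息:

```sql
-- Ask the planner how it intends to run the join; an index on a column with
-- very few distinct values can lead to a plan that is slower than a table scan.
EXPLAIN QUERY PLAN
SELECT a.id, b.value
FROM a JOIN b ON b.a_id = a.id
WHERE b.category = 42;

-- ANALYZE gathers statistics (sqlite_stat1) so the planner can make a better
-- choice about whether to use such an index at all.
ANALYZE;
```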
回答 by Mike Oxynormas
There used to be a statement in the SQLite documentation that the practical size limit of a database file was a few dozen GB. That was mostly due to the need for SQLite to "allocate a bitmap of dirty pages" whenever you started a transaction. Thus 256 bytes of RAM were required for each MB in the database. Inserting into a 50 GB DB-file would require a hefty (2^8)*(2^10)=2^18=256 MB of RAM.
SQLite 文档中曾经有一条说明,称数据库文件的实际大小上限是几十 GB。这主要是因为每当你开启一个事务时,SQLite 都需要“分配一个脏页位图”。因此,数据库中每 1 MB 需要 256 字节的 RAM。向一个 50 GB 的 DB 文件中插入数据将需要多达 (2^8)*(2^10)=2^18=256 MB 的 RAM。
But as of recent versions of SQLite, this is no longer needed. Read more here.
但是从 SQLite 的最新版本开始,这不再需要了。在这里阅读更多。
回答 by eodonohoe
I've experienced problems with large sqlite files when using the vacuum command.
在对很大的 sqlite 文件使用 VACUUM 命令时,我遇到过问题。
I haven't tried the auto_vacuum feature yet. If you expect to be updating and deleting data often then this is worth looking at.
我还没有尝试过 auto_vacuum 功能。如果您希望经常更新和删除数据,那么这值得一看。