Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/19135340/

How does PostgreSQL perform writes so much faster than SQLite?

Tags: performance, sqlite, postgresql, clojure

Asked by alice

I did a simple integer update performance test. SQLite did only 15 updates per second while PostgreSQL did 1500 updates per second.

The number in the SQLite case seems to be normal.

The FAQ on the SQLite site explains it as if it were a fundamental limitation of rotational disks.

Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. Transaction speed is limited by the rotational speed of your disk drive. A transaction normally requires two complete rotations of the disk platter, which on a 7200RPM disk drive limits you to about 60 transactions per second. Transaction speed is limited by disk drive speed because (by default) SQLite actually waits until the data really is safely stored on the disk surface before the transaction is complete. That way, if you suddenly lose power or if your OS crashes, your data is still safe. For details, read about atomic commit in SQLite.

By default, each INSERT statement is its own transaction. But if you surround multiple INSERT statements with BEGIN...COMMIT then all the inserts are grouped into a single transaction. The time needed to commit the transaction is amortized over all the enclosed insert statements and so the time per insert statement is greatly reduced.

Another option is to run PRAGMA synchronous=OFF. This command will cause SQLite to not wait on data to reach the disk surface, which will make write operations appear to be much faster. But if you lose power in the middle of a transaction, your database file might go corrupt.

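To make the FAQ's two options concrete, here is a minimal sketch written with clojure.java.jdbc, like the rest of this post. Treat the names as assumptions: with-db-transaction is the transaction macro from later 0.3.x releases of the library (the snapshot used in the test code below may spell it differently), and sqlite stands for the connection spec defined further down.

(require '[clojure.java.jdbc :as jdbc])

;; Option 1: wrap many statements in one transaction, so the commit
;; (and its fsync) is paid once instead of once per statement.
(jdbc/with-db-transaction [t sqlite]
  (dotimes [i 1000]
    (jdbc/insert! t :foo {:id (+ i 100) :bar i})))

;; Option 2: stop waiting for the disk entirely. Much faster, but a
;; power loss mid-transaction can corrupt the database file.
(jdbc/execute! sqlite ["PRAGMA synchronous=OFF"])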

Is this description true? Then, how can PostgreSQL perform so much faster than SQLite? (I set both the fsync and synchronous_commit options to on in PostgreSQL.)
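
(For reference, those settings can be read back through the same JDBC connection; a small sketch using the postgres spec defined below. SHOW returns one row whose column is named after the setting.)

;; Sanity-check the durability settings the benchmark actually ran with.
(jdbc/query postgres ["SHOW fsync"])              ; expect "on"
(jdbc/query postgres ["SHOW synchronous_commit"]) ; expect "on"
(jdbc/query postgres ["SHOW wal_sync_method"])    ; open_datasync is the Windows default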

UPDATE:

Here's the full test code written in Clojure:

(defproject foo "0.1.0-SNAPSHOT"
  :repositories {"sonatype-oss-public" "https://oss.sonatype.org/content/groups/public/"}
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [org.clojure/java.jdbc "0.3.0-SNAPSHOT"]
                 [com.mchange/c3p0 "0.9.2.1"]
                 [org.xerial/sqlite-jdbc "3.7.2"]
                 [postgresql "9.1-901.jdbc4"]])

(ns foo.core
  (:require [clojure.java.jdbc :as jdbc]
            [clojure.java.jdbc.ddl :as ddl])
  (:import  [com.mchange.v2.c3p0 ComboPooledDataSource]))

(def sqlite
  (let [spec {:classname "org.sqlite.JDBC"
              :subprotocol "sqlite"
              :subname "test.db"}]
    {:datasource (doto (ComboPooledDataSource.)
                   (.setDriverClass (:classname spec))
                   (.setJdbcUrl (str "jdbc:" (:subprotocol spec) ":" (:subname spec)))
                   (.setMaxIdleTimeExcessConnections (* 30 60))
                   (.setMaxIdleTime (* 3 60 60)))}))

(def postgres
  (let [spec {:classname "org.postgresql.Driver"
              :subprotocol "postgresql"
              :subname "//localhost:5432/testdb"
              :user "postgres"
              :password "uiop"}]
    {:datasource (doto (ComboPooledDataSource.)
                   (.setDriverClass (:classname spec))
                   (.setJdbcUrl (str "jdbc:" (:subprotocol spec) ":" (:subname spec)))
                   (.setUser (:user spec))
                   (.setPassword (:password spec))
                   (.setMaxIdleTimeExcessConnections (* 30 60))
                   (.setMaxIdleTime (* 3 60 60)))}))

(doseq [x [sqlite postgres]]
  (jdbc/db-do-commands x
    (ddl/create-table :foo [:id :int "PRIMARY KEY"] [:bar :int])))

(doseq [x [sqlite postgres]]
  (jdbc/insert! x :foo {:id 1 :bar 1}))

(defmacro bench
  [expr n]
  `(dotimes [_# 3]
     (let [start# (. System (nanoTime))]
       (dotimes [_# ~n]
         ~expr)
       (let [end#               (. System (nanoTime))
             elapsed#           (/ (double (- end# start#)) 1000000.0)
             operation-per-sec# (long (/ (double ~n) (/ (double (- end# start#)) 1000000000)))]
         (prn (str "Elapsed time: " elapsed# " ms (" (format "%,d" operation-per-sec#) " ops)"))))))

(bench (jdbc/query sqlite ["select * from foo"]) 20000)
(bench (jdbc/execute! sqlite ["update foo set bar=bar+1 where id=?" 1]) 100)

(bench (jdbc/query postgres ["select * from foo"]) 20000)
(bench (jdbc/execute! postgres ["update foo set bar=bar+1 where id=?" 1]) 5000)

And the output is:

; Running "select * from foo" 20000 times in SQLite

"Elapsed time: 1802.426963 ms (11,096 ops)"
"Elapsed time: 1731.118831 ms (11,553 ops)"
"Elapsed time: 1749.842658 ms (11,429 ops)"

; Running "update foo set bar=bar+1 where id=1" 100 times in SQLite

"Elapsed time: 6362.829057 ms (15 ops)"
"Elapsed time: 6405.25075 ms (15 ops)"
"Elapsed time: 6352.943553 ms (15 ops)"

; Running "select * from foo" 20000 times in PostgreSQL

"Elapsed time: 2898.636079 ms (6,899 ops)"
"Elapsed time: 2824.77372 ms (7,080 ops)"
"Elapsed time: 2837.622659 ms (7,048 ops)"

; Running "update foo set bar=bar+1 where id=1" 5000 times in PostgreSQL

"Elapsed time: 3213.120219 ms (1,556 ops)"
"Elapsed time: 3564.249492 ms (1,402 ops)"
"Elapsed time: 3280.128708 ms (1,524 ops)"

pg_test_fsync result:

C:\temp>"C:\Program Files\PostgreSQL.3\bin\pg_test_fsync"
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                   81199.920 ops/sec      12 usecs/op
        fdatasync                                     n/a
        fsync                              45.337 ops/sec   22057 usecs/op
        fsync_writethrough                 46.470 ops/sec   21519 usecs/op
        open_sync                                     n/a

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                   41093.981 ops/sec      24 usecs/op
        fdatasync                                     n/a
        fsync                              38.569 ops/sec   25927 usecs/op
        fsync_writethrough                 36.970 ops/sec   27049 usecs/op
        open_sync                                     n/a

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write                     n/a
         2 *  8kB open_sync writes                    n/a
         4 *  4kB open_sync writes                    n/a
         8 *  2kB open_sync writes                    n/a
        16 *  1kB open_sync writes                    n/a

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close                45.564 ops/sec   21947 usecs/op
        write, close, fsync                33.373 ops/sec   29964 usecs/op

Non-Sync'ed 8kB writes:
        write                             889.800 ops/sec    1124 usecs/op

Accepted answer by jjanes

You are correct to be suspicious. PostgreSQL with the settings you indicate should not be able to perform anywhere near 1500 updates per second in separate sequential transactions against rotating media.

Probably something in your IO stack is lying about, or buggy in, how it implements sync. That means your data is at risk of serious corruption after an unexpected power outage or OS failure.

Looking at the result of pg_test_fsync, this is indeed the case. open_datasync, which is the default under Windows, appears to be unrealistically fast and so must be unsafe. I see the same thing when I run pg_test_fsync on my Windows 7 machine.
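
If that is the case, a common mitigation on Windows is to switch PostgreSQL to a sync method that flushes through the drive's write cache. A sketch of the relevant postgresql.conf line (fsync_writethrough is one of the documented wal_sync_method values; expect commit rates to drop to something like the ~46 ops/sec fsync_writethrough figure from the pg_test_fsync output above):

# postgresql.conf: trade raw commit speed for honest durability on
# drives whose open_datasync numbers look impossibly fast.
wal_sync_method = fsync_writethrough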

Answer by Denis de Bernardy

It comes down to how they implement snapshot isolation.

SQLite uses file locking to isolate transactions, allowing writes to proceed only once all reads are done.

Postgres, in contrast, uses a more sophisticated approach called multiversion concurrency control (MVCC), which allows multiple writes to occur in parallel with multiple reads.

http://www.sqliteconcepts.org/SI_index.html

http://www.postgresql.org/docs/current/static/mvcc-intro.html

http://wiki.postgresql.org/wiki/MVCC

Answer by Dariusz

Denis' answer has all the links you need. I will go for a less detailed but possibly more understandable answer.

SQLite does not use any sophisticated transaction manager; there is no advanced multitasking logic hidden in it. It executes what you tell it to execute, in exactly that order. In other words, it does exactly what you tell it to do. If you tried using the same database from two processes, you would encounter problems.

PostgreSQL, on the other hand, is a very complex database: it efficiently supports multiple concurrent reads and writes. Think of it as an asynchronous system: you only schedule work to be done; you do not actually control it in its details. Postgres does that for you.

What can you do about your efficiency? Join several, dozens, or hundreds of updates/inserts into one transaction, as in the sketch below. For a simple table, you will get very good performance.
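
A sketch of what that looks like with the question's own setup (with-db-transaction is the later clojure.java.jdbc name for its transaction macro, and sqlite is the connection spec from the question):

;; All 5000 updates share one transaction, so SQLite pays the
;; two-rotation commit cost once instead of 5000 times.
(time
 (jdbc/with-db-transaction [t sqlite]
   (dotimes [_ 5000]
     (jdbc/execute! t ["update foo set bar=bar+1 where id=?" 1]))))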

Answer by user133536

Actually, any write on a rotating disk is on the order of 10 ms (a typical number is 8 ms).

This means a little more than 100 writes per second, IF you are writing to the same position on the disk, which is a very odd case for a database. See "You Don't Know Jack About Disks" from the ACM; typically a disk can schedule 10 reads or writes in a single rotation.

http://queue.acm.org/detail.cfm?id=864058
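
A back-of-the-envelope version of that argument:

;; 7200 RPM / 60 = 120 rotations per second; if the drive can schedule
;; ~10 I/Os per rotation, that is on the order of 1200 writes/sec,
;; roughly ten times the ~100/sec you get from one 10 ms write at a time.
(let [rotations-per-sec (/ 7200 60) ; 120
      ios-per-rotation  10]
  (* rotations-per-sec ios-per-rotation)) ; => 1200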

So, a database can perform 1,000 writes per second or even more. Ten years ago, I saw apps performing 1,500 transactions per second on desktop computers.

Answer by Wolph

Assuming you are using a normal hard disk (i.e. not an SSD), you can expect a maximum of 50-100 writes per second. Fifteen writes per second seems slightly low, but not impossible.

So if Postgres is doing 1500 updates per second, they are either written to some buffer/cache or collapsed into a single update. Without knowing more about the actual test it is difficult to say which is the actual reason, but if you were to open a transaction, update a single row 1500 times, and commit after that, then Postgres should be smart enough to execute only a single "real" write to the disk.