Original question: http://stackoverflow.com/questions/9605922/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) on Stack Overflow.
Are there performance issues storing files in PostgreSQL?
Asked by Renato Dinhani
Is it OK to store files such as HTML pages, images, and PDFs in a PostgreSQL table, or is it slow? I have read some articles saying this is not recommended, but I don't know whether that is true.
The column types I have in mind are BLOB (as far as I know, it stores the data in a file) or bytea, but others are applicable too.
Answered by Daniel Lyons
You have basically two choices. You can store the data right in the row, or you can use the large object facility. Since PostgreSQL now uses something called TOAST to move large fields out of the table, there should be no performance penalty associated with storing large data in the row directly. There remains a 1 GB limit on the size of a field. If this is too limiting, or if you want a streaming API, you can use the large object facility, which gives you something more like file descriptors in the database. You store the LO ID in your column and can read and write through that ID.
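As a rough illustration of the first option (data stored right in the row), here is a minimal sketch in Python. It assumes a DB-API 2.0 driver that uses `%s` placeholders and adapts `bytes` to `bytea`, such as psycopg2. The `stored_file` table, its columns, and the `store_file` helper are hypothetical names chosen for this example, not something from the answer.

```python
# Sketch only: table, column, and function names are hypothetical.
# Assumes a DB-API 2.0 connection (e.g. from psycopg2) with %s placeholders.

CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS stored_file (
    id        bigserial PRIMARY KEY,
    filename  text  NOT NULL,
    mime_type text  NOT NULL,
    content   bytea NOT NULL  -- large values are moved out of line by TOAST
)
"""

INSERT_SQL = (
    "INSERT INTO stored_file (filename, mime_type, content) "
    "VALUES (%s, %s, %s) RETURNING id"
)

# The answer cites a 1 GB limit on the size of a single field.
MAX_FIELD_BYTES = 1024 ** 3


def store_file(conn, filename, mime_type, data):
    """Insert one file as an ordinary row and return its generated id."""
    if len(data) >= MAX_FIELD_BYTES:
        raise ValueError("too large for one field; consider large objects")
    cur = conn.cursor()
    try:
        cur.execute(INSERT_SQL, (filename, mime_type, data))
        new_id = cur.fetchone()[0]
    finally:
        cur.close()
    conn.commit()
    return new_id
```

With psycopg2 this might be called as `store_file(psycopg2.connect(dsn), "logo.png", "image/png", png_bytes)` after running `CREATE_TABLE_SQL` once, where `dsn` and `png_bytes` are placeholders for your own connection string and file contents.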
I personally would suggest you avoid the large object facility unless you absolutely need it. With TOAST, most use cases are covered by just using the database the way you'd expect. With large objects, you give yourself additional maintenance burden, because you have to keep track of the LO IDs you've used and be sure to unlink them when they're not used anymore (but not before) or they'll sit in your data directory taking up space forever. There are also a lot of facilities that have exceptional behavior around them, the details of which escape me because I never use them.
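To make that bookkeeping concrete, here is a hedged sketch of deleting a row together with the large object it points to. `lo_unlink` is PostgreSQL's server-side function for removing a large object. The `document` table, its `content_oid` column, and the `delete_document` helper are assumptions made for illustration, as is the psycopg2-style `%s` placeholder.

```python
# Sketch only: the "document" table and "content_oid" column are hypothetical.
# Assumes a DB-API 2.0 connection (e.g. from psycopg2) with %s placeholders.

def delete_document(conn, doc_id):
    """Delete a row and unlink its large object in one transaction.

    Returns True if a row was deleted, False if doc_id was not found.
    """
    cur = conn.cursor()
    try:
        # Lock the row so the OID cannot change underneath us.
        cur.execute(
            "SELECT content_oid FROM document WHERE id = %s FOR UPDATE",
            (doc_id,),
        )
        row = cur.fetchone()
        if row is None:
            conn.rollback()
            return False
        # Unlink the large object first, then remove the row that referenced it.
        cur.execute("SELECT lo_unlink(%s)", (row[0],))
        cur.execute("DELETE FROM document WHERE id = %s", (doc_id,))
        conn.commit()
        return True
    except Exception:
        # Roll back so the row and its large object stay consistent.
        conn.rollback()
        raise
    finally:
        cur.close()
```

Doing both steps in a single transaction is one way to avoid the two failure modes described above: an orphaned large object left behind, or a row whose large object was unlinked too early.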
For most people, the big performance penalty associated with storing large data in the database is that your ORM software will pull out the big data on every query unless you specifically instruct it not to. You should take care to tell Hibernate or whatever you're using to treat these columns as large and only fetch them when they're specifically requested.
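The same idea can be shown without an ORM: keep the large column out of everyday queries and fetch it only on request. This is a sketch under assumptions, not a recommendation for any particular ORM. The `stored_file` table and both helper functions are hypothetical, and the `%s` placeholder style assumes a psycopg2-like DB-API driver.

```python
# Sketch only: table and function names are hypothetical.
# Assumes a DB-API 2.0 connection (e.g. from psycopg2) with %s placeholders.

# Listing query names its columns explicitly and leaves out "content".
LIST_SQL = "SELECT id, filename, mime_type FROM stored_file ORDER BY id"

# The large column is read only when a caller asks for one specific file.
CONTENT_SQL = "SELECT content FROM stored_file WHERE id = %s"


def list_files(conn):
    """Return lightweight metadata for every file, without the file bodies."""
    cur = conn.cursor()
    try:
        cur.execute(LIST_SQL)
        return [
            {"id": r[0], "filename": r[1], "mime_type": r[2]}
            for r in cur.fetchall()
        ]
    finally:
        cur.close()


def load_content(conn, file_id):
    """Fetch the body of one file, or None if the id does not exist."""
    cur = conn.cursor()
    try:
        cur.execute(CONTENT_SQL, (file_id,))
        row = cur.fetchone()
    finally:
        cur.close()
    # psycopg2 returns bytea as a memoryview; bytes() normalizes the result.
    return None if row is None else bytes(row[0])
```

Most ORMs offer some form of lazy or deferred loading for individual columns that achieves the same split; check the documentation of whichever one you use.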
Answered by singularsyntax
The BLOB (LO) type stores data in 2KB chunks within standard PostgreSQL heap pages, which default to 8KB in size. Large objects are not stored as independent, cohesive files in the file system. For example, you wouldn't be able to locate the file, do a byte-by-byte comparison, and expect it to be the same as the original file data that you loaded into the database, since there are also Postgres heap page headers and structures which delineate the chunks.
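A small arithmetic sketch of that layout, assuming the default 8KB page size and a chunk size of one quarter of a page (which is how the 2KB figure arises in a default build). The function name is made up for this example, and the count ignores row headers and any other per-chunk overhead.

```python
PAGE_SIZE = 8192                 # default heap page size, in bytes
LO_CHUNK_SIZE = PAGE_SIZE // 4   # 2048-byte large object chunks


def lo_chunk_count(size_bytes):
    """Number of chunks needed to hold a large object of size_bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Ceiling division without floating point.
    return -(-size_bytes // LO_CHUNK_SIZE)
```

For instance, a 1 MiB file (1,048,576 bytes) would be split into 512 chunks under these assumptions, each stored as its own row rather than as one contiguous file.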
You should avoid using the Large Object (LO) interface if your application needs to update the binary data frequently, and particularly if that involves a lot of small, random-access writes. Because of the way PostgreSQL implements concurrency control (MVCC), such writes can lead to an explosion in the amount of disk space used until you VACUUM the database. The same outcome probably also applies to data stored inline in a bytea column, or even TOASTed.
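To get a feel for the scale of that effect, here is a back-of-the-envelope estimate in Python. It rests on simplifying assumptions that are mine, not the answer's: every write lands on chunks that already exist and are full, each touched 2KB chunk leaves one dead 2KB row version behind until VACUUM, and compression and row headers are ignored. Both function names are hypothetical.

```python
LO_CHUNK_SIZE = 2048  # bytes per large object chunk, as described above


def chunks_touched(offset, length):
    """Number of chunks overlapped by a write of `length` bytes at `offset`."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if length == 0:
        return 0
    first = offset // LO_CHUNK_SIZE
    last = (offset + length - 1) // LO_CHUNK_SIZE
    return last - first + 1


def estimated_dead_bytes(writes):
    """Rough dead space left by a series of (offset, length) writes.

    Assumes each touched chunk is rewritten as a new row version and the
    old full-size version stays on disk until VACUUM reclaims it.
    """
    return sum(chunks_touched(o, n) for o, n in writes) * LO_CHUNK_SIZE
```

Under these assumptions, 1,000 scattered 16-byte writes (16,000 bytes of payload) would leave roughly 2,048,000 bytes of dead row versions, about 128 times the data actually changed. Treat the number as an order-of-magnitude illustration only.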
However, if your data follows a Write-Once-Read-Many pattern (e.g. upload a PNG image and never modify it afterwards), it should be fine from the standpoint of disk usage.
See this pgsql-general mailing list thread for further discussion.