SQL: Is it faster to access data from files or a database server?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/2147902/

Time: 2020-09-01 05:13:10  Source: igfitidea

Is it faster to access data from files or a database server?

Tags: sql, performance, cgi, flat-file

Asked by Jeremy Gwa

If I have a static database consisting of folders and files, would access and manipulation be faster than SQL server type databases, considering this would be used in a CGI script?

When working with files and folders, what are the tricks to better performance?

Answered by HerbN

I'll add to the "it depends" crowd.

This is the kind of question that has no generic answer but is heavily dependent on the situation at hand. I even recently moved some data from a SQL database to a flat file system because the overhead of the DB, combined with some DB connection reliability issues, made using flat files a better choice.

Some questions I would ask myself when making the choice include:

  1. How am I consuming the data? For example will I just be reading from the beginning to the end rows in the order entered? Or will I be searching for rows that match multiple criteria?

  2. How often will I be accessing the data during one program execution? Will I go once to get all books with Salinger as the author or will I go several times to get several different authors? Will I go more than once for several different criteria?

  3. How will I be adding data? Can I just append a row to the end, and that's perfect for my retrieval, or will it need to be re-sorted?

  4. How logical will the code look in six months? I emphasize this because I think it is too often forgotten when designing things (not just code; this hobby horse actually dates from my days as a Navy mechanic cursing mechanical engineers). In six months, when I have to maintain your code (or you do, after working on another project), which way of storing and retrieving data will make more sense? If going from flat files to a DB results in a 1% efficiency improvement but adds a week of figuring things out when you have to update the code, have you really improved things?
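
To make the first and third questions concrete, here is a minimal Python sketch of a flat file that is appended to and read in insertion order, plus a multi-criteria search that has to scan every row. The CSV layout, the field names, and the helper names (`append_row`, `read_all`, `search`) are assumptions for illustration, not something from the answer.

```python
import csv
import os
import tempfile

FIELDS = ["author", "title", "year"]  # hypothetical schema


def append_row(path, row):
    """Append one record; cheap when insertion order is the retrieval order."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def read_all(path):
    """Read every row from beginning to end, in the order entered."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def search(path, **criteria):
    """Multi-criteria search: a full scan, because a flat file has no index."""
    return [row for row in read_all(path)
            if all(row[k] == v for k, v in criteria.items())]


path = os.path.join(tempfile.mkdtemp(), "books.csv")
append_row(path, {"author": "Salinger", "title": "The Catcher in the Rye", "year": "1951"})
append_row(path, {"author": "Salinger", "title": "Franny and Zooey", "year": "1961"})
append_row(path, {"author": "Heller", "title": "Catch-22", "year": "1961"})

salinger_1961 = search(path, author="Salinger", year="1961")
```

If most of your accesses look like `read_all`, the flat file is hard to beat for simplicity; if they look like `search` with varying criteria, every call pays for a full scan, which is where a database index starts to earn its keep.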

Answered by DVK

Depends on what your information is and what your access patterns and scale are. Two of the biggest benefits of a relational database are:

  1. Caching. Unless you're very clever, you can't write a cache as good as that of a DB server.

  2. Optimizer.

However, for certain specialized applications, neither of these two benefits manifests itself compared to a files+folders data store - therefore the answer is a resounding "depends".

As for files/folders, the tricks are:

  • Cache the contents of frequently requested files.
  • Have small directories (files in deeply nested small directories are much faster to access than in a flatter structure, due to the time it takes to read the contents of a big directory).
  • There are other, more advanced optimizations (slicing across disks, placement in different places on a disk or on different partitions, etc.) - but if you need THAT level, you are better off with a database in the first place.
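
The first two tricks can be sketched in Python roughly as follows. The hash-based directory layout and the names `sharded_path`, `store`, and `load` are assumptions for illustration; the in-memory cache uses the standard library's `functools.lru_cache` and is only safe for data that does not change, as in the static database the question describes.

```python
import hashlib
import os
import tempfile
from functools import lru_cache


def sharded_path(root, name, levels=2, width=2):
    """Map a file name to root/ab/cd/name using a hash of the name,
    so that no single directory grows very large."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)


def store(root, name, data):
    """Write data under its sharded path, creating directories as needed."""
    path = sharded_path(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path


@lru_cache(maxsize=256)
def load(path):
    """Read a file once; repeated requests are served from memory."""
    with open(path, "rb") as f:
        return f.read()


root = tempfile.mkdtemp()
p = store(root, "report.txt", b"hello")
first = load(p)
second = load(p)  # served from the cache, no disk access
```

In a classic CGI setup each request starts a fresh process, so an in-process cache like this only helps within one request; a persistent process (or the operating system's own file cache) is what carries the benefit across requests.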

Answered by Disillusioned

As a general rule, databases are slower than files.

If you require indexing of your files, a hard-coded access path on customised indexing structures will always have the potential to be faster if you do it correctly.

But 'performance' is not the goal when choosing a database over a file-based solution.

You should ask yourself whether your system needs any of the benefits that a database would provide. If so, then the small performance overhead is quite acceptable.

So:

  1. Do you need to deal with multiple users and concurrent updates? (Well; you did say it's static.)
  2. Do you need flexibility in order to easily query the data from a variety of angles?
  3. Do you have multiple users, and could gain from making use of an existing security model?

Basically, the question is more about which would be easier to develop. The performance difference between the two is not worth wasting dev time on.

Answered by Joey Adams

From my little bit of experience, server-based databases (even those served on the local machine) tend to have very slow throughput compared to local filesystems. However, this depends on some things, one of which is asymptotic complexity. Comparing scanning a big list of files against using a database with an index to look up an item, the database wins.

My little bit of experience is with PostgreSQL. I had a table with three million rows, and I went to update a mere 8,000 records. It took 8 seconds.

As for the quote "Premature optimization is the root of all evil.", I would take that with a grain of salt. If you write your application using a database, then find it to be slow, it might take a tremendous amount of time to switch to a filesystem-based approach or something else (e.g. SQLite). I would say your best bet is to create a very simple prototype of your workload, and test it with both approaches. I believe it is important to know which is faster in this case.
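
A very simple prototype of that kind could look like the Python sketch below. It assumes a workload of single-key lookups, uses one small file per record for the filesystem side, and uses SQLite (via the standard `sqlite3` module) as a stand-in for the database side, which leaves out the connection and network overhead of a real server. The record count and all names are hypothetical, and the timings it prints will vary by machine.

```python
import os
import sqlite3
import tempfile
import timeit

N = 500  # hypothetical record count; scale up to match your real data
workdir = tempfile.mkdtemp()

# Approach 1: one small file per record.
for i in range(N):
    with open(os.path.join(workdir, f"{i}.txt"), "w") as f:
        f.write(f"value-{i}")


def read_from_file(key):
    with open(os.path.join(workdir, f"{key}.txt")) as f:
        return f.read()


# Approach 2: the same records in an indexed table.
conn = sqlite3.connect(os.path.join(workdir, "data.db"))
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 ((i, f"value-{i}") for i in range(N)))
conn.commit()


def read_from_db(key):
    row = conn.execute("SELECT value FROM items WHERE id = ?", (key,)).fetchone()
    return row[0]


# Time 1000 lookups of the same key with each approach.
for name, fn in [("files", read_from_file), ("sqlite", read_from_db)]:
    seconds = timeit.timeit(lambda: fn(N // 2), number=1000)
    print(f"{name}: {seconds:.4f}s for 1000 lookups")
```

To compare against a server-based database, swap `read_from_db` for a function that uses your actual driver and includes the connection setup you would pay per CGI request.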

Answered by John Hyland

As others have pointed out: it depends!

If you really need to find out which is going to be more performant for your purposes, you may want to generate some sample data to store in each format and then run some benchmarks. The Benchmark.pm module comes with Perl, and makes it fairly simple to do a side-by-side comparison with something like this:

use strict;
use warnings;
use Benchmark qw(:all);

my $count = 1000;  # Some large-ish number of trials is recommended.

# Run each sub $count times and print a table comparing their rates.
cmpthese($count, {
    'File System' => sub { ...your filesystem code... },
    'Database'    => sub { ...your database code... },
});

You can type perldoc Benchmark to get more complete documentation.

Answered by Kuzgun

When it comes to images, using files instead of a db is very useful if the site structure is suitable. Create folders that correspond to your data and place the images inside. For example, on an article site you store your articles in the db. You don't have to store image paths in the db: name the folders after your primary keys (1, 2, 3, ...) and put the images inside. This approach can be used for all media files: e-books, music files, videos. The same logic works for XML files if you won't need to search them.
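
A minimal Python sketch of that layout, assuming integer primary keys and a single media root directory; `MEDIA_ROOT`, `media_dir`, `add_image`, and `list_images` are hypothetical names chosen for illustration.

```python
import os
import tempfile

MEDIA_ROOT = tempfile.mkdtemp()  # hypothetical media directory


def media_dir(article_id):
    """Folder for an article's media, named after its primary key."""
    return os.path.join(MEDIA_ROOT, str(int(article_id)))


def add_image(article_id, filename, data):
    """Store an image in the folder that belongs to the article."""
    folder = media_dir(article_id)
    os.makedirs(folder, exist_ok=True)
    # basename() keeps directory components in filename from escaping the folder.
    with open(os.path.join(folder, os.path.basename(filename)), "wb") as f:
        f.write(data)


def list_images(article_id):
    """Return the article's image names; no path column in the db is needed."""
    folder = media_dir(article_id)
    return sorted(os.listdir(folder)) if os.path.isdir(folder) else []


add_image(1, "cover.png", b"\x89PNG")
add_image(1, "chart.png", b"\x89PNG")
```

The database row for article 1 then only needs its primary key; the images are found by convention rather than by stored paths.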

Answered by FalseVinylShrub

As others have said, it depends: on the size and nature of the data and the operations you're planning to run on it.

Particularly for a CGI script, you're going to incur a performance hit for connecting to a database server on every page view. However if you create a naive file-based approach, you could easily create worse performance problems ;-)

As well as a Berkeley DB File solution you could also consider using SQLite. This creates a SQL interface to a database stored in a local file. You can access it with DBI and SQL but there's no server, configuration or network protocol. This could allow easier migration if a database server is necessary in the future (example: if you decide to have multiple front-end servers, but need to share state).

Without knowing any details, I'd suggest using a SQLite/DBI solution then reviewing the performance. This will give flexibility with a reasonably simple start up and decent performance.
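
As a rough illustration of how little setup SQLite needs, here is a sketch using Python's standard `sqlite3` module rather than Perl's DBI, which the answer refers to. The file name, the `pages` table, and `get_page` are hypothetical.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "site.db")  # hypothetical file

conn = sqlite3.connect(db_path)  # opens (or creates) the file; no server involved
conn.execute("CREATE TABLE IF NOT EXISTS pages (slug TEXT PRIMARY KEY, body TEXT)")
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", ("home", "Welcome"))


def get_page(slug):
    """Look up one page body by its key, or return None if it is missing."""
    row = conn.execute("SELECT body FROM pages WHERE slug = ?", (slug,)).fetchone()
    return row[0] if row else None
```

Because the queries are plain SQL with placeholders, moving to a server-based database later should mostly mean changing the connection code, though SQLite-specific statements such as `INSERT OR REPLACE` would need adapting.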

Answered by Nate C-K

It depends on the profile of the data and what logic you are going to be using to access it. If you simply need to save and fetch named nodes then a filesystem-based database may be faster and more efficient. (You could also have a look at Berkeley DB for that purpose.) If you need to do index-based searches, and especially if you need to join different sets of data based on keys, then an SQL database is your best bet.
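
The two cases can be contrasted in a short Python sketch. It uses the standard `dbm` module as a stand-in for Berkeley DB (an assumption; the actual backend depends on how Python was built) and an in-memory SQLite database for the join. The keys, tables, and sample rows are made up for illustration.

```python
import dbm
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()

# Case 1: saving and fetching named nodes. A key-value file is enough.
with dbm.open(os.path.join(workdir, "nodes"), "c") as kv:
    kv["node:42"] = "payload for node 42"
    node = kv["node:42"].decode("utf-8")  # values come back as bytes

# Case 2: joining data sets on keys. Let SQL do the work.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Salinger');
    INSERT INTO books VALUES (1, 1, 'Franny and Zooey');
""")
joined = conn.execute(
    "SELECT a.name, b.title FROM books b JOIN authors a ON a.id = b.author_id"
).fetchall()
```

Doing the second case by hand on top of a key-value store means writing and maintaining your own join logic, which is the point at which an SQL database tends to be the more natural fit.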

I would just go with whatever solution seems the most natural for your application.

Answered by brian d foy

To quickly access files, depending on what you are doing, an mmap can be very handy. I just wrote about this in the Effective Perl blog as "Memory-map files instead of slurping them".
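
The blog post mentioned above is about Perl; as a rough equivalent, here is a sketch using Python's standard `mmap` module. The sample file and the `find_offset` helper are hypothetical. The idea is that the file is mapped into memory and the operating system pages data in as it is touched, instead of the program reading the whole file into a string first.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "big.log")  # hypothetical file
with open(path, "wb") as f:
    for i in range(10000):
        f.write(b"line %d\n" % i)


def find_offset(path, needle):
    """Return the byte offset of needle, or -1 if absent, searching the
    memory-mapped file rather than a copy read into a bytes object."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(needle)


offset = find_offset(path, b"line 9999\n")
```

Note that mapping an empty file raises an error, so real code would need to check the file size first.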

However, I expect that a database server would be much faster. It's difficult to say what would be faster for you when we have no idea what you are doing, what sort of data you need to access, and so on.

Answered by Doron Segal

I'm going to give you the same answer everyone else gave you: it depends.

In a simple scenario with a single server that returns data (read only), yes, the file system will be great and easy to manage.

But when you have more than one server, you'll have to manage a distributed file system like GlusterFS, Ceph, etc.

A database is a tool that manages all of it for you: distribution across servers, compression, reads and writes, locks, etc.

Hope that's helpful.
