Original question: http://stackoverflow.com/questions/2356851/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
database vs. flat files
Asked by hyperboreean
The company I work for is trying to switch a product that uses a flat file format to a database format. We're handling pretty big data files (e.g. 25 GB per file), and they get updated very quickly. We need to run queries that access the data randomly as well as contiguously. I am trying to convince them of the advantages of using a database, but some of my colleagues seem reluctant. So I was wondering if you could help me out with some reasons, or links to posts, on why we should use databases, or at least clarify why flat files are better (if they are).
Answer by Andrey
- Databases can handle querying tasks, so you don't have to walk over files manually. Databases can handle very complicated queries.
- Databases can handle indexing, so tasks like getting the record with id = x can be VERY fast.
- Databases can handle multiprocess/multithreaded access.
- Databases can handle access over the network.
- Databases can enforce data integrity.
- Databases can update data easily (see the first point).
- Databases are reliable.
- Databases can handle transactions and concurrent access.
- Databases + ORMs let you manipulate data in a very programmer-friendly way.
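The querying and indexing points can be illustrated with a small sketch. It assumes SQLite through Python's standard `sqlite3` module; the `readings` table, its columns, and the index name are made up for illustration and are not from the question.

```python
import sqlite3

# In-memory database, used here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO readings (id, sensor, value) VALUES (?, ?, ?)",
    [(i, f"sensor-{i % 10}", i * 0.5) for i in range(1, 1001)],
)
# Hypothetical secondary index on the column we filter by.
conn.execute("CREATE INDEX idx_readings_sensor ON readings (sensor)")
conn.commit()

# "Get record with id = x": the engine finds the row, no manual file walk.
row = conn.execute(
    "SELECT id, sensor, value FROM readings WHERE id = ?", (42,)
).fetchone()

# Ask SQLite how it would run a filtered query. The human-readable
# description is the last column of each plan row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT value FROM readings WHERE sensor = ?",
    ("sensor-3",),
).fetchall()
plan_text = " ".join(step[-1] for step in plan)
print(row)
print(plan_text)
```

With a flat file, the same lookup would need either a full scan or a hand-maintained index structure.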
Answer by Esteban Küber
This is an answer I already gave some time ago:
It depends entirely on the domain-specific needs of the application. Direct access to text or binary files can often be extremely fast and efficient, and it gives you all the file access capabilities of your OS's file system.
Furthermore, your programming language most likely already has a built-in module for the specific parsing you need (or it is easy to write one).
If what you need is many appends (INSERTs?) and sequential access with little or no concurrency, files are the way to go.
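As a rough sketch of that append-heavy, sequential case, the following uses Python's built-in `csv` module on a temporary file. The file name, the record layout, and the helper names `append_record` and `scan` are all assumptions for illustration.

```python
import csv
import os
import tempfile


def append_record(path, record):
    """Append one row at the end of the file; earlier rows are never rewritten."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(record)


def scan(path):
    """Read the file front to back, yielding each row as a list of strings."""
    with open(path, newline="") as f:
        yield from csv.reader(f)


# Hypothetical data file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "events.csv")
for i in range(3):
    append_record(path, [i, f"event-{i}"])

rows = list(scan(path))
print(rows)
```

This stays simple only while there is a single writer and readers are content to scan. Random access or concurrent writers would need extra machinery on top.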
On the other hand, when your requirements include concurrency, non-sequential reading/writing, atomicity, or atomic permissions, or when your data is relational by nature, you will be better off with a relational or OO database.
A lot can be accomplished with SQLite3, which is extremely light (under 300 KB), ACID compliant, written in C, and highly ubiquitous (if it isn't already included in your programming language, as it is in Python for example, there is surely a binding available). It can be useful even on database files as big as 140 terabytes, or 128 tebibytes (see SQLite's documented database size limit), possibly more.
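To make the ACID point a little more concrete, here is a minimal sketch of atomicity using SQLite through Python's standard `sqlite3` module. The `accounts` table and the `transfer` helper are hypothetical; the only library behaviour relied on is that using a connection as a context manager commits on success and rolls back on an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing unit."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )


try:
    transfer(conn, "alice", "bob", 150)  # would drive alice's balance negative
except sqlite3.IntegrityError:
    pass  # the CHECK constraint rejected the debit; the credit is undone too

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

Getting the same all-or-nothing behaviour with hand-written file updates usually means building your own journaling scheme.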
If your requirements were bigger, there wouldn't even be a discussion: go for a full-blown RDBMS.
Since you say in a comment that "the system" is merely a bunch of scripts, you should take a look at pgbash.
Answer by George Mastros
Don't build it if you can buy it.
I heard this quote recently, and it really seems fitting as a guideline. Ask yourself this: how much time was spent working on the file handling portion of your app? I suspect a fair amount of time was spent optimizing this code for performance. If you had been using a relational database all along, you would have spent considerably less time on this portion of your application, and you would have had more time for the true "business" aspect of your app.
Answer by Dean J
They're faster; unless you're loading the entire flat file into memory, a database will allow faster access in almost all cases.
They're safer; databases are easier to back up safely, and they have mechanisms to check for file corruption, which flat files do not. Once corruption in your flat file migrates to your backups, you're done, and you might not even know it yet.
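One way to picture the backup and corruption-check point, assuming SQLite accessed through Python's standard `sqlite3` module (other databases ship their own tools), is the sketch below. The table `t` is a placeholder; `Connection.backup` requires Python 3.7 or later.

```python
import sqlite3

# Source database with some placeholder data.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x INTEGER)")
src.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
src.commit()

# Copy the whole database into a second connection using the online
# backup API, which works while the source stays open.
dst = sqlite3.connect(":memory:")
src.backup(dst)

# Let the engine verify the copy's internal structure.
check = dst.execute("PRAGMA integrity_check").fetchone()[0]
count = dst.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(check, count)
```

A flat file offers nothing comparable unless you add it yourself, for example by storing checksums alongside the data.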
They have more features; databases can allow many users to read/write at the same time.
They're much less complex to work with, once they're set up.
Answer by Scott Root
Databases all the way.
However, if you still need to store files and don't have the capacity to take on a new RDBMS (like Oracle, SQL Server, etc.), then look into XML.
XML is a structured file format that lets you store things as a file while giving you query power over the file and the data within it. XML files are easier to read than flat files and can easily be transformed with XSLT for even better human readability. XML is also a great way to transport data around if you must.
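A small sketch of what "query power" over an XML file can look like, using Python's standard `xml.etree.ElementTree` and its limited XPath support. The `<records>` document and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; a real one would be read from disk.
doc = """<records>
  <record id="1"><name>alpha</name><size>10</size></record>
  <record id="2"><name>beta</name><size>25</size></record>
</records>"""

root = ET.fromstring(doc)

# Attribute predicate: find the record whose id is "2".
match = root.find("./record[@id='2']")
name = match.find("name").text

# Numeric filters are done in Python, since ElementTree's XPath subset
# does not support comparisons such as size > 20.
big = [r.get("id") for r in root.iter("record") if int(r.findtext("size")) > 20]
print(name, big)
```

Note that `fromstring` loads the whole document into memory. For files of the size mentioned in the question, a streaming approach such as `ET.iterparse` would likely be needed instead.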
I strongly suggest a DB, but if you can't go that route, XML is an ok second.
Answer by Victor
What about a non-relational (NoSQL) database such as Amazon's SimpleDB, Tokyo Cabinet, etc.? I've heard that Google, Facebook, and LinkedIn are using these to store their huge datasets.
Can you tell us if your data is structured, if your schema is fixed, if you need easy replicability, if access times are important, etc?
Answer by bcosca
The type of files is not mentioned. If they're media files, go ahead with flat files. You probably just need a DB for tags and some way to associate the "external BLOBs" with the records in the DB. But if full-text search is something you need, there's no other way to go but to migrate to a full DB.
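The "DB for tags plus external BLOBs" arrangement might be sketched as follows, again assuming SQLite via Python's `sqlite3`. The schema, the file paths, and the `paths_with_tag` helper are hypothetical; the media files themselves would stay on disk and only their paths are stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE media (
        id   INTEGER PRIMARY KEY,
        path TEXT NOT NULL UNIQUE      -- location of the file on disk
    );
    CREATE TABLE tags (
        media_id INTEGER NOT NULL REFERENCES media(id),
        tag      TEXT NOT NULL,
        PRIMARY KEY (media_id, tag)
    );
    """
)
conn.executemany(
    "INSERT INTO media (id, path) VALUES (?, ?)",
    [(1, "/data/media/a.mp4"), (2, "/data/media/b.mp4")],
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?)", [(1, "demo"), (1, "2010"), (2, "demo")]
)
conn.commit()


def paths_with_tag(conn, tag):
    """Return the on-disk paths of all media files carrying the given tag."""
    rows = conn.execute(
        "SELECT m.path FROM media m JOIN tags t ON t.media_id = m.id "
        "WHERE t.tag = ? ORDER BY m.path",
        (tag,),
    )
    return [r[0] for r in rows]


print(paths_with_tag(conn, "demo"))
```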
Another thing: your filesystem may impose a ceiling on the number of physical files.
Answer by Lay González
Unless you are loading the files into memory each time you boot, use a database. Simple as that.
That is assuming that your colleagues already have a program to handle queries against the files. If not, then use a database.
Answer by Oded
SQL's ad hoc query abilities are enough of a reason for me. With a good schema and indexes on the tables, queries are fast and effective.
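As an illustration of an ad hoc query, the sketch below answers a question nobody planned for when the data was stored, without writing any parsing code. It assumes SQLite via Python's `sqlite3`; the `orders` table and its contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("acme", 10.0), ("acme", 15.0), ("globex", 7.5)],
)
conn.commit()

# Ad hoc question: which customers have spent more than 8 in total?
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer HAVING SUM(amount) > 8 ORDER BY customer"
).fetchall()
print(totals)
```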
Answer by rashedcs
The differences between databases and flat files are given below:
- Databases provide more flexibility, whereas flat files provide less.
- Database systems provide data consistency, whereas flat files cannot.
- Databases are more secure than flat files.
- Databases support DML and DDL, whereas flat files do not.
- Databases have less data redundancy, whereas flat files have more.