Java: are flat file databases any good?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/332825/

Time: 2020-10-29 12:00:14 Source: igfitidea

Are flat file databases any good?

Tags: java, database, linux, architecture

Asked by Robert Nickens

Informed opinions needed about the merits of flat file databases. I'm considering using a flat file database scheme to manage data for a custom blog. It would be deployed on a Linux variant and written in Java.

What are the possible negatives or positives regarding performance for reading and writing of both articles and comments?

Would article retrieval crap out because it is a flat file rather than an RDBMS if the blog were to get slashdotted? (Wishful thinking)

I'm not against using an RDBMS, just asking the community for their opinion on the viability of such a software architecture scheme.

Follow Up: In the case of this question I would treat "flat file" as meaning "file system-based". For example, each blog entry and its accompanying metadata would be in a single file. That makes for many files, organized by a date-based folder structure: (blogs\testblog2\2008\12\01) == 12/01/2008
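
As a rough illustration of that layout, here is a minimal sketch that maps a blog name and a date to a directory. The class and method names (EntryPaths, directoryFor) are hypothetical, and the root directory "blogs" is simply taken from the example path above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;

class EntryPaths {
    // Maps a blog name and a publication date to <root>/<blog>/yyyy/MM/dd.
    // Hypothetical helper: the layout mirrors the example path in the question.
    static Path directoryFor(Path root, String blog, LocalDate date) {
        return root.resolve(blog)
                .resolve(String.format("%04d", date.getYear()))
                .resolve(String.format("%02d", date.getMonthValue()))
                .resolve(String.format("%02d", date.getDayOfMonth()));
    }

    public static void main(String[] args) {
        // Path uses the platform separator, so this prints forward slashes on Linux.
        System.out.println(directoryFor(Paths.get("blogs"), "testblog2", LocalDate.of(2008, 12, 1)));
    }
}
```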

Answer by Will Hartung

Flat file databases have their place and are quite workable for the right domain.

Mail servers and NNTP servers of the past really pushed the limits of how far you can take these things (which is actually quite far -- file systems can have millions of files and directories).

Flat file DBs' two biggest weaknesses are indexing and atomic updates, but if the domain is suitable these may not be an issue.

But you can, for example, with proper locking, do an "atomic" index update using basic file system commands, at least on Unix.

A simple case is to have the indexing process run through the data and create the new index file under a temporary name. Then, when it is done, you simply rename the new file over the old file (with either the rename(2) system call or the shell mv command). Within a single file system, rename and mv are atomic operations on Unix (i.e. the rename either happens or it doesn't, and there is never an "in between" state).

Same with creating new entries. Basically, write the file fully to a temp file, then rename or mv it into its final place. Then you never have an "intermediate" file in the "DB". Otherwise, you might have a race condition (such as a process reading a file that is still being written and reaching the end before the writing process is complete -- an ugly race condition).
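
Since the blog would be written in Java, the write-then-rename idea could be sketched roughly as follows. This is only an illustration, not the answer's own code: the class and method names (AtomicPublish, publish) are hypothetical. It assumes a Unix-like system, where Files.move with ATOMIC_MOVE replaces an existing target; the Java documentation leaves that case implementation specific on other platforms.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class AtomicPublish {
    // Writes the content to a temp file next to the target, then renames it into place.
    // Readers see either the old file or the complete new one, never a partial write.
    static void publish(Path target, String content) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Files.createDirectories(dir);
        // Same directory means same file system, so the move can be a plain rename.
        Path tmp = Files.createTempFile(dir, ".tmp-", ".part");
        try {
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            // Assumption: on Unix-like systems this replaces an existing target atomically.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // nothing left to delete after a successful move
        }
    }
}
```

The same helper would serve for both new entries and a rebuilt index file. Note that Files.createTempFile creates the file with owner-only permissions on POSIX systems, so a separate web server process might need the permissions widened before the rename.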

If your primary indexing works well with directory names, then that works just fine. You can use a hashing scheme, for example, to create directories and subdirectories to locate new files.
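
One possible hashing scheme, sketched under assumptions: the first two bytes of a SHA-256 digest of the key pick two levels of subdirectory, giving up to 65,536 buckets. The names (ShardedPaths, pathFor) and the choice of SHA-256 are illustrative only, and the key is assumed to be a safe file name.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ShardedPaths {
    // Returns <root>/<xx>/<yy>/<key>, where xx and yy are the first two digest bytes in hex.
    // Assumption: key is already a valid, safe file name.
    static Path pathFor(Path root, String key) {
        try {
            byte[] h = MessageDigest.getInstance("SHA-256")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            String first = String.format("%02x", h[0] & 0xff);
            String second = String.format("%02x", h[1] & 0xff);
            return root.resolve(first).resolve(second).resolve(key);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pathFor(Paths.get("db"), "my_java_post.txt"));
    }
}
```

Because the path is derived from the key alone, a lookup needs no directory scan: recompute the hash and open the file.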

Finding a file using the file name and directory structure is very fast as most filesystems today index their directories.

If you're putting a million files in a directory, there may well be tuning issues you'll want to look into, but out of the box most file systems will handle tens of thousands easily. Just remember that if you need to SCAN the directory, there's going to be a lot of files to scan. Partitioning via directories helps prevent that.

But that all depends on your indexing and searching techniques.

Effectively, a stock, off-the-shelf web server serving up static content is a large flat file database, and the model works pretty well.

Finally, of course, you have the plethora of free Unix file system level tools at your disposal, but all of them have issues with zillions of files (forking grep 1000000 times to find something in a file will have performance tradeoffs -- the overhead simply adds up).

If all of your files are on the same file system, then hard links also give you options (since they, too, are atomic) in terms of putting the same file in different places (basically for indexing).

For example, you could have a "today" directory, a "yesterday" directory, a "java" directory, and the actual message directory.

So, a post could be linked in the "today" directory, the "java" directory (because the post is tagged with "java", say), and in its final place (say /articles/2008/12/01/my_java_post.txt). Then, at midnight, you run two processes. The first one takes all files in the "today" directory, checks their create date to make sure they're not "today" (since the process can take several seconds and a new file might sneak in), and renames those files to "yesterday". Next, you do the same thing for the "yesterday" directory, only here you simply delete them if they're out of date.

Meanwhile, the file is still in the "java" and the ".../12/01" directory. Since you're using a Unix file system, and hard links, the "file" only exists once, these are all just pointers to the file. None of them are "the" file, they're all the same.
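
In Java, hard links like these can be created with Files.createLink. The sketch below is illustrative only: the names (TagLinks, linkInto) are hypothetical, and it assumes the index directory and the article live on the same file system, which hard links require.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TagLinks {
    // Adds a second directory entry (hard link) for an existing file under indexDir.
    // Assumption: indexDir and the existing file are on the same file system.
    static Path linkInto(Path indexDir, Path existing) throws IOException {
        Files.createDirectories(indexDir);
        Path link = indexDir.resolve(existing.getFileName());
        Files.createLink(link, existing); // argument order: new link first, existing file second
        return link;
    }
}
```

Linking a post into both a "today" and a "java" directory is then two calls to linkInto. Deleting any one of the names leaves the content reachable through the others.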

You can see that while each individual file move is atomic, the bulk is not. For example, while the "today" script is running, the "yesterday" directory can well contain files from both "yesterday" and "the day before" because the "yesterday" script has not yet run.
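
The midnight pass might look something like the sketch below. The names (DailyRotation, moveOlderThan) are hypothetical, and it uses the last-modified time as a stand-in for the "create date", since creation time is not reliably available on every Unix file system. Each Files.move is atomic on its own, but the loop as a whole is not, which is exactly the limitation described above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Instant;

class DailyRotation {
    // Moves every entry last modified before the cutoff from one directory to another.
    // Returns the number of entries moved. Assumption: both directories share a file system.
    static int moveOlderThan(Path from, Path to, Instant cutoff) throws IOException {
        Files.createDirectories(to);
        int moved = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(from)) {
            for (Path entry : entries) {
                // Files that sneaked in after the cutoff are left where they are.
                if (Files.getLastModifiedTime(entry).toInstant().isBefore(cutoff)) {
                    Files.move(entry, to.resolve(entry.getFileName()),
                            StandardCopyOption.ATOMIC_MOVE);
                    moved++;
                }
            }
        }
        return moved;
    }
}
```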

In a transactional DB, you would do that all at once.

But, simply, it is a tried and true method. Unix, in particular, works VERY well with that idiom, and the modern file systems can support it quite well as well.

Answer by Kyle Cronin

(answer copied and modified from here)

I would advise against using a flat file for anything besides read-only access, because then you'd have to deal with concurrency issues like making sure only one process is writing to the file at once. Instead, I recommend SQLite, a fully functional SQL database that's stored in a file. SQLite already has built-in concurrency, so you don't have to worry about things like file locking, and it's really fast for reads.

If, however, you are doing lots of database changes, it's best to do them all at once inside a transaction. This will only write the changes to the file once, as opposed to every time a change query is issued. This dramatically increases the speed of doing multiple changes.

When a change query is issued, whether it's inside a transaction or not, the whole database is locked until that query finishes. This means that extremely large transactions could adversely affect the performance of other processes because they must wait for the transaction to finish before they can access the database. In practice, I haven't found this to be that noticeable, but it's always good practice to try to minimize the number of database-modifying queries you issue, and it's certainly faster than trying to use a flat file.

Answer by BenMaddox

This has been done in ASP.NET with DasBlog. It uses file-based storage.

A few details are listed on this older link. http://www.hanselman.com/blog/UpcomingDasBlog19.aspx

You can also get more details on http://dasblog.info/Features.aspx

I've heard some mixed opinions on the performance. I'd suggest you research that a bit more to see if that type of system would work well for you. This is the closest thing I have heard about yet.

Answer by Cade Roux

Writing your own engine in native code can outperform a general purpose database.

However, the quality of the engine and the feature level will never approach that of a general-purpose database. All the things that databases give you as core features - indexing, transactions, referential integrity - you would have to implement all of them yourself.

There's nothing wrong with reinventing the wheel (after all, Linux was just that), but keep in mind your expectations and time commitment.

Answer by Guemundur Bjarni

I'm not answering this to say why flat file databases are good or bad; others have done an ample job of that.

However, some have been pointing at SQLite, which does its job just fine. Since you are using Java, your best option would be to use HSQLDB, which does precisely the same as SQLite, but is implemented in Java and embeds into your application.

Answer by stesch

Most of the time a flat file database is enough now. But you will thank your younger self if you start your project with a database. This could be SQLite, if you don't want to set up a whole database system like PostgreSQL.

Answer by Farooq Khan

Check out http://jsondb.io, an open-source Java-based database that has most of what you are looking for. It saves data as flat .json files and offers multithreading support, encryption support, ORM support, atomicity support, and XPath-based advanced query support.

Disclaimer: I created this database.

Answer by dkretz

They seem to work quite well for high-write, low-read, no-update databases, where new data is appended.

Web servers and their cousins rely on them heavily for log files.

DBMS software uses them for logs as well.

If your design falls within these limits, you're in good company, it seems. You might want to keep metadata and pointers in a database, and set up some kind of fast asynchronous queue-writer to buffer the comments, but the filesystem is already pretty good at that level of buffering and write-locking.
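
For the append-only case, a minimal sketch in Java could look like this. The names (CommentLog, append) are hypothetical, and the one-comment-per-line format is an assumption made for illustration. Whether large concurrent appends from several processes can interleave depends on the platform and file system, so that should be verified rather than assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class CommentLog {
    // Appends one comment as a single line; existing content is never rewritten.
    // Assumption: one record per line, so embedded line breaks are flattened to spaces.
    static void append(Path log, String comment) throws IOException {
        String line = comment.replace("\r", " ").replace("\n", " ") + System.lineSeparator();
        Files.write(log, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```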

Answer by paxdiablo

Flat file databases are possible but consider the following.

Databases need to attain all the ACID elements (atomicity, consistency, isolation, durability) and, if you're going to ensure that's all done in a flat file (especially with concurrent access), you've basically written a full-blown DBMS.

So why not use a full-blown DBMS in the first place?

You'll save yourself the time and money involved in writing one yourself (and re-writing it many times, I'll guarantee) if you just go with one of the free options (SQLite, MySQL, PostgreSQL, and so on).

Answer by Din

You can use a flat file database if the data is small enough and does not need lots of random access. A big file with a lot of random access will be very slow. You also get no complex queries: no joins, no SUM, no GROUP BY, etc. Nor can you expect to fetch hierarchical data from a flat file. The XML format is much better for complex structures.
