Xml or Sqlite, when to drop Xml for a database?

Question

Asked by sieben

I really like Xml for saving data, but when does sqlite/database become the better option? E.g., when the xml has more than x items or is greater than y MB?

I am coding an rss reader and I believe I made the wrong choice in using xml over a sqlite database to store a cache of all the feeds' items. Some feeds have an xml file of ~1 MB after a month, another has over 700 items, while most only have ~30 items and are ~50 KB in size after several months.

I currently have no plans to implement a cap because I like to be able to search through everything.

So, my questions are:

When is the overhead of sqlite/databases justified over using xml?
Are the few large xml files justification enough for the database when there are a lot of small ones, though even the small ones will grow over time (a long, long time)?

Update (more info)

Every time a feed is selected in the GUI I reload all the items from that feed's xml file.

I also need to modify the read/unread status which seems really hacky when I loop through all nodes in the xml to find the item and then set it to read/unread.

Answer 1

Accepted answer by Stan

I basically agree with Mitchel that this can be highly specific, depending on what you are going to do with XML/sqlite. For your case (a cache), it seems to me that using sqlite (or another embedded db) makes more sense.

First, I don't really think that sqlite will need more overhead than XML, and I mean both development-time overhead and runtime overhead. The only problem is that you have a dependency on the sqlite library. But since you would need some library for XML anyway, it doesn't matter (I assume the project is in C/C++).

Advantages of sqlite over xml:

everything in one file,
performance degrades less than with XML as the cache gets bigger,
you can keep feed metadata separate from cache itself (other table), but accessible in the same way,
SQL is probably easier to work with than XPath for most people.
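
As a rough illustration of the "metadata in another table" point, here is a minimal sketch using Python's built-in sqlite3 module. Python is used only for brevity (the answer assumes C/C++), and the table and column names (`feeds`, `items`, `is_read`) are made up for this example, not taken from the asker's project.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative only.
conn = sqlite3.connect(":memory:")  # a real cache would pass a file path instead
conn.executescript("""
    CREATE TABLE feeds (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        url   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE items (
        id      INTEGER PRIMARY KEY,
        feed_id INTEGER NOT NULL REFERENCES feeds(id),
        title   TEXT NOT NULL,
        is_read INTEGER NOT NULL DEFAULT 0
    );
""")

# Feed metadata goes into one table ...
feed_id = conn.execute(
    "INSERT INTO feeds (title, url) VALUES (?, ?)",
    ("Example feed", "http://example.com/rss"),
).lastrowid

# ... and the cached items into another, all inside the same database file.
conn.executemany(
    "INSERT INTO items (feed_id, title) VALUES (?, ?)",
    [(feed_id, "First post"), (feed_id, "Second post")],
)

# Both are reachable through the same query interface.
rows = conn.execute(
    """SELECT feeds.title, COUNT(items.id)
       FROM feeds JOIN items ON items.feed_id = feeds.id
       GROUP BY feeds.id"""
).fetchall()
print(rows)  # [('Example feed', 2)]
```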

Disadvantages of sqlite:

can be problematic with multiple processes accessing same database (probably not your case),
you should know at least basic SQL. Unless there will be hundreds of thousands of items in cache, I don't think you will need to optimize it much,
maybe in some way it can be more dangerous from a security standpoint (SQL injection). On the other hand, you are not coding a web app, so this should not happen.

Everything else is probably on par for both solutions.

To sum it up, answers to your questions respectively:

You will not know, unless you test your specific application with both backends. Otherwise it's always just a guess. Basic support for both caches should not be a problem to code. Then benchmark and compare.
Because of the way XML files are organized, sqlite searches should always be faster (barring some corner cases where it doesn't matter anyway because it's blazingly fast). Speeding up searches in XML would require an index database anyway; in your case that would mean having a cache for the cache, which is not a particularly good idea. With sqlite you can have indexing as part of the database.
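
To make the indexing point concrete, here is a small sketch (Python's sqlite3 module, with a hypothetical `items` table). It also shows the read/unread update from the question as a single statement instead of a loop over XML nodes. The `?` placeholders keep values out of the SQL text, which is the usual guard against the injection concern mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the column names are assumptions for this sketch.
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY, feed_id INTEGER, guid TEXT,"
    " title TEXT, is_read INTEGER DEFAULT 0)"
)
# The index lives inside the database and is maintained automatically.
conn.execute("CREATE INDEX idx_items_feed ON items (feed_id, is_read)")

conn.executemany(
    "INSERT INTO items (feed_id, guid, title) VALUES (?, ?, ?)",
    [(1, "a", "SQLite released"), (1, "b", "XML tips"), (2, "c", "SQLite vs XML")],
)

# Mark one item as read without walking every item.
conn.execute("UPDATE items SET is_read = 1 WHERE guid = ?", ("a",))

# Unread items of one feed; this lookup can use idx_items_feed.
unread = conn.execute(
    "SELECT guid FROM items WHERE feed_id = ? AND is_read = 0 ORDER BY guid", (1,)
).fetchall()

# Search across every feed. A LIKE pattern with a leading wildcard still
# scans the table, so this part does not benefit from the index.
matches = conn.execute(
    "SELECT guid FROM items WHERE title LIKE ? ORDER BY guid", ("%SQLite%",)
).fetchall()

print(unread)   # [('b',)]
print(matches)  # [('a',), ('c',)]
```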

Answer 2

Answer by sieben

Man, do I have experience with this. I work on a project where we originally stored all of our data using XML, then moved to sqlite. There are many pros and cons to each technology, but it was performance that caused the switchover. Here is what we observed.

For small databases (a few meg or smaller), XML was much faster, and easier to deal with. Our data was naturally in a tree format, which made XML much more attractive, and XPATH allowed us to do many queries in one simple line rather than having to walk down an ancestry tree.
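
For readers who have not used XPath on tree data, here is a minimal sketch of the "one simple line" idea. It uses Python's xml.etree.ElementTree, which supports only a limited XPath subset and is not the Microsoft DOM library mentioned in this answer; the document structure is invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample document; element and attribute names are assumptions.
doc = ET.fromstring("""
<feeds>
  <feed name="news">
    <item read="false"><title>One</title></item>
    <item read="true"><title>Two</title></item>
  </feed>
  <feed name="blog">
    <item read="false"><title>Three</title></item>
  </feed>
</feeds>
""")

# One path expression selects the titles of all unread items in every feed,
# with no hand-written walk down the tree.
unread_titles = [t.text for t in doc.findall(".//item[@read='false']/title")]
print(unread_titles)  # ['One', 'Three']
```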

We were programming in a Win32 environment, and used the standard Microsoft DOM library. We would load all the data into memory, parse it into a DOM tree, and search, add, and modify on the in-memory copy. We would periodically save the data, and needed to rotate copies in case the machine crashed in the middle of a write.

We also needed to build up some "indexes" by hand using C++ tree maps. This, of course, would be trivial to do with sql.

Note that the size of the data on the filesystem was a factor of 2-4 smaller than the "in memory" dom tree.

By the time the data got to 10M-100M size, we started to have real problems. Interestingly enough, at all data sizes, XML processing was much faster than sqlite turned out to be (because it was in memory, not on the hard drive)! The problem was actually twofold. First, load time really started to get long. We would need to wait a minute or so before the data was in memory and the maps were built. Of course, once loaded, the program was very fast. The second problem was that all of this memory was tied up all the time. Systems with only a few hundred meg would be unresponsive in other apps even though we ran very fast.

We actually looked into using a filesystem-based xml database. There are a couple of open-source xml databases, and we tried them. I have never tried to use a commercial xml database, so I can't comment on them. Unfortunately, we could never get the xml databases to work well at all. Even the act of populating the database with hundreds of meg of xml took hours... Perhaps we were using it incorrectly. Another problem was that these databases were pretty heavyweight. They required Java and had a full client-server architecture. We gave up on this idea.

We found sqlite then. It solved our problems, but at a price. When we initially plugged sqlite in, the memory and load time problems were gone. Unfortunately, since all processing was now done on the hard drive, the background processing load went way up. While earlier we never even noticed the CPU load, now the processor usage was way up. We needed to optimize the code, and still needed to keep some data in memory. We also needed to rewrite many simple XPATH queries as complicated multi-query algorithms.

So here is a summary of what we learned.

For tree data, XML is much easier to query and modify using XPATH.
For small datasets (less than 10M), XML blew away sqlite in performance.
For large datasets (greater than 10M-100M), XML load time and memory usage became a big problem, to the point that some computers become unusable.
We couldn't get any open-source xml database to fix the problems associated with large datasets.
SQLITE doesn't have the memory problems of the XML DOM, but it is generally slower in processing the data (it is on the hard drive, not in memory). (Note: sqlite tables can be stored in memory, and perhaps this would make it as fast. We didn't try this because we wanted to get the data out of memory.)
Storing and querying tree data in a table is not enjoyable. However, managing transactions and indexing partially makes up for it.
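
The note about in-memory tables can be tried quickly. This is only a sketch, assuming Python's sqlite3 module (3.7 or later for `Connection.backup`), and it says nothing about whether it would have matched the XML speeds described in this answer. The special name `:memory:` is a real SQLite feature; the table and the temporary file name are made up.

```python
import os
import sqlite3
import tempfile

# ":memory:" asks SQLite for a database that lives entirely in RAM.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
mem.executemany("INSERT INTO items (title) VALUES (?)", [("One",), ("Two",)])
mem.commit()
count = mem.execute("SELECT COUNT(*) FROM items").fetchone()[0]

# Copy the in-memory database to a file so the data survives a restart.
path = os.path.join(tempfile.mkdtemp(), "cache.db")  # hypothetical location
disk = sqlite3.connect(path)
mem.backup(disk)
disk_count = disk.execute("SELECT COUNT(*) FROM items").fetchone()[0]
disk.close()

print(count, disk_count)  # 2 2
```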

Answer 3

Answer by Oli

Don't forget that you have a great database at your fingertips: the filesystem!

Lots of programmers forget that a decent directory-file structure is/has:

It's fast as hell
It's portable
It has a tiny runtime footprint

People are talking about splitting up XML files into multiple XML files... I would consider splitting your XML into multiple directories and multiple plaintext files.

Give it a go. It's refreshingly fast.
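
One possible reading of this suggestion, sketched in Python: one directory per feed and one plaintext file per item. The layout and the helper names `save_item` and `load_items` are assumptions made for illustration, not something this answer specifies.

```python
import tempfile
from pathlib import Path

# Hypothetical cache root; a real application would pick a fixed directory.
root = Path(tempfile.mkdtemp())

def save_item(feed: str, item_id: str, body: str) -> Path:
    """Write one item as a plaintext file inside its feed's directory."""
    feed_dir = root / feed
    feed_dir.mkdir(parents=True, exist_ok=True)
    path = feed_dir / f"{item_id}.txt"
    path.write_text(body, encoding="utf-8")
    return path

def load_items(feed: str) -> dict:
    """Read every item of one feed, keyed by item id (the file stem)."""
    return {
        p.stem: p.read_text(encoding="utf-8")
        for p in sorted((root / feed).glob("*.txt"))
    }

save_item("news", "0001", "First post")
save_item("news", "0002", "Second post")
items = load_items("news")
print(items)  # {'0001': 'First post', '0002': 'Second post'}
```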

Answer 4

Answer by Vin

Use XML for data that the application should know: configuration, logging and whatnot.
Use databases (Oracle, SQL Server etc.) for data that the user interacts with directly or indirectly: real data.
Use SQLite if the user data is more of a serialized collection, like a huge list of files and their content, or a collection of email items etc. SQLite is good at that.

Depends on the kind and the size of the data.

Answer 5

Answer by typicalrunt

I wouldn't use XML for storing RSS items. A feed reader makes constant updates as it receives data.

With XML, you need to load the data from file first, parse it, then store it for easy search/retrieval/update. Sounds like a database...

Also, what happens if your application crashes? If you use XML, what state is the data in the XML file in, versus the data in memory? At least with SQLite you get atomicity, so you are assured that your application will start with the same state as when the last database write was made.
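
A small sketch of what that atomicity looks like in practice, using Python's sqlite3 module with a made-up `items` table. It simulates a failure with an exception rather than a real process crash, so treat it as an illustration of rollback behaviour only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (guid TEXT PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO items VALUES ('a', 'Existing item')")
conn.commit()

try:
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO items VALUES ('b', 'New item')")
        raise RuntimeError("simulated failure in the middle of an update")
except RuntimeError:
    pass

# The half-finished update is gone; only the last committed state remains.
guids = [row[0] for row in conn.execute("SELECT guid FROM items ORDER BY guid")]
print(guids)  # ['a']
```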

Answer 6

Answer by Bradley Harris

XML is best used as an interchange format when you need to move data from your application to somewhere else or share information between applications. A database should be the preferred method of storage for almost any size application.

Answer 7

Answer by David Medinets

When should XML be used for data persistence instead of a database? Almost never. XML is a data transport language. It is slow to parse and awkward to query. Parse the XML (don't shred it!) and convert the resulting data into domain objects. Then persist the domain objects. A major advantage of a database for persistence is SQL which means unstructured queries and access to common tools and optimization techniques.

Answer 8

Answer by sieben

I have made the switch to SQLite and I feel much better knowing it's in a database.

There are a lot of other benefits from this:

Adding new items is really simple
Sorting by multiple columns
Removing duplicates with a unique index
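
The last two benefits can be sketched together. This is not the schema actually used here, just an illustration with invented column names, written with Python's sqlite3 module: a unique index combined with `INSERT OR IGNORE` drops items that were already cached, and `ORDER BY` takes several columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (feed_id INTEGER, guid TEXT, title TEXT, published TEXT)"
)
# A unique index makes the database itself reject a repeated (feed, guid) pair.
conn.execute("CREATE UNIQUE INDEX idx_items_guid ON items (feed_id, guid)")

fetched = [
    (1, "a", "Beta", "2009-01-02"),
    (1, "b", "Alpha", "2009-01-02"),
    (1, "a", "Beta", "2009-01-02"),      # same item seen again on a later fetch
    (1, "c", "Aardvark", "2009-01-01"),
]
# OR IGNORE silently skips rows that would violate the unique index.
conn.executemany("INSERT OR IGNORE INTO items VALUES (?, ?, ?, ?)", fetched)

# Sort by two columns: newest first, then by title within the same date.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM items ORDER BY published DESC, title ASC"
    )
]
print(titles)  # ['Alpha', 'Beta', 'Aardvark']
```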

I've created 2 views, one for unread items and one for all items. I'm not sure if this is the best use of views, but I really wanted to try using them.
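
For anyone curious, two views along those lines could look roughly like this. The view names `unread_items` and `all_items` and the `is_read` column are guesses for the sake of the example, shown with Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id      INTEGER PRIMARY KEY,
        title   TEXT,
        is_read INTEGER NOT NULL DEFAULT 0
    );
    -- Hypothetical view names: a view is a stored SELECT, not a copy of the data.
    CREATE VIEW unread_items AS SELECT id, title FROM items WHERE is_read = 0;
    CREATE VIEW all_items AS SELECT id, title, is_read FROM items;
    INSERT INTO items (title, is_read) VALUES ('One', 0), ('Two', 1), ('Three', 0);
""")

# Views are queried exactly like tables.
unread = [row[0] for row in conn.execute("SELECT title FROM unread_items ORDER BY id")]
total = conn.execute("SELECT COUNT(*) FROM all_items").fetchone()[0]
print(unread, total)  # ['One', 'Three'] 3
```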

I also benchmarked the xml vs sqlite using the StopWatch class, and sqlite is faster, although it could just be that my way of parsing xml files wasn't the fastest method.

Small number of items and small size (25 items, 30 KB)
- ~1.5 ms sqlite
- ~8.0 ms xml
Large number of items (700 items, 350 KB)
- ~20 ms sqlite
- ~25 ms xml
Large file size (850 items, 1024 KB)
- ~45 ms sqlite
- ~60 ms xml
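
For anyone who wants to repeat this kind of comparison, here is a rough timing sketch in Python, with `time.perf_counter` standing in for the StopWatch class. It is not the code that produced the numbers above: the data is synthetic, the database is in memory, and `timed` is a helper invented for the example, so the absolute timings are not comparable.

```python
import sqlite3
import time
import xml.etree.ElementTree as ET

N = 700  # synthetic item count, loosely matching the middle case above

# Build the same made-up data set in both formats.
xml_text = (
    "<items>"
    + "".join(f"<item><title>Item {i}</title></item>" for i in range(N))
    + "</items>"
)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [(f"Item {i}",) for i in range(N)])

def timed(fn):
    """Run fn once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - start) * 1000.0

# Load every title from XML (parse plus walk) and from SQLite (one query).
xml_titles, xml_ms = timed(
    lambda: [e.text for e in ET.fromstring(xml_text).iter("title")]
)
db_titles, db_ms = timed(
    lambda: [r[0] for r in conn.execute("SELECT title FROM items ORDER BY rowid")]
)

print(f"xml: {xml_ms:.2f} ms, sqlite: {db_ms:.2f} ms")
```

A single run is noisy; repeating each measurement several times and taking the median would give more trustworthy numbers.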

Answer 9

Answer by Mitchel Sellers

To me it really depends on what you are doing with them, how many users/processes need access to them at the same time etc.

I work with large XML files all the time, but they are single-process, import-style jobs where multi-user access and performance are not really requirements.

So really, it is a balance.

Answer 10

Answer by Mostlyharmless

If you will ever need to scale, use a database.
