Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must also follow CC BY-SA, cite the original address and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/10560834/

Date: 2020-09-09 12:39:22  Source: igfitidea

To what extent are 'lost data' criticisms still valid of MongoDB?

mongodb

Question by deltanovember

To what extent are 'lost data' criticisms still valid of MongoDB? I'm referring to the following:

1. MongoDB issues writes in unsafe ways by default in order to win benchmarks

If you don't issue getLastError(), MongoDB doesn't wait for any confirmation from the database that the command was processed. This introduces at least two classes of problems:

  • In a concurrent environment (connection pools, etc), you may have a subsequent read fail after a write has "finished"; there is no barrier condition to know at what point the database will recognize a write commitment
  • Any unknown number of save operations can be dropped on the floor due to queueing in various places, things outstanding in the TCP buffer, etc., if your connection drops or the db were to be KILL'd or segfault, hardware crash, you name it
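To make the difference concrete, here is a toy Python model of writes sitting in a buffer between client and server. It is not MongoDB or driver code: `SimulatedConnection`, `fire_and_forget` and `acknowledged` are hypothetical names, and the model assumes the simplest failure, where the connection dies and anything not yet applied is discarded. Waiting for a per-write confirmation (roughly what checking getLastError() gave you) is what lets the client know exactly which writes landed.

```python
from collections import deque


class SimulatedConnection:
    """Hypothetical toy model: writes wait in a buffer until the server applies them."""

    def __init__(self):
        self.outbox = deque()  # sent by the client, not yet applied by the server
        self.stored = []       # applied by the server

    def send(self, doc):
        self.outbox.append(doc)

    def flush_one(self):
        # The server applies the oldest buffered write, if any.
        if self.outbox:
            self.stored.append(self.outbox.popleft())

    def drop(self):
        # Connection dies (process killed, segfault, hardware crash...):
        # everything still buffered is silently discarded.
        lost = list(self.outbox)
        self.outbox.clear()
        return lost


def fire_and_forget(conn, docs):
    # Returns immediately; the caller never learns which writes were applied.
    for doc in docs:
        conn.send(doc)


def acknowledged(conn, docs):
    # Waits for each write to be applied before sending the next one, so the
    # caller ends up with an exact list of confirmed writes (one round trip each).
    confirmed = []
    for doc in docs:
        conn.send(doc)
        conn.flush_one()
        confirmed.append(doc)
    return confirmed


if __name__ == "__main__":
    a = SimulatedConnection()
    fire_and_forget(a, ["w1", "w2", "w3"])
    a.flush_one()          # the server manages to apply one write...
    lost = a.drop()        # ...before the connection dies
    print(a.stored, lost)  # ['w1'] ['w2', 'w3']

    b = SimulatedConnection()
    print(acknowledged(b, ["w1", "w2", "w3"]), b.drop())  # ['w1', 'w2', 'w3'] []
```

In the fire-and-forget case the application has no record that w2 and w3 were lost; in the acknowledged case the price is a round trip per write, which is the performance trade-off the benchmark criticism is about.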

2. MongoDB can lose data in many startling ways

Here is a list of ways we personally experienced records go missing:

  1. They just disappeared sometimes. Cause unknown.
  2. Recovery on corrupt database was not successful, pre transaction log.
  3. Replication between master and slave had gaps in the oplogs, causing slaves to be missing records the master had. Yes, there is no checksum, and yes, the replication status had the slaves current
  4. Replication just stops sometimes, without error. Monitor your replication status!

...[other criticisms]
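Because point 3 hinges on there being no checksum and on a replication status that looked current, one independent check is to compare content digests of the same collection on each node. This is a plain-Python sketch under stated assumptions: `primary` and `secondary` are hypothetical dicts mapping `_id` to document (for example, exported from each node while writes are paused, since a lagging secondary is legitimately behind), and `compare_replicas` is an illustrative name, not a MongoDB tool.

```python
import hashlib
import json


def doc_digest(doc):
    # Stable content hash: sorted keys, so field order doesn't change the digest.
    payload = json.dumps(doc, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def compare_replicas(primary, secondary):
    """Return the _ids that are missing from, or different on, the secondary.

    primary / secondary: hypothetical dicts of _id -> document taken from
    each node at a quiescent moment.
    """
    missing = sorted(k for k in primary if k not in secondary)
    differing = sorted(
        k for k in primary
        if k in secondary and doc_digest(primary[k]) != doc_digest(secondary[k])
    )
    return {"missing": missing, "differing": differing}


if __name__ == "__main__":
    primary = {1: {"a": 1, "b": 2}, 2: {"a": 2}, 3: {"a": 3}}
    secondary = {1: {"b": 2, "a": 1}, 3: {"a": 4}}
    print(compare_replicas(primary, secondary))  # {'missing': [2], 'differing': [3]}
```

The comparison only looks one way (records the master has that the slave lacks or holds differently), which matches the failure described in point 3.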

If still valid, these criticisms would be worrying to some extent. The article primarily references v1.6 and v1.8, but since then v2 has been released. Are the shortcomings discussed in the article still outstanding as of the current release?

Accepted answer by Adam Comerford

Note on Context:

This question was asked in 2012, but still sees traffic and votes to this day. The original answer was written specifically to refute a particular post that was popular at the time of the question. Things have changed (and continue to change) massively since this answer was written. MongoDB has certainly become far more durable and reliable than it was in 2012, when even things like basic journaling were relatively new. I get downvotes and comments on this answer because people feel I don't address the current (for a given value of current) general answer to the titular question (not the detail): "are lost data criticisms still valid?". I have attempted to clarify in the updates below, but there is basically no perfect answer to this question; it depends on your perspective, what your expectations are/were, what version you are using, what configuration, whether you feel upset about the default settings, etc.

Original Answer:

That particular post was debunked, point by point, by the MongoDB CTO and co-founder, Eliot Horowitz, here:

http://news.ycombinator.com/item?id=3202959

There is also a good summary here:

http://www.betabeat.com/2011/11/10/the-trolls-come-out-for-10gen/

The short version is, it looks like this was basically someone trolling for attention (successfully), with no solid evidence or corroboration. There have been genuine incidents in the past, which have been dealt with as the product evolved (see the introduction of journaling in 1.8 for example) or as more specific bugs were found and fixed.

Disclaimer: I do work for MongoDB (formerly 10gen), and love the fact that philnate got here and refuted this independently first. That probably says more about the product than anything else :)

Update: August 19th 2013

I've seen quite a bit of activity on this answer recently, which I assume is related to the announcement of the bug in SERVER-10478. It is most certainly an edge case, but I would still recommend that anyone using sharding with large documents upgrade ASAP to v2.2.6 or v2.4.6, which include the fix for this issue.
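If you run several deployments, one quick way to act on that advice is to check each server's version string against the fixed releases. This is a minimal sketch that assumes plain "X.Y.Z" version strings and covers only the 2.2 and 2.4 series named above; `has_server_10478_fix` is a hypothetical helper, and for any other series it returns None rather than guessing.

```python
# Release series -> first patch release containing the SERVER-10478 fix, per the update above.
FIXED_IN = {(2, 2): 6, (2, 4): 6}


def has_server_10478_fix(version):
    """True/False for the 2.2 and 2.4 series; None when this sketch can't tell."""
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a plain X.Y.Z version, got {version!r}")
    major, minor, patch = (int(p) for p in parts)
    first_fixed = FIXED_IN.get((major, minor))
    if first_fixed is None:
        return None  # series not covered by the advice above
    return patch >= first_fixed


if __name__ == "__main__":
    for v in ["2.2.5", "2.2.6", "2.4.3", "2.4.6"]:
        print(v, has_server_10478_fix(v))
```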

Update: March 24th 2017

I no longer work for MongoDB, but stand behind this answer nonetheless. Given that this answer continues to get up (and down) votes and receives a lot of views, I would like to point people at this post, which shows the progress MongoDB has made since this question was posed. The database now passes the Jepsen tests and has integrated the tests into its build process; there are plenty of far more mature systems that do not pass. Anyone still beating the data loss drum in 2017 really hasn't been paying attention.

Update: May 24th 2020

Jepsen has re-analyzed MongoDB 4.2.6, given that MongoDB now offers "full ACID transactions". While it gets quite technical in parts, I highly recommend reading the article if data loss in MongoDB is a concern for you (I would recommend checking out the Jepsen analysis of any database you use; you might be surprised at their weak spots). The report summarizes the weaknesses in the default read and write concerns, discusses how reliable non-transactional reads and writes are with appropriate read and write concerns, addresses flaws in the documentation, and then provides significant details about the issues encountered when testing the new ACID transactions (and associated read/write concerns).

So, can you still lose data with MongoDB? Yes, especially with default settings, but that is true of most databases. Things are vastly better than they were back when this question was answered: the capabilities for more reliability and durability are there, and they seem to work (transactions aside). My advice is to learn the limitations of the configuration you operate, and then determine whether the data loss risk is acceptable for your product/business/use case.

Answer by mikemaccana

I can't speak for every case, only my own. However, unlike the author of the other answer, I don't work for Mongo or its competitors, I have lost data when using MongoDB, and I have used Mongo for around ten years, so here goes.

2010

This is when I first began using Mongo. The main criticisms of Mongo around that time were:

  • Supposedly stable versions of Mongo had major data-losing bugs that weren't made explicit to users. E.g., prior to 1.8, non-clustered configurations were likely to lose data. This was documented by Mongo, but not as prominently as a data-losing bug in a stable-versioned DB would normally be.

The main defence of that criticism was:

  • Users were informed of this danger, albeit not so explicitly. Users should read all the documentation before they begin.

In my own case, I was using 1.7 in a single-server configuration, but was aware of the risk. I shut down the DB to take a backup. The act of shutting down the DB itself lost my data; 10gen assisted (for free) but was unable to recover the data.

2013

Later, in 2013, a study came out revealing that MongoDB defaults can cause significant loss of acknowledged writes during network partitions.
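The mechanism behind that finding can be shown with a deliberately simplified three-node replica set model. The assumptions are mine, not MongoDB's exact election or rollback rules: the primary acknowledges once `w` nodes have the write, the old primary is then partitioned away from both secondaries, the secondaries elect whichever of them is most up to date, and a write that no surviving node has is rolled back. `simulate` and `acked_but_lost` are hypothetical names.

```python
def simulate(w, copies_on_secondaries):
    """Toy 3-node replica set.

    w: 1 (primary only) or "majority" (2 of 3 nodes).
    copies_on_secondaries: how many secondaries (0-2) received the write
    before the primary was cut off from them.
    Returns (acknowledged, survives).
    """
    if w == 1:
        needed = 1
    elif w == "majority":
        needed = 2
    else:
        raise ValueError("w must be 1 or 'majority' in this model")
    if not 0 <= copies_on_secondaries <= 2:
        raise ValueError("a 3-node set has only 2 secondaries")

    acknowledged = 1 + copies_on_secondaries >= needed
    # The majority side (both secondaries) elects a new primary. The write
    # survives only if one of them already has it; otherwise the old primary
    # rolls it back when it rejoins the set.
    survives = copies_on_secondaries >= 1
    return acknowledged, survives


def acked_but_lost(w):
    # Scenarios where the client was told "OK" but the write disappeared.
    return [n for n in range(3) if simulate(w, n) == (True, False)]


if __name__ == "__main__":
    print(acked_but_lost(1))           # [0]
    print(acked_but_lost("majority"))  # []
```

With w=1 the dangerous case is the primary acknowledging before replication catches up. With "majority", the same scenario never produces a success acknowledgement, so the client is not told a write succeeded that later vanishes.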

Also in 2013, the official production Node.js MongoDB drivers wrapped and threw away all errors.

2014

Since then, in 2014, a completely different bug in the stable MongoDB driver bit me and many other users.

2016

In 2016, the Meteor project had issues with MongoDB queries not always returning all matching documents.

Later, MongoDB's policy of listening on all interfaces by default with no admin password set also worked out badly for many users. We knew two decades ago (and probably earlier, but I wasn't in tech at the time) that listening on all interfaces by default was a bad idea, which is why other software avoids this.
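A minimal sketch of checking a deployment for that situation, assuming the mongod YAML configuration file has already been parsed into a Python dict (for example with a YAML library, not shown). It looks only at the real `net.bindIp`, `net.bindIpAll` and `security.authorization` settings; `audit_mongod_config` is a hypothetical helper, and the version rule encodes the MongoDB 3.6 change that made mongod bind to localhost by default when no bindIp is given.

```python
def audit_mongod_config(config, version):
    """Return warnings for a parsed mongod config.

    config: dict parsed from the YAML config file.
    version: (major, minor) tuple of the server version.
    """
    net = config.get("net", {})
    security = config.get("security", {})
    warnings = []

    bind_all = net.get("bindIpAll", False)
    bind_ip = net.get("bindIp")
    if bind_ip is None and not bind_all:
        # No explicit binding: since 3.6 the default is localhost,
        # older versions listened on all interfaces.
        exposed = version < (3, 6)
    else:
        addrs = [a.strip() for a in (bind_ip or "").split(",") if a.strip()]
        exposed = bind_all or "0.0.0.0" in addrs or "::" in addrs

    auth_enabled = security.get("authorization") == "enabled"
    if exposed:
        warnings.append("listening on all interfaces")
    if not auth_enabled:
        warnings.append("access control (authorization) is not enabled")
    if exposed and not auth_enabled:
        warnings.append("reachable from the network with no authentication")
    return warnings


if __name__ == "__main__":
    print(audit_mongod_config({}, (3, 4)))
    print(audit_mongod_config({"net": {"bindIp": "127.0.0.1"},
                               "security": {"authorization": "enabled"}}, (4, 2)))
```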

2020

Jepsen evaluated MongoDB 4.2.6 and concluded:

even at the strongest levels of read and write concern, MongoDB 4.2.6 failed to preserve snapshot isolation. Instead, Jepsen observed read skew, cyclic information flow, duplicate writes, and internal consistency violations. Weak defaults meant that transactions could lose writes and allow dirty reads, even downgrading requested safety levels at the database and collection level.

Conclusion

There have been general, repeated observations over many years that Mongo has unsafe defaults to win performance benchmarks. Mongo generally responds that users should be aware of these by reading all the relevant docs, and may choose to use safe options if they need them.

As of 2020, I feel like MongoDB is actually a more stable product now, simply through time and investment. However, I will never trust the company, given that it used our data to beta test for a decade, and I would not be surprised at all if another data loss condition were revealed. I have used Postgres JSONB, FoundationDB and RethinkDB as structured data stores, which may be valid alternatives.

Answer by Vanuan

As of February 2017, the most recent Jepsen analysis of MongoDB suggests that data loss was possible in all versions of MongoDB up to MongoDB 3.2.11 and 3.4.0-rc4.

So at the time the question was written (2012), the answer should have been yes: those criticisms were valid from a theoretical perspective. But it looks like customers don't care about the theory. As the failure of RethinkDB has shown, correctness doesn't matter; the only thing that matters is time to market. Very sad.

As of October 2018, this is still an issue on MongoDB 3.4.

Answer by philnate

I've never heard of those severe problems in recent versions. What you need to consider is that MongoDB doesn't have the decades of development behind it that relational systems have. Further, it may be true that MongoDB doesn't offer that much functionality to avoid data loss. But even with relational systems you can never be sure that you'll never lose any data. It highly depends on your system configuration (with replication and manual data backups you should be quite safe).

As a general guideline to avoid beta bugs or bugs from early versions, avoid using fresh versions in production (there's a reason why Debian is so popular for servers). If MongoDB suffered such severe problems (all the time), the list of users would be smaller (see https://www.mongodb.com/community/deployments). Additionally, I don't really trust this pastebin message: why was it published anonymously? Is this person's company ashamed to say that they used MongoDB, do they fear 10gen? Where are the links to those bug reports (or did 10gen delete them from JIRA)?

So let's talk briefly about those points:

  1. Yep, MongoDB normally operates in fire-and-forget mode. But you can modify this behavior with several options (see https://docs.mongodb.com/manual/reference/command/getLastError/#dbcmd.getLastError). So just because MongoDB defaults to it doesn't mean you can't change it to suit your needs. But you have to accept lower performance if you don't fire and forget within your app, as you're adding a round trip.

    Update: Since version 2.6, the commands insert, update, save and remove acknowledge the write by default.

  2. Never heard of such problems, except those caused by one's own failure... but that can happen with relational systems as well. I guess this point only talks about master-slave replication. Replica sets are much newer and more stable. Some links from the web where other DBMSs caused data loss due to malfunction as well: http://blog.lastinfirstout.net/2010/04/bit-by-bug-data-loss-running-oracle-on.html http://dbaspot.com/oracle-server/430465-parallel-cause-data-lost-another-oracle-bug.html http://bugs.mysql.com/bug.php?id=18014 (These links aren't meant to favor any given system or to imply anything other than that other systems also have bugs that can cause data loss.)

  3. Yes, there is actually locking at the instance level. I don't think it is global in a sharded environment; I think it is per instance for each shard separately, as there's no need to lock other instances since no consistency checks are needed. The upcoming version 2.2 will lock at the database level, and tickets for collection-level and maybe extent- or document-level locking exist as well (https://jira.mongodb.org/browse/SERVER-4328). But finer-grained locking may affect the actual performance of MongoDB, as lock management is expensive.

  4. Moving chunks shouldn't cause many problems, as rebalancing should take a few chunks from each node and move them to the new one. It should never cause chunks to ping-pong, nor does rebalancing start just because of a difference of one or two chunks. What can be problematic is a wrongly chosen shard key: you may end up with all new entries inserted into one node rather than spread across all of them. You would then see rebalancing more often, which can cause problems, but that would be due to your poorly chosen shard key rather than to Mongo.

  5. Can't comment on this one

  6. Not 100% sure, but I think replica sets were introduced in 1.6, so, as said earlier, never use the latest version in production unless you can live with data loss. As with every new feature, there's the possibility of bugs; even extensive testing may not reveal all problems. Again, for god's sake, always run some manual backups, unless you can live with data loss.

  7. Can't comment on this. But in reality, software may contain severe bugs. Many games suffer from those problems as well, and there are other areas where banana software was (or is) quite well known. Can't comment on anything concrete, as this was before my MongoDB time.

  8. Replication can cause such problems. Depending on the replication strategy this may be a problem and another system may fit better. But without a really really write intensive workload you may not encounter such problems. Indeed it may be problematic to have 3 replicas polling changes from one master. You could cure the problem by adding more shards.
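Point 4's shard-key problem is easy to see in a toy model of range-based placement. Everything here is illustrative: four shards, existing `_id` values 0 to 9,999 split evenly by range, new documents arriving with ever-increasing ids (as with timestamps), and, for comparison, a 32-bit hash of the id split evenly across the same four shards. `distribute` and the bounds are hypothetical; MongoDB's real chunk layout and hash function differ.

```python
import bisect
import hashlib
from collections import Counter

# Ranged key: split points chosen from existing ids 0..9_999;
# the last shard owns everything above 7_500.
RANGE_BOUNDS = [2_500, 5_000, 7_500]
# Hashed key: a 32-bit hash space split into four equal ranges.
HASH_BOUNDS = [2**30, 2**31, 3 * 2**30]


def shard_for(value, bounds):
    # Index (0-3) of the shard whose range contains value.
    return bisect.bisect_right(bounds, value)


def hash32(key):
    # First 4 bytes of an MD5 digest as an unsigned 32-bit integer.
    return int.from_bytes(hashlib.md5(str(key).encode("utf-8")).digest()[:4], "big")


def distribute(new_keys, hashed):
    # Count how many of the new keys land on each shard.
    if hashed:
        return Counter(shard_for(hash32(k), HASH_BOUNDS) for k in new_keys)
    return Counter(shard_for(k, RANGE_BOUNDS) for k in new_keys)


if __name__ == "__main__":
    new_ids = range(10_000, 11_000)  # monotonically increasing, like timestamps
    print("ranged:", dict(distribute(new_ids, hashed=False)))  # ranged: {3: 1000}
    print("hashed:", dict(sorted(distribute(new_ids, hashed=True).items())))
```

With the ranged key every new insert lands on the last shard, the hot spot that then keeps the balancer busy moving chunks. The hashed key spreads the same inserts across all four shards, at the cost of range queries on `_id` having to visit every shard.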

As a general conclusion: yeah, it may be that those problems existed, but MongoDB has done much in this direction, and I doubt that other DBMSs never had one problem or another themselves. Take traditional relational DBMSs: if they scaled well to web scale, there would be no need for systems like MongoDB, HBase and others. You can't get a system that fits all needs, so you have to live with the downsides of one or try to build a combined system of several to get what you need.

Disclaimer: I'm not affiliated with MongoDB or 10gen, I'm just working with MongoDB and telling my opinion about it.

免责声明:我不隶属于 MongoDB 或 10gen,我只是与 MongoDB 合作并说出我的看法。