Original URL: http://stackoverflow.com/questions/2954957/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must attribute it to the original authors (not me): StackOverflow.
MongoDB vs CouchDB (Speed optimization)
Asked by Edward83
I ran some speed tests to compare MongoDB and CouchDB. Only inserts were performed during testing. MongoDB came out about 15x faster than CouchDB. I know that this is because of sockets vs. HTTP. But it is very interesting to me: how can I optimize inserts in CouchDB?
Test platform: Windows XP SP3 32 bit. I used the latest versions of MongoDB and the MongoDB C# driver, and the latest version of the installation package of CouchDB for Windows.
Thanks!
Accepted answer by JasonSmith
For inserting lots of data into the DB in bulk fashion, CouchDB supports bulk inserts, which are described in the wiki under HTTP Bulk Document API.
Additionally, check out the delayed_commits configuration option, and the batch=ok option described in the above link. Those options enable similar memory-caching behavior with periodic syncing against the disk.
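For illustration only (not part of the original answer), here is a minimal Python sketch of both techniques using the requests library, assuming a local CouchDB at http://localhost:5984 and an already-created database named "speedtest" (both placeholders):

import requests

COUCH = "http://localhost:5984"
DB = "speedtest"  # placeholder database name

# Bulk insert: send many documents in a single _bulk_docs request.
docs = [{"value": i} for i in range(1000)]
resp = requests.post(COUCH + "/" + DB + "/_bulk_docs", json={"docs": docs})
print(resp.status_code)  # 201 when the whole batch is accepted

# batch=ok: CouchDB buffers the document in memory and writes it to disk
# later, trading a little durability for speed (the server answers 202).
resp = requests.post(COUCH + "/" + DB + "?batch=ok", json={"value": "one doc"})
print(resp.status_code)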
Answered by mikeal
Just to expand on the sockets vs. HTTP and fsync vs. in-memory conversation.
By default, MongoDB doesn't return a response on a write call. You just write your data to the socket and assume it's in the DB and available. Under concurrent load this could get backed up, and there isn't a good way to know how fast Mongo really is unless you use an optional call that will return a response for the write once the data is available.
I'm not saying Mongo insert performance isn't faster than Couch's; inserting into memory is a lot faster than fsyncing to disk. The bigger difference here is in the goals MongoDB and CouchDB have for consistency and durability. But all the "performance" tools I've seen for testing Mongo use the default write API, so you aren't really testing insert performance; you're testing how fast you can flush to a socket.
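As a rough sketch of what this means in practice (the answer predates today's driver defaults, so this uses the modern pymongo write-concern API purely for illustration; the database and collection names are made up, and a local mongod is assumed):

import time
from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://localhost:27017")
coll = client.test.bench  # hypothetical database/collection

# w=0: fire-and-forget, the call returns once the message hits the socket.
# w=1: the call waits until the server acknowledges the write.
for label, w in [("unacknowledged (w=0)", 0), ("acknowledged (w=1)", 1)]:
    c = coll.with_options(write_concern=WriteConcern(w=w))
    start = time.time()
    for i in range(10000):
        c.insert_one({"i": i})
    print(label, round(time.time() - start, 2), "seconds")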
I've seen a lot of benchmarks that show Mongo as faster than Redis and memcached because they fail to realize that Redis and Memcached return a response when the data is in memory and Mongo does not. Mongo definitely is not faster than Redis :)
Answered by TTT
I don't think that the difference between sockets and http is the only difference. The difference is also related to the disk syncs (fsync). This affects durability. MongoDB stores everything in RAM first and it only syncs to disk at certain intervals, unless you explicitly tell MongoDB to do an fsync.
Read about durability and MongoDB: http://blog.mongodb.org/post/381927266/what-about-durability and fsync: http://www.mongodb.org/display/DOCS/fsync+Command
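For reference, a small hedged sketch (Python with pymongo, local mongod assumed, names made up) of the two durability knobs mentioned above: an explicit fsync command, and a write that waits for the journal:

from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://localhost:27017")

# Ask the server to flush all pending writes to disk right now.
client.admin.command("fsync")

# Or request durability per write: j=True makes the insert wait until the
# write has been committed to the on-disk journal (journaling must be on).
coll = client.test.bench.with_options(write_concern=WriteConcern(j=True))
coll.insert_one({"event": "journaled write"})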
Answered by JasonSmith
Here is an idea I have been thinking about but have not benchmarked. I expect it to be great in certain situations:
- Insert throughput must be high
- Fetching individual documents by key is not required
- All data is fetched through views (possibly a different machine from the one receiving the inserts)
The Plan
Insert batches-of-batches of documents, and use views to serialize them back out nicely.
Example
Consider a log file with a simple timestamp and message string.
0.001 Start
0.123 This could be any message
0.500 Half a second later!
1.000 One second has gone by
2.000 Two seconds has gone by
[...]
1000.000 One thousand seconds has gone by
You might insert logs one message per document, e.g.:
{ "_id": "f30d09ef6a9e405994f13a38a44ee4a1",
"_rev": "1-764efa883dda1e11db47671c4a3bbd9e",
"timestamp": 0.123,
"message": "This could be any message"
}
The standard bulk docs optimization
The first optimization is to insert using _bulk_docs, as in the CouchDB bulk-docs documentation.
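For concreteness, a short sketch (Python with requests; the CouchDB URL and the "logs" database are placeholders) of pushing the one-message-per-document logs through _bulk_docs:

import requests

COUCH = "http://localhost:5984"
DB = "logs"  # placeholder database name

log_lines = [(0.001, "Start"),
             (0.123, "This could be any message"),
             (0.500, "Half a second later!"),
             (1.000, "One second has gone by")]

# One document per log message, all sent in a single _bulk_docs round trip.
docs = [{"timestamp": ts, "message": msg} for ts, msg in log_lines]
resp = requests.post(COUCH + "/" + DB + "/_bulk_docs", json={"docs": docs})
print(resp.json())  # one {"id": ..., "rev": ...} entry per inserted document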
A secondary bulk insert optimization
However, a second optimization is to pre-batch the logs into one larger Couch document. For example, in batches of 4 (in the real world this would be much higher):
{ "_id": "f30d09ef6a9e405994f13a38a44ee4a1",
"_rev": "1-764efa883dda1e11db47671c4a3bbd9e",
"logs": [
{"timestamp": 0.001, "message": "Start"},
{"timestamp": 0.123, "message": "This could be any message"},
{"timestamp": 0.500, "message": "Half a second later!"},
{"timestamp": 1.000, "message": "One second has gone by"}
]
}
{ "_id": "74f615379d98d3c3d4b3f3d4ddce82f8",
"_rev": "1-ea4f43014d555add711ea006efe782da",
"logs": [
{"timestamp": 2.000, "message": "Two seconds has gone by"},
{"timestamp": 3.000, "message": "Three seconds has gone by"},
{"timestamp": 4.000, "message": "Four seconds has gone by"},
{"timestamp": 5.000, "message": "Five seconds has gone by"},
]
}
Of course, you would insert these via _bulk_docs as well, effectively inserting batches of batches of data.
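Continuing the same hypothetical setup, a sketch of the batch-of-batches variant: the client first groups log lines into larger documents, then sends all of them in one _bulk_docs request:

import requests

COUCH = "http://localhost:5984"
DB = "logs"  # placeholder database name
BATCH = 4    # as in the example; in the real world this would be much higher

log_lines = [(float(i), "%d seconds has gone by" % i) for i in range(16)]

# Group the log lines into larger documents ("batches"), then send all of
# those documents in a single _bulk_docs request ("batches of batches").
docs = []
for start in range(0, len(log_lines), BATCH):
    chunk = log_lines[start:start + BATCH]
    docs.append({"logs": [{"timestamp": ts, "message": msg} for ts, msg in chunk]})

resp = requests.post(COUCH + "/" + DB + "/_bulk_docs", json={"docs": docs})
print(resp.status_code)  # 201 when the batch is accepted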
Views are still very easy
It is still very easy to serialize the logs back out into a view:
// map
function(doc) {
if(doc.logs) {
// Just unroll the log batches!
for (var i in doc.logs) {
var log = doc.logs[i];
emit(log.timestamp, log.message);
}
}
}
It will then be quite easy to fetch logs with timestamps between startkey and endkey, or to serve whatever other needs you have.
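To make that concrete, here is a hedged sketch (same hypothetical CouchDB and "logs" database as above; the design document and view names, logbatches/by_time, are invented for illustration) that installs the map function as a design document and then reads a time range back with startkey/endkey:

import json
import requests

COUCH = "http://localhost:5984"
DB = "logs"  # placeholder database name

# Install the map function shown above into a design document.
# (First-time creation; updating an existing design doc also needs its _rev.)
ddoc = {
    "views": {
        "by_time": {
            "map": "function(doc) { if (doc.logs) { for (var i in doc.logs) {"
                   " var log = doc.logs[i]; emit(log.timestamp, log.message); } } }"
        }
    }
}
requests.put(COUCH + "/" + DB + "/_design/logbatches", json=ddoc)

# Fetch every log message with 1.0 <= timestamp <= 5.0.
params = {"startkey": json.dumps(1.0), "endkey": json.dumps(5.0)}
resp = requests.get(COUCH + "/" + DB + "/_design/logbatches/_view/by_time",
                    params=params)
for row in resp.json()["rows"]:
    print(row["key"], row["value"])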
Conclusion
This is still not benchmarked, but my hope is that, for some kinds of data, batching into clumps will reduce the internal B-tree writes. Combined with _bulk_docs, I hope to see insert throughput hit the hardware speeds of the disk.