Merging two collections in MongoDB
Original question: http://stackoverflow.com/questions/9696940/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by TFX
I've been trying to use MapReduce in MongoDB to do what I think is a simple procedure. I don't know if this is the right approach, or if I should even be using MapReduce. I googled the keywords I could think of and checked the docs where I thought I would have the most success, but found nothing. Maybe I'm thinking too hard about this?
I have two collections: details and gpas.
details is made up of a whole bunch of documents (3+ million). The studentid element can be repeated two times, one for each year, like the following:
{ "_id" : ObjectId("4d49b7yah5b6d8372v640100"), "classes" : [1,17,19,21], "studentid" : "12345a", "year" : 1}
{ "_id" : ObjectId("4d76b7oij7s2d8372v640100"), "classes" : [2,12,19,22], "studentid" : "98765a", "year" : 1}
{ "_id" : ObjectId("4d49b7oij7s2d8372v640100"), "classes" : [32,91,101,217], "studentid" : "12345a", "year" : 2}
{ "_id" : ObjectId("4d76b7rty7s2d8372v640100"), "classes" : [1,11,18,22], "studentid" : "24680a", "year" : 1}
{ "_id" : ObjectId("4d49b7oij7s2d8856v640100"), "classes" : [32,99,110,215], "studentid" : "98765a", "year" : 2}
...
gpas has documents with the same studentid values as details, with only one entry per studentid, like this:
{ "_id" : ObjectId("4d49b7yah5b6d8372v640111"), "studentid" : "12345a", "overall" : 97, "subscore": 1}
{ "_id" : ObjectId("4f76b7oij7s2d8372v640213"), "studentid" : "98765a", "overall" : 85, "subscore": 5}
{ "_id" : ObjectId("4j49b7oij7s2d8372v640871"), "studentid" : "24680a", "overall" : 76, "subscore": 2}
...
In the end I want to have a collection with one row for each student in this format:
{ "_id" : ObjectId("4d49b7yah5b6d8372v640111"), "studentid" : "12345a", "classes_1": [1,17,19,21], "classes_2": [32,91,101,217], "overall" : 97, "subscore": 1}
{ "_id" : ObjectId("4f76b7oij7s2d8372v640213"), "studentid" : "98765a", "classes_1": [2,12,19,22], "classes_2": [32,99,110,215], "overall" : 85, "subscore": 5}
{ "_id" : ObjectId("4j49b7oij7s2d8372v640871"), "studentid" : "24680a", "classes_1": [1,11,18,22], "classes_2": [], "overall" : 76, "subscore": 2}
...
The way I was going to do this was by running MapReduce like this:
var mapDetails = function() {
    emit(this.studentid, {studentid: this.studentid, classes: this.classes, year: this.year, overall: 0, subscore: 0});
};
var mapGpas = function() {
    emit(this.studentid, {studentid: this.studentid, classes: [], year: 0, overall: this.overall, subscore: this.subscore});
};
var reduce = function(key, values) {
    var outs = { studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0};
    values.forEach(function(value) {
        if (value.year == 0) {
            outs.overall = value.overall;
            outs.subscore = value.subscore;
        }
        else {
            if (value.year == 1) {
                outs.classes_1 = value.classes;
            }
            if (value.year == 2) {
                outs.classes_2 = value.classes;
            }
            outs.studentid = value.studentid;
        }
    });
    return outs;
};
res = db.details.mapReduce(mapDetails, reduce, {out: {reduce: 'joined'}})
res = db.gpas.mapReduce(mapGpas, reduce, {out: {reduce: 'joined'}})
But when I run it, this is my resulting collection:
{ "_id" : "12345a", "value" : { "studentid" : "12345a", "classes_1" : [ ], "classes_2" : [ ], "overall" : 97, "subscore" : 1 } }
{ "_id" : "98765a", "value" : { "studentid" : "98765a", "classes_1" : [ ], "classes_2" : [ ], "overall" : 85, "subscore" : 5 } }
{ "_id" : "24680a", "value" : { "studentid" : "24680a", "classes_1" : [ ], "classes_2" : [ ], "overall" : 76, "subscore" : 2 } }
I'm missing the classes arrays.
Also, as an aside, how do I access the elements in the resulting MapReduce value element? Does MapReduce always output to value or whatever else you name it?
Answered by Marc
This is similar to a question that was asked on the MongoDB-users Google Groups.
https://groups.google.com/group/mongodb-user/browse_thread/thread/60a8b683e2626ada?pli=1
The answer references an on-line tutorial which looks similar to your example: http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/
For more information on MapReduce in MongoDB, please see the documentation: http://www.mongodb.org/display/DOCS/MapReduce
Additionally, there is a useful step-by-step walkthrough of how a MapReduce operation works in the "Extras" Section of the MongoDB Cookbook article titled, "Finding Max And Min Values with Versioned Documents": http://cookbook.mongodb.org/patterns/finding_max_and_min/
Forgive me if you have already read some of the referenced documents. I have included them for the benefit of other users who may be reading this post and are new to using MapReduce in MongoDB.
It is important that the outputs from the 'emit' statements in the Map functions match the outputs of the Reduce function. If there is only one document output by the Map function, the Reduce function might not be run at all, and then your output collection will have mismatched documents.
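To see why the shapes have to match, here is a minimal sketch that imitates the behaviour in plain JavaScript, without a database. The simulateMapReduce function is a hypothetical, simplified model (not MongoDB's implementation): it assumes only that reduce is skipped for a key with a single emitted value, and that the "reduce" output mode re-applies reduce to the stored document and the new result. The map functions take the document as an argument instead of using this.

```javascript
// Hypothetical, simplified in-memory model of mapReduce with {out: {reduce: ...}}.
function simulateMapReduce(docs, map, reduce, out) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));
  for (const [key, values] of groups) {
    // A key with exactly one emitted value bypasses reduce entirely.
    let result = values.length === 1 ? values[0] : reduce(key, values);
    // "reduce" output mode: merge with whatever is already stored for the key.
    if (out.has(key)) result = reduce(key, [out.get(key), result]);
    out.set(key, result);
  }
  return out;
}

// The question's original map and reduce logic, adapted to this model.
const mapDetails = (d, emit) =>
  emit(d.studentid, { studentid: d.studentid, classes: d.classes, year: d.year, overall: 0, subscore: 0 });
const mapGpas = (g, emit) =>
  emit(g.studentid, { studentid: g.studentid, classes: [], year: 0, overall: g.overall, subscore: g.subscore });

function questionReduce(key, values) {
  const outs = { studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0 };
  values.forEach((value) => {
    if (value.year == 0) {
      outs.overall = value.overall;
      outs.subscore = value.subscore;
    } else {
      if (value.year == 1) outs.classes_1 = value.classes;
      if (value.year == 2) outs.classes_2 = value.classes;
      outs.studentid = value.studentid;
    }
  });
  return outs;
}

const details = [
  { studentid: "12345a", classes: [1, 17, 19, 21], year: 1 },
  { studentid: "12345a", classes: [32, 91, 101, 217], year: 2 },
  { studentid: "24680a", classes: [1, 11, 18, 22], year: 1 },
];
const gpas = [
  { studentid: "12345a", overall: 97, subscore: 1 },
  { studentid: "24680a", overall: 76, subscore: 2 },
];

const joined = new Map();
simulateMapReduce(details, mapDetails, questionReduce, joined);
// Only one document was emitted for 24680a, so it is stored in the map shape.
const singleDocShape = Object.keys(joined.get("24680a"));
simulateMapReduce(gpas, mapGpas, questionReduce, joined);

console.log(singleDocShape);
console.log(joined.get("12345a"));
```

In this model, the first pass stores 12345a in the reduce shape, which has no year field. When the second pass re-reduces that stored document, neither year check matches, so the class arrays fall back to their empty defaults.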
I have slightly modified your map statements to emit documents in the format of your desired output, with two separate "classes" arrays.
I have also reworked your reduce statement to add new classes to the classes_1 and classes_2 arrays, only if they do not already exist.
var mapDetails = function() {
    // Emit the same shape that the reduce function returns.
    var output = {studentid: this.studentid, classes_1: [], classes_2: [], year: this.year, overall: 0, subscore: 0};
    if (this.year == 1) {
        output.classes_1 = this.classes;
    }
    if (this.year == 2) {
        output.classes_2 = this.classes;
    }
    emit(this.studentid, output);
};
var mapGpas = function() {
    emit(this.studentid, {studentid: this.studentid, classes_1: [], classes_2: [], year: 0, overall: this.overall, subscore: this.subscore});
};
var r = function(key, values) {
    var outs = { studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0};
    values.forEach(function(v) {
        outs.studentid = v.studentid;
        // Add each class only once. The parameter is named "cls" because "class" is a reserved word in JavaScript.
        v.classes_1.forEach(function(cls) { if (outs.classes_1.indexOf(cls) == -1) { outs.classes_1.push(cls); } });
        v.classes_2.forEach(function(cls) { if (outs.classes_2.indexOf(cls) == -1) { outs.classes_2.push(cls); } });
        // Only documents emitted from gpas (year 0) carry the scores.
        if (v.year == 0) {
            outs.overall = v.overall;
            outs.subscore = v.subscore;
        }
    });
    return outs;
};
res = db.details.mapReduce(mapDetails, r, {out: {reduce: 'joined'}});
res = db.gpas.mapReduce(mapGpas, r, {out: {reduce: 'joined'}});
Running the two MapReduce operations results in the following collection, which matches your desired format:
> db.joined.find()
{ "_id" : "12345a", "value" : { "studentid" : "12345a", "classes_1" : [ 1, 17, 19, 21 ], "classes_2" : [ 32, 91, 101, 217 ], "overall" : 97, "subscore" : 1 } }
{ "_id" : "24680a", "value" : { "studentid" : "24680a", "classes_1" : [ 1, 11, 18, 22 ], "classes_2" : [ ], "overall" : 76, "subscore" : 2 } }
{ "_id" : "98765a", "value" : { "studentid" : "98765a", "classes_1" : [ 2, 12, 19, 22 ], "classes_2" : [ 32, 99, 110, 215 ], "overall" : 85, "subscore" : 5 } }
>
MapReduce always outputs documents in the form {_id: "id", value: "value"}. There is more information on working with sub-documents in the document titled "Dot Notation (Reaching into Objects)": http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29
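As a rough illustration of reaching into the value sub-document, the sketch below applies a dotted path to two of the joined documents shown earlier. It is plain JavaScript rather than a database query: getPath is a hypothetical helper written for this example, and the shell query in the comment is the dot-notation form it imitates.

```javascript
// Sample documents copied from the db.joined.find() output shown earlier.
const joinedDocs = [
  { _id: "12345a", value: { studentid: "12345a", classes_1: [1, 17, 19, 21], classes_2: [32, 91, 101, 217], overall: 97, subscore: 1 } },
  { _id: "24680a", value: { studentid: "24680a", classes_1: [1, 11, 18, 22], classes_2: [], overall: 76, subscore: 2 } },
];

// In the mongo shell, the equivalent dot-notation query would be:
//   db.joined.find({ "value.overall": { $gt: 80 } })

// Hypothetical helper: resolve a dotted path such as "value.overall".
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), doc);
}

const highScorers = joinedDocs
  .filter((doc) => getPath(doc, "value.overall") > 80)
  .map((doc) => doc._id);

console.log(highScorers); // [ '12345a' ]
```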
If you would like the output of MapReduce to appear in a different format, you will have to do that programmatically in your application.
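One way to do that reshaping, sketched under the assumption that you simply want the fields of value lifted to the top level: the flatten helper and the students collection name in the comment are hypothetical, and the _id stays the studentid string because the original ObjectId values are not carried through MapReduce.

```javascript
// Hypothetical helper: lift the fields of the `value` sub-document to the top level.
function flatten(doc) {
  return { _id: doc._id, ...doc.value };
}

// In the mongo shell this could be applied to every document, for example:
//   db.joined.find().forEach(function (doc) {
//     db.students.insertOne(flatten(doc)); // "students" is a made-up collection name
//   });

const sample = {
  _id: "24680a",
  value: { studentid: "24680a", classes_1: [1, 11, 18, 22], classes_2: [], overall: 76, subscore: 2 },
};

const flat = flatten(sample);
console.log(flat);
```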
Hopefully this will improve your understanding of MapReduce, and get you one step closer to producing your desired output collection. Good Luck!
Answered by Remon van Vliet
You cannot use m/r for this since that is designed to only apply on one collection. Reading from more than one collection will break sharding compatibility and is therefore not allowed. You can do what you want with either the new aggregation framework (2.1+) or do this inside your application.
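For the "inside your application" option, a minimal sketch might look like the following. It assumes both collections have already been read into arrays; joinStudents is a hypothetical helper, and with 3+ million details documents you would more likely iterate a cursor than hold everything in memory.

```javascript
// Hypothetical application-side join: one output row per gpas entry.
function joinStudents(details, gpas) {
  const byStudent = new Map();
  for (const g of gpas) {
    byStudent.set(g.studentid, {
      studentid: g.studentid,
      classes_1: [],
      classes_2: [],
      overall: g.overall,
      subscore: g.subscore,
    });
  }
  for (const d of details) {
    const row = byStudent.get(d.studentid);
    if (!row) continue; // details without a gpas entry are skipped in this sketch
    if (d.year === 1) row.classes_1 = d.classes;
    if (d.year === 2) row.classes_2 = d.classes;
  }
  return [...byStudent.values()];
}

const detailsDocs = [
  { studentid: "12345a", classes: [1, 17, 19, 21], year: 1 },
  { studentid: "98765a", classes: [2, 12, 19, 22], year: 1 },
  { studentid: "12345a", classes: [32, 91, 101, 217], year: 2 },
  { studentid: "24680a", classes: [1, 11, 18, 22], year: 1 },
  { studentid: "98765a", classes: [32, 99, 110, 215], year: 2 },
];
const gpasDocs = [
  { studentid: "12345a", overall: 97, subscore: 1 },
  { studentid: "98765a", overall: 85, subscore: 5 },
  { studentid: "24680a", overall: 76, subscore: 2 },
];

const students = joinStudents(detailsDocs, gpasDocs);
console.log(students);
```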