mongodb - Mongo: How can I aggregate, filter, and include an array of data from the matching documents?
Original URL: http://stackoverflow.com/questions/17731104/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by Marcel Chastain
I have a Mongo-backed contact database, and I'm trying to find duplicate entries in a number of different ways.
For example, if two contacts have the same phone number, they are flagged as possible duplicates; the same goes for email, etc.
I'm using MongoDB 2.4.2 on Debian with PyMongo and MongoEngine.
The closest I have so far is finding and counting records that contain the same phone number:
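from bson.son import SON  # ordered dict, so the $sort keys keep their order

# dbh is a PyMongo Database handle
dbh.person_document.aggregate([
    {'$unwind': '$phones'},                                # one document per phone entry
    {'$group': {'_id': '$phones', 'count': {'$sum': 1}}},  # count each distinct phone
    {'$sort': SON([('count', -1), ('_id', -1)])}           # most frequent first
])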
# Results in
{u'ok': 1.0,
 u'result': [{u'_id': {u'number': u'404-231-4444', u'showroom_id': 5}, u'count': 5},
             {u'_id': {u'number': u'205-265-6666', u'showroom_id': 5}, u'count': 5},
             {u'_id': {u'number': u'213-785-7777', u'showroom_id': 5}, u'count': 4},
             {u'_id': {u'number': u'334-821-9999', u'showroom_id': 5}, u'count': 3}
            ]}
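To see why the output above carries only the phone value and a count, here is a minimal pure-Python sketch of what the three stages do to in-memory documents. It illustrates the pipeline's semantics and is not PyMongo code; the helper name `count_duplicate_phones` and the sample contacts (including their `_id` values) are hypothetical, with the phone numbers borrowed from the output above.

```python
from collections import Counter


def count_duplicate_phones(documents):
    """Rough in-memory equivalent of the $unwind / $group / $sort pipeline."""
    counts = Counter()
    for doc in documents:
        # $unwind: visit each element of the 'phones' array separately
        for phone in doc.get('phones', []):
            # $group on '$phones': the phone sub-document is the group key
            # (turned into a tuple so it can serve as a dict key)
            key = tuple(phone.items())
            counts[key] += 1  # {'$sum': 1}
    # Only the group key and the count survive the $group stage;
    # the _id values of the source documents are not carried along.
    rows = [{'_id': dict(key), 'count': n} for key, n in counts.items()]
    # $sort by count, highest first (the secondary sort on _id is left out here)
    rows.sort(key=lambda row: row['count'], reverse=True)
    return rows


# Hypothetical sample contacts
people = [
    {'_id': 1, 'phones': [{'number': '404-231-4444', 'showroom_id': 5}]},
    {'_id': 2, 'phones': [{'number': '404-231-4444', 'showroom_id': 5},
                          {'number': '205-265-6666', 'showroom_id': 5}]},
    {'_id': 3, 'phones': [{'number': '205-265-6666', 'showroom_id': 5}]},
    {'_id': 4, 'phones': [{'number': '404-231-4444', 'showroom_id': 5}]},
]

result = count_duplicate_phones(people)
```

Here `result` lists `404-231-4444` with a count of 3 and `205-265-6666` with a count of 2, and nothing in it says which contacts those were. That information has to be collected inside the `$group` stage itself.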
So I can get the numbers that are duplicated, but I can't for the life of me figure out how to return an array of the documents that actually contain them!
I'd like to get this kind of data back for each number:
# The ObjectIDs of the documents that contained the duplicate phone numbers
{u'_id': {u'number': u'404-231-4444', u'showroom_id': 5},
 u'ids': [ObjectId('51c67e322b2192121ec4d8f2'), ObjectId('51c67e312b2192121ec4d8f0')],
 u'count': 2},
Any help is greatly appreciated!
Answer by Marcel Chastain
Ah, blessed be.
Found the solution almost word for word in the question "MongoDB - Use aggregation framework or mapreduce for matching array of strings within documents (profile matching)".
Final result, with a little extra to include each contact's name:
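# dbh is a PyMongo Database handle, as in the question
dbh.person_document.aggregate([
    # One document per phone entry
    {'$unwind': '$phones'},
    # Group by phone, collecting the id and name of every contact that has it
    {'$group': {
        '_id': '$phones',
        'matchedDocuments': {
            '$push': {'id': '$_id', 'name': '$full_name'}
        },
        'num': {'$sum': 1}
    }},
    # Keep only phones that appear more than once
    {'$match': {'num': {'$gt': 1}}}
])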
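The same approach can be wrapped in a small helper so one pipeline shape covers phones, emails, or any other array field, and it yields exactly the `ids`/`count` shape asked for in the question. This is a sketch: the function name `duplicates_pipeline` and the `emails` field are hypothetical, and it assumes each contact keeps those values in an array field the way it keeps `phones`.

```python
def duplicates_pipeline(field, min_count=2):
    """Build a pipeline grouping documents that share a value in the array `field`.

    Each output row has the shape asked for in the question:
    {'_id': <shared value>, 'ids': [<document _ids>], 'count': <n>}
    """
    path = '$' + field
    return [
        # One document per element of the array field
        {'$unwind': path},
        # Group identical values and collect the _id of each contributing document
        {'$group': {
            '_id': path,
            'ids': {'$push': '$_id'},
            'count': {'$sum': 1},
        }},
        # Keep only values that occur at least min_count times
        {'$match': {'count': {'$gte': min_count}}},
    ]


phone_dupes = duplicates_pipeline('phones')
email_dupes = duplicates_pipeline('emails')  # hypothetical 'emails' array field
```

Against the question's collection this would be run as `dbh.person_document.aggregate(duplicates_pipeline('phones'))`. One caveat of `$push`: if a single contact lists the same number twice, its `_id` appears twice and is counted twice. Using `$addToSet` instead would de-duplicate the `ids` list, although `count` would still count every unwound entry.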