如何在 MongoDB 中执行等效的 SQL Join?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/2350495/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How do I perform the SQL Join equivalent in MongoDB?
提问by The Unknown
How do I perform the SQL Join equivalent in MongoDB?
如何在 MongoDB 中执行等效的 SQL Join?
For example say you have two collections (users and comments) and I want to pull all the comments with pid=444 along with the user info for each.
例如,假设您有两个集合(用户和评论),我想提取 pid=444 的所有评论以及每个集合的用户信息。
comments
{ uid:12345, pid:444, comment="blah" }
{ uid:12345, pid:888, comment="asdf" }
{ uid:99999, pid:444, comment="qwer" }
users
{ uid:12345, name:"john" }
{ uid:99999, name:"mia" }
Is there a way to pull all the comments with a certain field (eg. ...find({pid:444}) ) and the user information associated with each comment in one go?
有没有一种方法可以一次性提取具有特定字段的所有评论(例如 ...find({pid:444}) )以及与每个评论相关的用户信息?
At the moment, I am first getting the comments which match my criteria, then figuring out all the uid's in that result set, getting the user objects, and merging them with the comment's results. Seems like I am doing it wrong.
目前,我首先获取符合我的条件的评论,然后找出该结果集中的所有 uid,获取用户对象,并将它们与评论的结果合并。好像我做错了。
回答by Clayton Gulick
As of Mongo 3.2 the answers to this question are mostly no longer correct. The new $lookup operator added to the aggregation pipeline is essentially identical to a left outer join:
从 Mongo 3.2 开始,这个问题的答案大多不再正确。添加到聚合管道的新 $lookup 运算符本质上与左外连接相同:
https://docs.mongodb.org/master/reference/operator/aggregation/lookup/#pipe._S_lookup
https://docs.mongodb.org/master/reference/operator/aggregation/lookup/#pipe._S_lookup
From the docs:
从文档:
{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
}
Of course Mongo is nota relational database, and the devs are being careful to recommend specific use cases for $lookup, but at least as of 3.2 doing join is now possible with MongoDB.
当然,Mongo不是关系数据库,开发人员正在谨慎地推荐 $lookup 的特定用例,但至少从 3.2 开始,MongoDB 现在可以进行连接。
回答by William Stein
This page on the official mongodb site addresses exactlythis question:
mongodb 官方网站上的这个页面正是解决了这个问题:
http://docs.mongodb.org/ecosystem/tutorial/model-data-for-ruby-on-rails/
http://docs.mongodb.org/ecosystem/tutorial/model-data-for-ruby-on-rails/
When we display our list of stories, we'll need to show the name of the user who posted the story. If we were using a relational database, we could perform a join on users and stores, and get all our objects in a single query. But MongoDB does not support joins and so, at times, requires bit of denormalization. Here, this means caching the 'username' attribute.
Relational purists may be feeling uneasy already, as if we were violating some universal law. But let's bear in mind that MongoDB collections are not equivalent to relational tables; each serves a unique design objective. A normalized table provides an atomic, isolated chunk of data. A document, however, more closely represents an object as a whole. In the case of a social news site, it can be argued that a username is intrinsic to the story being posted.
当我们显示故事列表时,我们需要显示发布故事的用户的姓名。如果我们使用关系数据库,我们可以对用户和存储执行连接,并在单个查询中获取所有对象。但是 MongoDB 不支持连接,因此有时需要一些非规范化。在这里,这意味着缓存“用户名”属性。
关系纯粹主义者可能已经感到不安,好像我们违反了某些普遍规律。但让我们记住,MongoDB 集合并不等同于关系表;每一个都有一个独特的设计目标。规范化表提供原子的、隔离的数据块。然而,文档更接近地将对象表示为一个整体。在社交新闻网站的情况下,可以说用户名是发布的故事所固有的。
回答by Orlando Becerra
We can merge/join all data inside only one collection with a easy function in few lines using the mongodb client console, and now we could be able of perform the desired query. Below a complete example,
我们可以使用 mongodb 客户端控制台通过几行简单的函数合并/加入一个集合中的所有数据,现在我们可以执行所需的查询。下面是一个完整的例子,
.- Authors:
.- 作者:
db.authors.insert([
{
_id: 'a1',
name: { first: 'orlando', last: 'becerra' },
age: 27
},
{
_id: 'a2',
name: { first: 'mayra', last: 'sanchez' },
age: 21
}
]);
.- Categories:
.- 类别:
db.categories.insert([
{
_id: 'c1',
name: 'sci-fi'
},
{
_id: 'c2',
name: 'romance'
}
]);
.- Books
.- 书籍
db.books.insert([
{
_id: 'b1',
name: 'Groovy Book',
category: 'c1',
authors: ['a1']
},
{
_id: 'b2',
name: 'Java Book',
category: 'c2',
authors: ['a1','a2']
},
]);
.- Book lending
.- 图书借阅
db.lendings.insert([
{
_id: 'l1',
book: 'b1',
date: new Date('01/01/11'),
lendingBy: 'jose'
},
{
_id: 'l2',
book: 'b1',
date: new Date('02/02/12'),
lendingBy: 'maria'
}
]);
.- The magic:
。- 魔法:
db.books.find().forEach(
function (newBook) {
newBook.category = db.categories.findOne( { "_id": newBook.category } );
newBook.lendings = db.lendings.find( { "book": newBook._id } ).toArray();
newBook.authors = db.authors.find( { "_id": { $in: newBook.authors } } ).toArray();
db.booksReloaded.insert(newBook);
}
);
.- Get the new collection data:
.- 获取新的集合数据:
db.booksReloaded.find().pretty()
.- Response :)
。- 回复 :)
{
"_id" : "b1",
"name" : "Groovy Book",
"category" : {
"_id" : "c1",
"name" : "sci-fi"
},
"authors" : [
{
"_id" : "a1",
"name" : {
"first" : "orlando",
"last" : "becerra"
},
"age" : 27
}
],
"lendings" : [
{
"_id" : "l1",
"book" : "b1",
"date" : ISODate("2011-01-01T00:00:00Z"),
"lendingBy" : "jose"
},
{
"_id" : "l2",
"book" : "b1",
"date" : ISODate("2012-02-02T00:00:00Z"),
"lendingBy" : "maria"
}
]
}
{
"_id" : "b2",
"name" : "Java Book",
"category" : {
"_id" : "c2",
"name" : "romance"
},
"authors" : [
{
"_id" : "a1",
"name" : {
"first" : "orlando",
"last" : "becerra"
},
"age" : 27
},
{
"_id" : "a2",
"name" : {
"first" : "mayra",
"last" : "sanchez"
},
"age" : 21
}
],
"lendings" : [ ]
}
I hope this lines can help you.
我希望这条线可以帮助你。
回答by Otto Allmendinger
You have to do it the way you described. MongoDB is a non-relational database and doesn't support joins.
你必须按照你描述的方式来做。MongoDB 是非关系型数据库,不支持连接。
回答by grepit
As others have pointed out you are trying to create a relational database from none relational database which you really don't want to do but anyways, if you have a case that you have to do this here is a solution you can use. We first do a foreach find on collection A( or in your case users) and then we get each item as an object then we use object property (in your case uid) to lookup in our second collection (in your case comments) if we can find it then we have a match and we can print or do something with it. Hope this helps you and good luck :)
正如其他人指出的那样,您正在尝试从非关系数据库创建一个关系数据库,而您确实不想这样做,但无论如何,如果您有必须这样做的情况,这里是您可以使用的解决方案。我们首先在集合 A(或在您的情况下为用户)上进行 foreach 查找,然后我们将每个项目作为一个对象,然后我们使用对象属性(在您的情况下为 uid)来查找我们的第二个集合(在您的情况下为评论),如果我们可以找到它然后我们有一个匹配,我们可以打印或用它做一些事情。希望这对你有帮助,祝你好运:)
db.users.find().forEach(
function (object) {
var commonInBoth=db.comments.findOne({ "uid": object.uid} );
if (commonInBoth != null) {
printjson(commonInBoth) ;
printjson(object) ;
}else {
// did not match so we don't care in this case
}
});
回答by sbharti
With right combination of $lookup, $projectand $match, you can join mutiple tables on multiple parameters. This is because they can be chained multiple times.
通过$lookup、$project和$match 的正确组合,您可以连接多个参数的多个表。这是因为它们可以被链接多次。
Suppose we want to do following (reference)
假设我们要执行以下操作(参考)
SELECT S.* FROM LeftTable S
LEFT JOIN RightTable R ON S.ID =R.ID AND S.MID =R.MID WHERE R.TIM >0 AND
S.MOB IS NOT NULL
Step 1: Link all tables
第 1 步:链接所有表
you can $lookup as many tables as you want.
您可以根据需要查找任意数量的表。
$lookup- one for each table in query
$lookup- 查询中的每个表一个
$unwind- because data is denormalised correctly, else wrapped in arrays
$unwind- 因为数据被正确非规范化,否则包裹在数组中
Python code..
Python代码..
db.LeftTable.aggregate([
# connect all tables
{"$lookup": {
"from": "RightTable",
"localField": "ID",
"foreignField": "ID",
"as": "R"
}},
{"$unwind": "R"}
])
Step 2: Define all conditionals
第 2 步:定义所有条件
$project: define all conditional statements here, plus all the variables you'd like to select.
$project:在此处定义所有条件语句,以及您要选择的所有变量。
Python Code..
Python代码..
db.LeftTable.aggregate([
# connect all tables
{"$lookup": {
"from": "RightTable",
"localField": "ID",
"foreignField": "ID",
"as": "R"
}},
{"$unwind": "R"},
# define conditionals + variables
{"$project": {
"midEq": {"$eq": ["$MID", "$R.MID"]},
"ID": 1, "MOB": 1, "MID": 1
}}
])
Step 3: Join all the conditionals
第 3 步:加入所有条件
$match- join all conditions using OR or AND etc. There can be multiples of these.
$match- 使用 OR 或 AND 等连接所有条件。这些条件可以有多个。
$project: undefine all conditionals
$project: 取消定义所有条件
Python Code..
Python代码..
db.LeftTable.aggregate([
# connect all tables
{"$lookup": {
"from": "RightTable",
"localField": "ID",
"foreignField": "ID",
"as": "R"
}},
{"$unwind": "$R"},
# define conditionals + variables
{"$project": {
"midEq": {"$eq": ["$MID", "$R.MID"]},
"ID": 1, "MOB": 1, "MID": 1
}},
# join all conditionals
{"$match": {
"$and": [
{"R.TIM": {"$gt": 0}},
{"MOB": {"$exists": True}},
{"midEq": {"$eq": True}}
]}},
# undefine conditionals
{"$project": {
"midEq": 0
}}
])
Pretty much any combination of tables, conditionals and joins can be done in this manner.
几乎任何表、条件和连接的组合都可以通过这种方式完成。
回答by antitoxic
Here's an example of a "join"* Actorsand Moviescollections:
这是“加入”* Actors和Movies集合的示例:
https://github.com/mongodb/cookbook/blob/master/content/patterns/pivot.txt
https://github.com/mongodb/cookbook/blob/master/content/patterns/pivot.txt
It makes use of .mapReduce()
method
它利用.mapReduce()
方法
* join- an alternative to join in document-oriented databases
* join- 加入面向文档的数据库的替代方法
回答by jarry jafery
You can join two collection in Mongo by using lookup which is offered in 3.2 version. In your case the query would be
您可以使用 3.2 版本中提供的查找在 Mongo 中加入两个集合。在您的情况下,查询将是
db.comments.aggregate({
$lookup:{
from:"users",
localField:"uid",
foreignField:"uid",
as:"users_comments"
}
})
or you can also join with respect to users then there will be a little change as given below.
或者您也可以针对用户加入,那么将会有如下所示的一些变化。
db.users.aggregate({
$lookup:{
from:"comments",
localField:"uid",
foreignField:"uid",
as:"users_comments"
}
})
It will work just same as left and right join in SQL.
它的工作方式与 SQL 中的左连接和右连接相同。
回答by Pickels
There is a specification that a lot of drivers support that's called DBRef.
许多驱动程序都支持一种称为 DBRef 的规范。
DBRef is a more formal specification for creating references between documents. DBRefs (generally) include a collection name as well as an object id. Most developers only use DBRefs if the collection can change from one document to the next. If your referenced collection will always be the same, the manual references outlined above are more efficient.
DBRef 是一种更正式的规范,用于在文档之间创建引用。DBRefs(通常)包括集合名称和对象 ID。大多数开发人员仅在集合可以从一个文档更改为下一个文档时才使用 DBRef。如果您引用的集合始终相同,则上面概述的手动引用会更有效。
Taken from MongoDB Documentation: Data Models > Data Model Reference > Database References
取自 MongoDB 文档:数据模型 > 数据模型参考 > 数据库参考
回答by Snowburnt
It depends on what you're trying to do.
这取决于你想要做什么。
You currently have it set up as a normalized database, which is fine, and the way you are doing it is appropriate.
您目前已将其设置为规范化数据库,这很好,而且您的操作方式也很合适。
However, there are other ways of doing it.
但是,还有其他方法可以做到。
You could have a posts collection that has imbedded comments for each post with references to the users that you can iteratively query to get. You could store the user's name with the comments, you could store them all in one document.
您可以拥有一个帖子集合,该集合为每个帖子嵌入评论,并引用您可以迭代查询获取的用户。您可以将用户名与评论一起存储,也可以将它们全部存储在一个文档中。
The thing with NoSQL is it's designed for flexible schemas and very fast reading and writing. In a typical Big Data farm the database is the biggest bottleneck, you have fewer database engines than you do application and front end servers...they're more expensive but more powerful, also hard drive space is very cheap comparatively. Normalization comes from the concept of trying to save space, but it comes with a cost at making your databases perform complicated Joins and verifying the integrity of relationships, performing cascading operations. All of which saves the developers some headaches if they designed the database properly.
NoSQL 的特点是它专为灵活的模式和非常快速的读写而设计。在典型的大数据场中,数据库是最大的瓶颈,数据库引擎比应用程序和前端服务器少……它们更贵但功能更强大,而且硬盘空间相对来说非常便宜。规范化来自试图节省空间的概念,但它带来了使您的数据库执行复杂的联接和验证关系的完整性、执行级联操作的成本。如果开发人员正确设计数据库,所有这些都会为开发人员省去一些麻烦。
With NoSQL, if you accept that redundancy and storage space aren't issues because of their cost (both in processor time required to do updates and hard drive costs to store extra data), denormalizing isn't an issue (for embedded arrays that become hundreds of thousands of items it can be a performance issue, but most of the time that's not a problem). Additionally you'll have several application and front end servers for every database cluster. Have them do the heavy lifting of the joins and let the database servers stick to reading and writing.
使用 NoSQL,如果您接受冗余和存储空间不是问题,因为它们的成本(在执行更新所需的处理器时间和存储额外数据的硬盘成本方面),非规范化不是问题(对于成为数十万个项目可能是性能问题,但大多数时候这不是问题)。此外,您将为每个数据库集群提供多个应用程序和前端服务器。让他们完成连接的繁重工作,让数据库服务器坚持读取和写入。
TL;DR: What you're doing is fine, and there are other ways of doing it. Check out the mongodb documentation's data model patterns for some great examples. http://docs.mongodb.org/manual/data-modeling/
TL;DR:你所做的很好,还有其他方法可以做到。查看 mongodb 文档的数据模型模式以获取一些很好的示例。http://docs.mongodb.org/manual/data-modeling/