Disclaimer: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/12694490/

Time: 2020-09-09 12:52:02  Source: igfitidea

How do I use aggregation operators in a $match in MongoDB (for example $year or $dayOfMonth)?

Tags: mongodb, aggregation-framework

Asked by Mason

I have a collection full of documents with a created_date attribute. I'd like to send these documents through an aggregation pipeline to do some work on them. Ideally I would filter them with a $match before doing any other work, so that I can take advantage of indexes. However, I can't figure out how to use the new $year/$month/$dayOfMonth operators in my $match expression.

There are a few examples floating around of how to use these operators in a $project operation, but I'm concerned that by placing a $project as the first step in my pipeline I lose access to my indexes (the MongoDB documentation indicates that the first stage must be a $match to take advantage of indexes).

Sample data:

{
    post_body: 'This is the body of test post 1',
    created_date: ISODate('2012-09-29T05:23:41Z'),
    comments: 48
}
{
    post_body: 'This is the body of test post 2',
    created_date: ISODate('2012-09-24T12:34:13Z'),
    comments: 10
}
{
    post_body: 'This is the body of test post 3',
    created_date: ISODate('2012-08-16T12:34:13Z'),
    comments: 10
}

I'd like to run this through an aggregation pipeline to get the total number of comments on all posts made in September:

{
    aggregate: 'posts',
    pipeline: [
         {$match:
             /*Can I use the $year/$month operators here to match Sept 2012?
             $year:created_date : 2012,
             $month:created_date : 9
             */
             /*or does this have to be 
             created_date : 
                  {$gte:{$date:'2012-09-01T04:00:00Z'}, 
                  $lt: {$date:'2012-10-01T04:00:00Z'} }
             */
         },
         {$group:
             {_id: '0',
              totalComments:{$sum:'$comments'}
             }
          }
    ]
 }

This works, but the $match loses access to any indexes for more complicated queries:

{
    aggregate: 'posts',
    pipeline: [
         {$project:
              {
                   comments: 1,
                   month : {$month:'$created_date'},
                   year : {$year:'$created_date'}
              }
         },
         {$match:
              {
                   month:9,
                   year: 2012
               }
         },
         {$group:
             {_id: '0',
              totalComments:{$sum:'$comments'}
             }
          }
    ]
 }

Answered by Asya Kamsky

As you already found, you cannot $match on fields that are not in the document (it works exactly the same way that find works), and if you use $project first you lose the ability to use indexes.

What you can do instead is combine your efforts as follows:

{
    aggregate: 'posts',
    pipeline: [
         {$match: {
             created_date :
                  {$gte: {$date:'2012-09-01T04:00:00Z'},
                   $lt:  {$date:'2012-10-01T04:00:00Z'}
                  }
             }
         },
         {$group:
             {_id: '0',
              totalComments:{$sum:'$comments'}
             }
          }
    ]
 }

The above only gives you the aggregate for September. If you want to aggregate over multiple months, you can do something like this:

{
    aggregate: 'posts',
    pipeline: [
         {$match: {
             created_date :
                  { $gte: {$date:'2012-07-01T04:00:00Z'},
                    $lt:  {$date:'2012-10-01T04:00:00Z'}
                  }
             }
         },
         {$project: {
              comments: 1,
              new_created: {
                        "yr" : {"$year" : "$created_date"},
                        "mo" : {"$month" : "$created_date"}
                     }
              }
         },
         {$group:
             {_id: "$new_created",
              totalComments:{$sum:'$comments'}
             }
          }
    ]
 }

You'll get back something like:

{
    "result" : [
        {
            "_id" : {
                "yr" : 2012,
                "mo" : 7
            },
            "totalComments" : 5
        },
        {
            "_id" : {
                "yr" : 2012,
                "mo" : 8
            },
            "totalComments" : 19
        },
        {
            "_id" : {
                "yr" : 2012,
                "mo" : 9
            },
            "totalComments" : 21
        }
    ],
    "ok" : 1
}
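
As a minimal sketch of the range approach in this answer, the $gte/$lt bounds for any month can be computed rather than hard-coded. The helper names monthRange and monthlyCommentsPipeline are hypothetical (they are not MongoDB functions), and the sketch assumes UTC month boundaries instead of the T04:00:00Z offset used above.

```javascript
// Hypothetical helper (not part of MongoDB): builds a half-open
// [start, end) range covering one calendar month, assuming UTC.
function monthRange(year, month) {
  // month is 1-based (9 = September); Date.UTC expects a 0-based month.
  const start = new Date(Date.UTC(year, month - 1, 1));
  // The 0-based index of the following month; for December this rolls
  // over to January of the next year.
  const end = new Date(Date.UTC(year, month, 1));
  return { $gte: start, $lt: end };
}

// Builds a pipeline in the same shape as the one in the answer:
// a range $match on created_date first, then a $group that sums comments.
function monthlyCommentsPipeline(year, month) {
  return [
    { $match: { created_date: monthRange(year, month) } },
    { $group: { _id: null, totalComments: { $sum: "$comments" } } },
  ];
}

const pipeline = monthlyCommentsPipeline(2012, 9);
console.log(JSON.stringify(pipeline, null, 2));
```

In the shell, the resulting array could be passed to db.posts.aggregate(pipeline). Because the first stage is a plain range $match on created_date, it stays the kind of stage that can take advantage of an index on that field.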

Answered by xameeramir

Let's look at building some pipelines that involve operations that are already familiar to us. So, we're going to look at the following stages:

  • match: this is a filtering stage, similar to find.
  • project
  • sort
  • skip
  • limit

We might ask ourselves why these stages are necessary, given that this functionality is already provided by the MongoDB query language. The reason is that we need these stages to support the more complex, analytics-oriented functionality included with the aggregation framework. The query below is simply equivalent to a find:

db.companies.aggregate([{
  $match: {
    founded_year: 2004
  }
}, ])

Let's introduce a $project stage in this aggregation pipeline:

db.companies.aggregate([{
  $match: {
    founded_year: 2004
  }
}, {
  $project: {
    _id: 0,
    name: 1,
    founded_year: 1
  }
}])

We use the aggregate method to run the aggregation framework. An aggregation pipeline is merely an array of documents, and each document should specify a particular stage operator. So, in the case above, we have an aggregation pipeline with two stages. The $match stage passes the documents one at a time to the $project stage.
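
Since a pipeline is just an array of single-operator documents, its shape can be checked in plain JavaScript before it is sent to the server. The following is only an illustrative sketch; stageNames is a hypothetical helper, not a MongoDB API.

```javascript
// Illustrative check (not a MongoDB API): every stage must be a document
// with exactly one key, and that key must name a stage operator ($...).
function stageNames(pipeline) {
  return pipeline.map((stage, i) => {
    const keys = Object.keys(stage);
    if (keys.length !== 1 || !keys[0].startsWith("$")) {
      throw new Error(`stage ${i} must have exactly one $-prefixed key`);
    }
    return keys[0];
  });
}

// The two-stage pipeline from the example above.
const pipeline = [
  { $match: { founded_year: 2004 } },
  { $project: { _id: 0, name: 1, founded_year: 1 } },
];

console.log(stageNames(pipeline)); // [ '$match', '$project' ]
```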

Let's extend this with a $limit stage:

db.companies.aggregate([{
  $match: {
    founded_year: 2004
  }
}, {
  $limit: 5
}, {
  $project: {
    _id: 0,
    name: 1
  }
}])

This gets the matching documents and limits them to five before projecting out the fields, so the projection only works on 5 documents. Suppose instead we were to do something like this:

db.companies.aggregate([{
  $match: {
    founded_year: 2004
  }
}, {
  $project: {
    _id: 0,
    name: 1
  }
}, {
  $limit: 5
}])

This gets the matching documents, projects that large number of documents, and only then limits the result to five. So the projection works on a large number of documents before the limit to 5 is applied. The lesson is that we should limit the documents to those that are absolutely necessary before passing them to the next stage. Now, let's look at the $sort stage:

db.companies.aggregate([{
  $match: {
    founded_year: 2004
  }
}, {
  $sort: {
    name: 1
  }
}, {
  $limit: 5
}, {
  $project: {
    _id: 0,
    name: 1
  }
}])

This will sort all matching documents by name and return only the first 5 of them. Suppose instead we were to do something like this:

db.companies.aggregate([{
  $match: {
    founded_year: 2004
  }
}, {
  $limit: 5
}, {
  $sort: {
    name: 1
  }
}, {
  $project: {
    _id: 0,
    name: 1
  }
}])

This will take the first 5 documents and then sort them. Let's add the $skip stage:

db.companies.aggregate([{
  $match: {
    founded_year: 2004
  }
}, {
  $sort: {
    name: 1
  }
}, {
  $skip: 10
}, {
  $limit: 5
}, {
  $project: {
    _id: 0,
    name: 1
  }
}, ])

This will sort all the matching documents, skip the first 10, and return the next 5 to us. We should try to include $match stages as early as possible in the pipeline. To filter documents using a $match stage, we use the same syntax for constructing query documents (filters) as we do for find().
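
The effect of stage order described in this answer can be reproduced with a small in-memory simulation. This is only a sketch: sortByName, limit, skip and runStages are hypothetical plain-JavaScript stand-ins for $sort, $limit and $skip, and the company names are made up.

```javascript
// Plain-JavaScript stand-ins for $sort, $limit and $skip, applied to an
// in-memory array. These are illustrative helpers, not MongoDB APIs.
const sortByName = (docs) =>
  [...docs].sort((a, b) => a.name.localeCompare(b.name));
const limit = (n) => (docs) => docs.slice(0, n);
const skip = (n) => (docs) => docs.slice(n);

// Runs the stages left to right, feeding each stage's output to the next.
const runStages = (docs, stages) =>
  stages.reduce((acc, stage) => stage(acc), docs);

// Made-up sample data in arbitrary (insertion) order.
const companies = ["Zeta", "Delta", "Omega", "Gamma", "Kappa", "Alpha"].map(
  (name) => ({ name, founded_year: 2004 })
);

// $sort then $limit: the first 2 names alphabetically.
const sortThenLimit = runStages(companies, [sortByName, limit(2)]);
// $limit then $sort: only the first 2 documents seen, then sorted.
const limitThenSort = runStages(companies, [limit(2), sortByName]);
// $sort, $skip, $limit: the 2 documents that follow the first 2.
const page = runStages(companies, [sortByName, skip(2), limit(2)]);

console.log(sortThenLimit.map((d) => d.name)); // [ 'Alpha', 'Delta' ]
console.log(limitThenSort.map((d) => d.name)); // [ 'Delta', 'Zeta' ]
console.log(page.map((d) => d.name));          // [ 'Gamma', 'Kappa' ]
```

The two orderings return different documents, which is why the position of $limit relative to $sort matters for correctness and not only for the amount of work done.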

Answered by cirrus

Try this:

db.createCollection("so");
db.so.remove();
db.so.insert([
{
    post_body: 'This is the body of test post 1',
    created_date: ISODate('2012-09-29T05:23:41Z'),
    comments: 48
},
{
    post_body: 'This is the body of test post 2',
    created_date: ISODate('2012-09-24T12:34:13Z'),
    comments: 10
},
{
    post_body: 'This is the body of test post 3',
    created_date: ISODate('2012-08-16T12:34:13Z'),
    comments: 10
}
]);
//db.so.find();

db.so.ensureIndex({"created_date":1});
db.runCommand({
    aggregate:"so",
    pipeline:[
        {
            $match: { // filter only those posts in september
                created_date: { $gte: ISODate('2012-09-01'), $lt: ISODate('2012-10-01') }
            }
        },
        {
            $group: {
                _id: null, // no shared key
                comments: { $sum: "$comments" } // total comments for all the posts in the pipeline
            }
        },
]
//,explain:true
});

The result is:

{ "result" : [ { "_id" : null, "comments" : 58 } ], "ok" : 1 }

So you could also modify your previous example to do this, although I'm not sure why you'd want to unless you plan on doing something else with month and year in the pipeline:

{
    aggregate: 'posts',
    pipeline: [
     {$match: { created_date: { $gte: ISODate('2012-09-01'), $lt: ISODate('2012-10-01') } } },
     {$project:
          {
               comments: 1, // keep the field that $group sums below
               month : {$month:'$created_date'},
               year : {$year:'$created_date'}
          }
     },
     {$match:
          {
               month:9,
               year: 2012
           }
     },
     {$group:
         {_id: '0',
          totalComments:{$sum:'$comments'}
         }
      }
    ]
 }
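
None of the answers above use it, but for completeness here is a hedged sketch of a different technique. Later MongoDB releases (3.6 and newer, as far as I know) provide the $expr query operator, which allows aggregation expressions such as $year and $month inside a $match. The variable name septemberByExpr is just for illustration. My understanding is that a computed expression like this generally cannot use an index on created_date, so the range-based $match shown in the answers remains the better choice when index use matters.

```javascript
// Sketch only: assumes a server version that supports $expr
// (MongoDB 3.6 or later). $expr lets $match evaluate aggregation
// expressions, so $year/$month can be applied to created_date directly.
const septemberByExpr = [
  {
    $match: {
      $expr: {
        $and: [
          { $eq: [{ $year: "$created_date" }, 2012] },
          { $eq: [{ $month: "$created_date" }, 9] },
        ],
      },
    },
  },
  // Same grouping as in the answers: one bucket, summing comments.
  { $group: { _id: null, totalComments: { $sum: "$comments" } } },
];

console.log(JSON.stringify(septemberByExpr, null, 2));
```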