Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/15938859/

Time: 2020-09-09 13:08:52  Source: igfitidea

MongoDB aggregate within daily grouping

mongodb mongodb-query aggregation-framework

Asked by Kevin

I have some docs in mongo that look something like this:

{
  _id : ObjectId("..."),
  "make" : "Nissan",
  ..
},
{
  _id : ObjectId("..."),
  "make" : "Nissan",
  "saleDate" :  ISODate("2013-04-10T12:39:50.676Z"),
  ..
}

Ideally, I'd like to be able to count, by make, the number of vehicles sold per day. I'd then like to view either today, or a window such as today through the last seven days.

I was able to accomplish the daily view with some ugly code:

db.inventory.aggregate(
  { $match : { "saleDate" : { $gte: ISODate("2013-04-10T00:00:00.000Z"), $lt: ISODate("2013-04-11T00:00:00.000Z")  } } } ,
  { $group : { _id : { make : "$make", saleDayOfMonth : { $dayOfMonth : "$saleDate" } }, cnt : { $sum : 1 } } }
)

which then yields these results:

{
  "result" : [
    {
      "_id" : {
        "make" : "Nissan",
        "saleDayOfMonth" : 10
      },
      "cnt" : 2
    },
    {
      "_id" : {
        "make" : "Toyota",
        "saleDayOfMonth" : 10
      },
      "cnt" : 4
    },
  ],
  "ok" : 1
}

So that is ok, but I would much prefer to not have to change the two datetime values in the query. Then, as I mentioned above, I'd like to be able to run this query (again, without having to modify it each time) and see the same results binned by day over the last week.

Oh, and here is the sample data I've been using for the query:

db.inventory.save({"make" : "Nissan","saleDate" :  ISODate("2013-04-10T12:39:50.676Z")});
db.inventory.save({"make" : "Nissan"});
db.inventory.save({"make" : "Nissan","saleDate" :  ISODate("2013-04-10T11:39:50.676Z")});
db.inventory.save({"make" : "Toyota","saleDate" :  ISODate("2013-04-09T11:39:50.676Z")});
db.inventory.save({"make" : "Toyota","saleDate" :  ISODate("2013-04-10T11:38:50.676Z")});
db.inventory.save({"make" : "Toyota","saleDate" :  ISODate("2013-04-10T11:37:50.676Z")});
db.inventory.save({"make" : "Toyota","saleDate" :  ISODate("2013-04-10T11:36:50.676Z")});
db.inventory.save({"make" : "Toyota","saleDate" :  ISODate("2013-04-10T11:35:50.676Z")});

Thanks in advance, Kevin

Answered by ephigenia

In Mongo 2.8 RC2 there is a new date aggregation operator, $dateToString, which can be used to group by day and simply get a "YYYY-MM-DD" string in the result:

Example from the documentation:

db.sales.aggregate(
  [
     {
         $project: {
                yearMonthDay: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
                time: { $dateToString: { format: "%H:%M:%S:%L", date: "$date" } }
         }
     }
  ]
)

will result in:

{ "_id" : 1, "yearMonthDay" : "2014-01-01", "time" : "08:15:39:736" }
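
The documentation example only projects the string. Applied to the question, the same operator could go straight into the $group key, with the date window computed from the current time so the query never has to be edited. This is a minimal sketch, not a tested production query: the helper name dailySalesPipeline and its days parameter are made up for illustration, and the window is assumed to be aligned to UTC midnights.

```javascript
// Sketch only: builds a pipeline that counts sales per make per day
// for a rolling window ending today (UTC). The helper name and the
// `days` parameter are illustrative, not part of the original answer.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function dailySalesPipeline(now, days) {
  // Midnight UTC at the start of tomorrow: exclusive upper bound.
  const end = new Date(Date.UTC(
    now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1));
  // Midnight UTC `days` days before that: inclusive lower bound.
  const start = new Date(end.getTime() - days * MS_PER_DAY);
  return [
    { $match: { saleDate: { $gte: start, $lt: end } } },
    { $group: {
        _id: {
          make: "$make",
          day: { $dateToString: { format: "%Y-%m-%d", date: "$saleDate" } }
        },
        cnt: { $sum: 1 }
    } },
    { $sort: { "_id.day": 1, "_id.make": 1 } }
  ];
}

// In the mongo shell this would be run as, for example:
//   db.inventory.aggregate(dailySalesPipeline(new Date(), 7))
```

Passing 1 for days gives the "today only" view, and 7 gives today plus the previous six days.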

Answered by Asya Kamsky

UPDATE: The updated answer is based on the date features in 3.6 and also shows how to include dates in the range that had no sales (which wasn't mentioned in any of the original answers, including mine).

Sample data:

db.inventory.find()
{ "_id" : ObjectId("5aca30eefa1585de22d7095f"), "make" : "Nissan", "saleDate" : ISODate("2013-04-10T12:39:50.676Z") }
{ "_id" : ObjectId("5aca30eefa1585de22d70960"), "make" : "Nissan" }
{ "_id" : ObjectId("5aca30effa1585de22d70961"), "make" : "Nissan", "saleDate" : ISODate("2013-04-10T11:39:50.676Z") }
{ "_id" : ObjectId("5aca30effa1585de22d70962"), "make" : "Toyota", "saleDate" : ISODate("2013-04-09T11:39:50.676Z") }
{ "_id" : ObjectId("5aca30effa1585de22d70963"), "make" : "Toyota", "saleDate" : ISODate("2013-04-10T11:38:50.676Z") }
{ "_id" : ObjectId("5aca30effa1585de22d70964"), "make" : "Toyota", "saleDate" : ISODate("2013-04-10T11:37:50.676Z") }
{ "_id" : ObjectId("5aca30effa1585de22d70965"), "make" : "Toyota", "saleDate" : ISODate("2013-04-10T11:36:50.676Z") }
{ "_id" : ObjectId("5aca30effa1585de22d70966"), "make" : "Toyota", "saleDate" : ISODate("2013-04-10T11:35:50.676Z") }
{ "_id" : ObjectId("5aca30f9fa1585de22d70967"), "make" : "Toyota", "saleDate" : ISODate("2013-04-11T11:35:50.676Z") }
{ "_id" : ObjectId("5aca30fffa1585de22d70968"), "make" : "Toyota", "saleDate" : ISODate("2013-04-13T11:35:50.676Z") }
{ "_id" : ObjectId("5aca3921fa1585de22d70969"), "make" : "Honda", "saleDate" : ISODate("2013-04-13T00:00:00Z") }

Defining startDate and endDate as variables and using them in the aggregation:

startDate = ISODate("2013-04-08T00:00:00Z");
endDate = ISODate("2013-04-15T00:00:00Z");

db.inventory.aggregate([
  { $match : { "saleDate" : { $gte: startDate, $lt: endDate} } },
  {$addFields:{
     saleDate:{$dateFromParts:{
                  year:{$year:"$saleDate"},
                  month:{$month:"$saleDate"},
                  day:{$dayOfMonth:"$saleDate"}
     }},
     dateRange:{$map:{
        input:{$range:[0, {$subtract:[endDate,startDate]}, 1000*60*60*24]},
        in:{$add:[startDate, "$$this"]}
     }}
  }},
  {$unwind:"$dateRange"},
  {$group:{
     _id:"$dateRange", 
     sales:{$push:{$cond:[
                {$eq:["$dateRange","$saleDate"]},
                {make:"$make",count:1},
                {count:0}
     ]}}
  }},
  {$sort:{_id:1}},
  {$project:{
     _id:0,
     saleDate:"$_id",
     totalSold:{$sum:"$sales.count"},
     byBrand:{$arrayToObject:{$reduce:{
        input: {$filter:{input:"$sales",cond:"$$this.count"}},
        initialValue: {$map:{input:{$setUnion:["$sales.make"]}, in:{k:"$$this",v:0}}}, 
        in:{$let:{
           vars:{t:"$$this",v:"$$value"},
           in:{$map:{
              input:"$$v",
              in:{
                 k:"$$this.k",
                 v:{$cond:[
                     {$eq:["$$this.k","$$t.make"]},
                     {$add:["$$this.v","$$t.count"]},
                     "$$this.v"
                 ]}
              }
           }}
        }}
     }}}
  }}
])

On the sample data this gives the following results:

{ "saleDate" : ISODate("2013-04-08T00:00:00Z"), "totalSold" : 0, "byBrand" : {  } }
{ "saleDate" : ISODate("2013-04-09T00:00:00Z"), "totalSold" : 1, "byBrand" : { "Toyota" : 1 } }
{ "saleDate" : ISODate("2013-04-10T00:00:00Z"), "totalSold" : 6, "byBrand" : { "Nissan" : 2, "Toyota" : 4 } }
{ "saleDate" : ISODate("2013-04-11T00:00:00Z"), "totalSold" : 1, "byBrand" : { "Toyota" : 1 } }
{ "saleDate" : ISODate("2013-04-12T00:00:00Z"), "totalSold" : 0, "byBrand" : {  } }
{ "saleDate" : ISODate("2013-04-13T00:00:00Z"), "totalSold" : 2, "byBrand" : { "Honda" : 1, "Toyota" : 1 } }
{ "saleDate" : ISODate("2013-04-14T00:00:00Z"), "totalSold" : 0, "byBrand" : {  } }
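
The part that makes days without sales show up is the dateRange field: $range produces millisecond offsets from startDate in steps of one day, and $map adds each offset back onto startDate. To make that easier to follow, here is roughly the same computation in plain JavaScript. It is only an illustration of what the expression evaluates to, and the function name dateRange is chosen here to mirror the field name.

```javascript
// Plain-JavaScript equivalent of the dateRange expression, for illustration.
const DAY_MS = 1000 * 60 * 60 * 24;

function dateRange(startDate, endDate) {
  const days = [];
  // Mirrors $range: [0, endDate - startDate, DAY_MS]; the end is exclusive.
  for (let offset = 0; offset < endDate - startDate; offset += DAY_MS) {
    // Mirrors $add: [startDate, "$$this"].
    days.push(new Date(startDate.getTime() + offset));
  }
  return days;
}

const range = dateRange(new Date("2013-04-08T00:00:00Z"),
                        new Date("2013-04-15T00:00:00Z"));
// Prints the seven days from 2013-04-08 through 2013-04-14.
console.log(range.map(d => d.toISOString().slice(0, 10)));
```

Every matched document carries this whole array, so after the $unwind each sale is paired with every day in the window, and the $cond in the $group stage counts it only on the day it actually belongs to.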

This aggregation can also be done with two $group stages and a simple $project instead of one $group and a complex $project. Here it is:

db.inventory.aggregate([
   {$match : { "saleDate" : { $gte: startDate, $lt: endDate} } },
   {$addFields:{
      saleDate:{$dateFromParts:{
         year:{$year:"$saleDate"},
         month:{$month:"$saleDate"},
         day:{$dayOfMonth:"$saleDate"}
      }},
      dateRange:{$map:{
         input:{$range:[0, {$subtract:[endDate,startDate]}, 1000*60*60*24]},
         in:{$add:[startDate, "$$this"]}
      }}
   }},
   {$unwind:"$dateRange"},
   {$group:{
      _id:{date:"$dateRange",make:"$make"},
      count:{$sum:{$cond:[{$eq:["$dateRange","$saleDate"]},1,0]}}
   }},
   {$group:{
      _id:"$_id.date",
      total:{$sum:"$count"},
      byBrand:{$push:{k:"$_id.make",v:{$sum:"$count"}}}
   }},
   {$sort:{_id:1}},
   {$project:{
      _id:0,
      saleDate:"$_id",
      totalSold:"$total",
      byBrand:{$arrayToObject:{$filter:{input:"$byBrand",cond:"$$this.v"}}}
   }}
])

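Same results, except that the listing below also includes brands with zero sales; that looks like the output you get when the $filter is left out of the final $project: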

{ "saleDate" : ISODate("2013-04-08T00:00:00Z"), "totalSold" : 0, "byBrand" : { "Honda" : 0, "Toyota" : 0, "Nissan" : 0 } }
{ "saleDate" : ISODate("2013-04-09T00:00:00Z"), "totalSold" : 1, "byBrand" : { "Honda" : 0, "Nissan" : 0, "Toyota" : 1 } }
{ "saleDate" : ISODate("2013-04-10T00:00:00Z"), "totalSold" : 6, "byBrand" : { "Honda" : 0, "Toyota" : 4, "Nissan" : 2 } }
{ "saleDate" : ISODate("2013-04-11T00:00:00Z"), "totalSold" : 1, "byBrand" : { "Toyota" : 1, "Honda" : 0, "Nissan" : 0 } }
{ "saleDate" : ISODate("2013-04-12T00:00:00Z"), "totalSold" : 0, "byBrand" : { "Toyota" : 0, "Nissan" : 0, "Honda" : 0 } }
{ "saleDate" : ISODate("2013-04-13T00:00:00Z"), "totalSold" : 2, "byBrand" : { "Honda" : 1, "Toyota" : 1, "Nissan" : 0 } }
{ "saleDate" : ISODate("2013-04-14T00:00:00Z"), "totalSold" : 0, "byBrand" : { "Toyota" : 0, "Honda" : 0, "Nissan" : 0 } }

Original Answer based on 2.6:

You might want to take a look at my blog entry about how to deal with various date manipulations in Aggregation Framework here.

What you can do is use a $project phase to truncate your dates to daily resolution, and then run the aggregation over the whole data set (or just part of it), grouping by date and make.

With your sample data, say you want to know how many vehicles you sold by make, by date this year:

match={"$match" : {
               "saleDate" : { "$gt" : new Date(2013,0,1) }
      }
};

proj1={"$project" : {
        "_id" : 0,
        "saleDate" : 1,
        "make" : 1,
        "h" : {
            "$hour" : "$saleDate"
        },
        "m" : {
            "$minute" : "$saleDate"
        },
        "s" : {
            "$second" : "$saleDate"
        },
        "ml" : {
            "$millisecond" : "$saleDate"
        }
    }
};

proj2={"$project" : {
        "_id" : 0,
        "make" : 1,
        "saleDate" : {
            "$subtract" : [
                "$saleDate",
                {
                    "$add" : [
                        "$ml",
                        {
                            "$multiply" : [
                                "$s",
                                1000
                            ]
                        },
                        {
                            "$multiply" : [
                                "$m",
                                60,
                                1000
                            ]
                        },
                        {
                            "$multiply" : [
                                "$h",
                                60,
                                60,
                                1000
                            ]
                        }
                    ]
                }
            ]
        }
    }
};

group={"$group" : {
        "_id" : {
            "m" : "$make",
            "d" : "$saleDate"
        },
        "count" : {
            "$sum" : 1
        }
    }
};

Now running the aggregation gives you:

db.inventory.aggregate(match, proj1, proj2, group)
{
    "result" : [
        {
            "_id" : {
                "m" : "Toyota",
                "d" : ISODate("2013-04-10T00:00:00Z")
            },
            "count" : 4
        },
        {
            "_id" : {
                "m" : "Toyota",
                "d" : ISODate("2013-04-09T00:00:00Z")
            },
            "count" : 1
        },
        {
            "_id" : {
                "m" : "Nissan",
                "d" : ISODate("2013-04-10T00:00:00Z")
            },
            "count" : 2
        }
    ],
    "ok" : 1
}

You can add another {$project} phase to tidy up the output, and you can add a {$sort} step, but basically, for each date and each make, you get a count of how many were sold.
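
For example, the extra stages might look something like the sketch below. The stage variable names and the output field names (make, saleDate, count) are just one possible choice and are not taken from the original answer.

```javascript
// Hypothetical follow-up stages; the output field names are illustrative.
// Sort by day first, then by make within each day.
const sort = { $sort: { "_id.d": 1, "_id.m": 1 } };

// Flatten the compound _id into top-level fields and keep the count.
const pretty = { $project: {
  _id: 0,
  make: "$_id.m",
  saleDate: "$_id.d",
  count: 1
} };

const extraStages = [sort, pretty];

// In the mongo shell these would simply be appended to the pipeline, e.g.:
//   db.inventory.aggregate(match, proj1, proj2, group, sort, pretty)
```

The sort is placed before the final $project because it refers to the _id.d and _id.m fields, which no longer exist once the $project has renamed them.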

Answered by egvo

I like user1083621's answer, but that method limits what you can do with the field afterwards, because you cannot use it as a date field in (for instance) later aggregation pipeline stages. You can neither compare it as a date nor apply any date aggregation operations to it, and after the aggregation you'll have strings(!). All of that can be solved by also projecting your original date field, but then you'll have some difficulty retaining it through the grouping stage. And after all, sometimes you just want to work with the beginning of the day, not with an arbitrary time of day. So here's my method:

{'$project': {
    'start_of_day': {'$subtract': [
        '$date',
        {'$add': [
            {'$multiply': [{'$hour': '$date'}, 3600000]},
            {'$multiply': [{'$minute': '$date'}, 60000]},
            {'$multiply': [{'$second': '$date'}, 1000]},
            {'$millisecond': '$date'}
        ]}
    ]},
}}

It gives you this:

{
    "start_of_day" : ISODate("2015-12-03T00:00:00.000Z")
},
{
    "start_of_day" : ISODate("2015-12-04T00:00:00.000Z")
}

I can't say whether it is any faster than user1083621's method.
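
If the arithmetic is not obvious, it can be checked outside MongoDB. The sketch below redoes the same subtraction in plain JavaScript, assuming UTC, which is what $hour, $minute, $second and $millisecond use when no timezone is given. The function name startOfDayUTC is made up for this example.

```javascript
// Plain-JavaScript check of the start-of-day arithmetic (UTC), for illustration.
function startOfDayUTC(date) {
  // Milliseconds elapsed since midnight UTC on the same day.
  const elapsed =
    date.getUTCHours() * 3600000 +
    date.getUTCMinutes() * 60000 +
    date.getUTCSeconds() * 1000 +
    date.getUTCMilliseconds();
  // Subtracting them leaves midnight of that day, still as a Date.
  return new Date(date.getTime() - elapsed);
}

const truncated = startOfDayUTC(new Date("2015-12-03T14:25:09.123Z"));
console.log(truncated.toISOString()); // 2015-12-03T00:00:00.000Z
```

Because the result is still a date rather than a string, it can be compared, sorted, and fed into further date operators.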