Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/11418985/

Time: 2020-09-09 12:44:21  Source: igfitidea

Mongodb Aggregation Framework | Group over multiple values?

mongodb aggregation-framework

Asked by Oliver Lloyd

I would like to use MongoDB's Aggregation Framework to run what in SQL would look a bit like:

SELECT SUM(A), B, C from myTable GROUP BY B, C;

The docs state:

You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields.

But it's unclear what 'an aggregate key made up from several incoming fields' actually is.

My dataset is a bit like this:

[{ "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"},
{ "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"},
{ "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"},
{ "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"},
{ "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"},
{ "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}]

My query looks like this:

resultsCollection.aggregate(
    { $match : { testid : testid} },
    { $skip : alreadyRead },
    { $project : {
            timeStamp : 1 ,
            label : 1,
            responseCode : 1 ,
            value : 1,
            success : 1
        }},
    { $group : {
            _id : "$label",
            max_timeStamp : { $max : "$timeStamp" },
            count_responseCode : { $sum : 1 },
            avg_value : { $sum : "$value" },
            count_success : { $sum : 1 }
        }},
    { $group : {
            ?
        }}
);

My instinct was to pipe the results through to a second $group stage. I know this is possible, but it won't work here because the first group already reduces the dataset too much and the required level of detail is lost.

What I want to do is group by label, responseCode and success, and get the sum of value for each group. It should look a bit like:

label   | code | success | sum_of_values | count
sharon  | 200  |  true   |      10       |   1
sharon  | 200  |  false  |      35       |   1
paul    | 200  |  true   |      100      |   2
paul    | 404  |  true   |      15       |   1
paul    | 404  |  false  |      99       |   1

Where there are five groups:

1. { "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"}

2. { "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"}

3. { "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"}
   { "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"}

4. { "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"}

5. { "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}

Answered by Oliver Lloyd

OK, so the solution is to specify an aggregate key for the _id value. This is documented here as:

You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields.

But it doesn't actually define the format for an aggregate key. Reading the earlier documentation here, I saw that the previous collection.group method could take multiple fields and that the same structure is used in the new framework.

So, to group over multiple fields you could use _id : { success:'$success', responseCode:'$responseCode', label:'$label'}

As in:

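// The stages are passed as a single array (the pipeline).
resultsCollection.aggregate([
    // Keep only the documents for the current test run.
    { $match : { testid : testid } },
    { $skip : alreadyRead },
    // Pass through only the fields needed for grouping.
    { $project : {
            timeStamp : 1,
            label : 1,
            responseCode : 1,
            value : 1,
            success : 1
        }},
    { $group : {
            // Compound key: one group per distinct (success, responseCode, label).
            _id : { success : '$success', responseCode : '$responseCode', label : '$label' },
            // $timeStamp is not an accumulator; $max returns the latest timeStamp.
            max_timeStamp : { $max : "$timeStamp" },
            count_responseCode : { $sum : 1 },
            // Despite the name, $sum yields the total of value for the group.
            avg_value : { $sum : "$value" },
            count_success : { $sum : 1 }
        }}
]);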
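
To see what the compound _id does without a running MongoDB server, the grouping can be imitated in plain JavaScript over the sample documents from the question. This is only a rough sketch: groupByCompoundKey is a hypothetical helper, not part of MongoDB or any driver, and the output field names sum_of_values and count are assumptions borrowed from the desired table in the question.

```javascript
// Sample documents copied from the question.
const docs = [
  { timeStamp: 1341834988666, label: "sharon", responseCode: "200", value: 10, success: "true" },
  { timeStamp: 1341834988676, label: "paul", responseCode: "200", value: 60, success: "true" },
  { timeStamp: 1341834988686, label: "paul", responseCode: "404", value: 15, success: "true" },
  { timeStamp: 1341834988696, label: "sharon", responseCode: "200", value: 35, success: "false" },
  { timeStamp: 1341834988166, label: "paul", responseCode: "200", value: 40, success: "true" },
  { timeStamp: 1341834988266, label: "paul", responseCode: "404", value: 99, success: "false" }
];

// Hypothetical helper (not a MongoDB API): groups documents by the same
// compound key as _id : { success, responseCode, label } in the $group stage.
function groupByCompoundKey(documents) {
  const groups = new Map();
  for (const doc of documents) {
    const id = { success: doc.success, responseCode: doc.responseCode, label: doc.label };
    // Serialise the key object so that equal field values map to the same group.
    const key = JSON.stringify(id);
    let group = groups.get(key);
    if (!group) {
      group = { _id: id, max_timeStamp: doc.timeStamp, sum_of_values: 0, count: 0 };
      groups.set(key, group);
    }
    group.max_timeStamp = Math.max(group.max_timeStamp, doc.timeStamp); // like $max
    group.sum_of_values += doc.value;                                   // like $sum : "$value"
    group.count += 1;                                                   // like $sum : 1
  }
  return Array.from(groups.values());
}

const results = groupByCompoundKey(docs);
console.log(results);
```

For the six sample documents this produces five groups, one per distinct combination of label, responseCode and success, which matches the table in the question.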