MongoDB: select count and distinct count with group by in the same MongoDB query

Question

Asked by Rams

I am trying to do something like this:

select campaign_id, campaign_name, count(subscriber_id), count(distinct subscriber_id)
from campaigns group by campaign_id, campaign_name;

This query gives all the results except count(distinct subscriber_id):

db.campaigns.aggregate([
    {$match: {subscriber_id: {$ne: null}}},
    {$group: { 
        _id: {campaign_id: "$campaign_id",campaign_name: "$campaign_name"},
        count: {$sum: 1}
    }}
])

The following query gives all the results except count(subscriber_id):

db.campaigns_logs.aggregate([
    {$match : {subscriber_id: {$ne: null}}},
    {$group : { _id: {campaign_id: "$campaign_id",campaign_name: "$campaign_name",subscriber_id: "$subscriber_id"}}},
    // after the first $group, the grouped fields live under _id
    {$group : { _id: {campaign_id: "$_id.campaign_id",campaign_name: "$_id.campaign_name"}, 
                count: {$sum: 1}
              }}
])

But I want both count(subscriber_id) and count(distinct subscriber_id) in the same result.

Answer 1

Answered by Neil Lunn

You are already thinking along the right lines and heading in the right direction. Setting the SQL mindset aside, "distinct" is really just another way of writing a $group operation, in either language. That means two group operations happen here and, in aggregation pipeline terms, two pipeline stages.

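To make the "distinct is just a group" point concrete, here is a minimal sketch in plain JavaScript, not MongoDB code. The function name distinctByGrouping and the trimmed sample documents are illustrative assumptions: grouping on a field and keeping only the group keys yields exactly the distinct values.

```javascript
// Sketch: DISTINCT is the same as grouping on a value and keeping only the keys.
// A plain array stands in for a collection; this is not the MongoDB API.
function distinctByGrouping(docs, field) {
  const groups = new Map(); // group key -> documents in that group
  for (const doc of docs) {
    const key = doc[field];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }
  return [...groups.keys()]; // one entry per distinct value
}

// Hypothetical sample data, trimmed from the answer's simplified documents
const docs = [
  { campaign_id: "A", subscriber_id: "123" },
  { campaign_id: "A", subscriber_id: "123" },
  { campaign_id: "A", subscriber_id: "456" },
];

console.log(distinctByGrouping(docs, "subscriber_id")); // [ '123', '456' ]
```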
To visualize this, take some simplified documents:

{
    "campaign_id": "A",
    "campaign_name": "A",
    "subscriber_id": "123"
},
{
    "campaign_id": "A",
    "campaign_name": "A",
    "subscriber_id": "123"
},
{
    "campaign_id": "A",
    "campaign_name": "A",
    "subscriber_id": "456"
}

It stands to reason that for the given "campaign" combination the total count and "distinct" count are "3" and "2" respectively. So the logical thing to do is "group" up all of those "subscriber_id" values first and keep the count of occurrences for each, then while thinking "pipeline", "total" those counts per "campaign" and then just count the "distinct" occurrences as a separate number:

db.campaigns.aggregate([
    { "$match": { "subscriber_id": { "$ne": null }}},

    // Count all occurrences
    { "$group": {
        "_id": {
            "campaign_id": "$campaign_id",
            "campaign_name": "$campaign_name",
            "subscriber_id": "$subscriber_id"
        },
        "count": { "$sum": 1 }
    }},

    // Sum all occurrences and count distinct
    { "$group": {
        "_id": {
            "campaign_id": "$_id.campaign_id",
            "campaign_name": "$_id.campaign_name"
        },
        "totalCount": { "$sum": "$count" },
        "distinctCount": { "$sum": 1 }
    }}
])
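The two stages can be traced by hand with an in-memory sketch. This is plain JavaScript over an array, not the MongoDB driver, and the helper names groupBySubscriber and groupByCampaign are hypothetical; the sketch only mirrors what each $group stage computes for the sample documents.

```javascript
// In-memory sketch of the two $group stages (plain JavaScript, not MongoDB).
const docs = [
  { campaign_id: "A", campaign_name: "A", subscriber_id: "123" },
  { campaign_id: "A", campaign_name: "A", subscriber_id: "123" },
  { campaign_id: "A", campaign_name: "A", subscriber_id: "456" },
];

// Stage 1: group by (campaign, subscriber) and count occurrences
function groupBySubscriber(input) {
  const groups = new Map();
  for (const d of input) {
    const key = JSON.stringify([d.campaign_id, d.campaign_name, d.subscriber_id]);
    if (!groups.has(key)) {
      groups.set(key, {
        _id: { campaign_id: d.campaign_id, campaign_name: d.campaign_name, subscriber_id: d.subscriber_id },
        count: 0,
      });
    }
    groups.get(key).count += 1; // like { "$sum": 1 }
  }
  return [...groups.values()];
}

// Stage 2: group by campaign; sum the counts and count the rows
function groupByCampaign(input) {
  const groups = new Map();
  for (const g of input) {
    const key = JSON.stringify([g._id.campaign_id, g._id.campaign_name]);
    if (!groups.has(key)) {
      groups.set(key, {
        _id: { campaign_id: g._id.campaign_id, campaign_name: g._id.campaign_name },
        totalCount: 0,
        distinctCount: 0,
      });
    }
    const out = groups.get(key);
    out.totalCount += g.count; // like { "$sum": "$count" }
    out.distinctCount += 1;    // like { "$sum": 1 }: one row per distinct subscriber
  }
  return [...groups.values()];
}

// like $match with { "$ne": null }: drops null and missing subscriber_id
const matched = docs.filter((d) => d.subscriber_id != null);
console.log(JSON.stringify(groupByCampaign(groupBySubscriber(matched))));
// [{"_id":{"campaign_id":"A","campaign_name":"A"},"totalCount":3,"distinctCount":2}]
```

The distinct count is simply the number of rows the first stage produced for each campaign, which is why "$sum": 1 in the second stage yields it.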

After the first "group" the output documents can be visualized like this:

{ 
    "_id" : { 
        "campaign_id" : "A", 
        "campaign_name" : "A", 
        "subscriber_id" : "456"
    }, 
    "count" : 1 
}
{ 
    "_id" : { 
        "campaign_id" : "A", 
        "campaign_name" : "A", 
        "subscriber_id" : "123"
    }, 
    "count" : 2
}

So of the three documents in the sample, two share one distinct value and one has another. These per-value counts can still be totaled with $sum in the following stage to get the total number of matching documents, giving the final result:

{ 
    "_id" : { 
        "campaign_id" : "A", 
        "campaign_name" : "A"
    },
    "totalCount" : 3,
    "distinctCount" : 2
}

A really good analogy for the aggregation pipeline is the Unix pipe "|" operator, which allows "chaining" of operations: the output of one command is passed as the input of the next, and so on. Thinking about your processing requirements in that way will help you understand aggregation pipeline operations better.

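The pipe analogy can be sketched as function composition: each stage maps an array of documents to a new array, and a reduce feeds one stage's output into the next. The helpers runPipeline, match and group below are hypothetical illustrations in plain JavaScript, not part of MongoDB or any driver.

```javascript
// Each stage takes an array of documents and returns a new array.
// runPipeline feeds each stage's output into the next, like a shell pipe.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

// Hypothetical helper: keep documents that satisfy a predicate (rough $match analogue)
const match = (pred) => (docs) => docs.filter(pred);

// Hypothetical helper: bucket documents by keyFn, start each bucket with
// makeAcc(firstDoc), then fold every document into it (rough $group analogue)
const group = (keyFn, makeAcc, fold) => (docs) => {
  const buckets = new Map();
  for (const d of docs) {
    const key = keyFn(d);
    const acc = buckets.has(key) ? buckets.get(key) : makeAcc(d);
    buckets.set(key, fold(acc, d));
  }
  return [...buckets.values()];
};

const docs = [
  { campaign_id: "A", subscriber_id: "123" },
  { campaign_id: "A", subscriber_id: "123" },
  { campaign_id: "A", subscriber_id: "456" },
  { campaign_id: "A", subscriber_id: null }, // dropped by the match stage
];

const result = runPipeline(docs, [
  match((d) => d.subscriber_id != null),
  // stage 1: one bucket per (campaign, subscriber), counting occurrences
  group(
    (d) => JSON.stringify([d.campaign_id, d.subscriber_id]),
    (d) => ({ campaign_id: d.campaign_id, count: 0 }),
    (acc) => ({ ...acc, count: acc.count + 1 })
  ),
  // stage 2: one bucket per campaign, summing counts and counting buckets
  group(
    (g) => g.campaign_id,
    (g) => ({ campaign_id: g.campaign_id, totalCount: 0, distinctCount: 0 }),
    (acc, g) => ({ ...acc, totalCount: acc.totalCount + g.count, distinctCount: acc.distinctCount + 1 })
  ),
]);

console.log(JSON.stringify(result)); // [{"campaign_id":"A","totalCount":3,"distinctCount":2}]
```

Adding or reordering stages only means editing the array, much like inserting another command into a shell pipeline.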
Answer 2

Answered by Surendranath Reddy K

SQL query (group by and count distinct):

select city,count(distinct(emailId)) from TransactionDetails group by city;

The equivalent MongoDB query would look like this:

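db.TransactionDetails.aggregate([
    // Collect each city's distinct emailId values into a set
    {$group: {_id: {"CITY": "$cityName"}, uniqueCount: {$addToSet: "$emailId"}}},
    // The size of that set is the distinct count; lift CITY out of _id
    {$project: {_id: 0, "CITY": "$_id.CITY", uniqueCustomerCount: {$size: "$uniqueCount"}}}
]);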
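The $addToSet / $size idea can be mimicked in plain JavaScript with a Set per group. As a sketch (the function countAndDistinct and the sample transactions are hypothetical, and this is not MongoDB code), the same pass can also keep a running total, which shows how one grouping step can report both the count and the distinct count the question asks for; in MongoDB terms that corresponds to placing a {$sum: 1} accumulator next to $addToSet in the same $group.

```javascript
// Plain-JavaScript sketch of $addToSet + $size (not the MongoDB API):
// each group keeps a Set of the values seen, and its size is the distinct count.
// A running total is kept alongside, so one pass yields both counts.
function countAndDistinct(docs, keyField, valueField) {
  const groups = new Map(); // key -> { total, values: Set }
  for (const doc of docs) {
    const key = doc[keyField];
    if (!groups.has(key)) groups.set(key, { total: 0, values: new Set() });
    const g = groups.get(key);
    g.total += 1;                  // like { $sum: 1 }
    g.values.add(doc[valueField]); // like { $addToSet: "$emailId" }
  }
  // like $project with { $size: "$uniqueCount" }
  return [...groups].map(([key, g]) => ({ [keyField]: key, count: g.total, uniqueCount: g.values.size }));
}

// Hypothetical sample data
const transactions = [
  { cityName: "Pune", emailId: "a@example.com" },
  { cityName: "Pune", emailId: "a@example.com" },
  { cityName: "Pune", emailId: "b@example.com" },
  { cityName: "Delhi", emailId: "c@example.com" },
];

console.log(JSON.stringify(countAndDistinct(transactions, "cityName", "emailId")));
// [{"cityName":"Pune","count":3,"uniqueCount":2},{"cityName":"Delhi","count":1,"uniqueCount":1}]
```

One trade-off to keep in mind: the set holds every distinct value of a group at once, whereas the two-stage $group approach in the first answer spreads them across separate intermediate documents.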
