mongodb 按多个字段对值进行分组

Question

提问by fervid

For example, I have these documents:

例如，我有这些文件：

{
  "addr": "address1",
  "book": "book1"
},
{
  "addr": "address2",
  "book": "book1"
},
{
  "addr": "address1",
  "book": "book5"
},
{
  "addr": "address3",
  "book": "book9"
},
{
  "addr": "address2",
  "book": "book5"
},
{
  "addr": "address2",
  "book": "book1"
},
{
  "addr": "address1",
  "book": "book1"
},
{
  "addr": "address15",
  "book": "book1"
},
{
  "addr": "address9",
  "book": "book99"
},
{
  "addr": "address90",
  "book": "book33"
},
{
  "addr": "address4",
  "book": "book3"
},
{
  "addr": "address5",
  "book": "book1"
},
{
  "addr": "address77",
  "book": "book11"
},
{
  "addr": "address1",
  "book": "book1"
}

Answer 1

回答by Neil Lunn

TLDR Summary

TLDR 摘要

In modern MongoDB releases you can brute force this with $slicejust off the basic aggregation result. For "large" results, run parallel queries instead for each grouping ( a demonstration listing is at the end of the answer ), or wait for SERVER-9377to resolve, which would allow a "limit" to the number of items to $pushto an array.

在现代 MongoDB 版本中，您可以$slice仅使用基本聚合结果来强制执行此操作。对于“大”的结果，对每个分组运行并行查询，而不是（示范上市是在回答结束时），或等待SERVER-9377来解决，这将允许“限制”的项目的数量$push至大批。

db.books.aggregate([
    { "$group": {
        "_id": {
            "addr": "$addr",
            "book": "$book"
        },
        "bookCount": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.addr",
        "books": { 
            "$push": { 
                "book": "$_id.book",
                "count": "$bookCount"
            },
        },
        "count": { "$sum": "$bookCount" }
    }},
    { "$sort": { "count": -1 } },
    { "$limit": 2 },
    { "$project": {
        "books": { "$slice": [ "$books", 2 ] },
        "count": 1
    }}
])

MongoDB 3.6 Preview

MongoDB 3.6 预览版

Still not resolving SERVER-9377, but in this release $lookupallows a new "non-correlated" option which takes an "pipeline"expression as an argument instead of the "localFields"and "foreignFields"options. This then allows a "self-join" with another pipeline expression, in which we can apply $limitin order to return the "top-n" results.

仍然没有解决SERVER-9377，但在此版本中$lookup允许一个新的“非相关”选项，它将"pipeline"表达式作为参数而不是"localFields"and"foreignFields"选项。然后这允许与另一个管道表达式的“自连接”，我们可以在其中应用$limit以返回“top-n”结果。

db.books.aggregate([
  { "$group": {
    "_id": "$addr",
    "count": { "$sum": 1 }
  }},
  { "$sort": { "count": -1 } },
  { "$limit": 2 },
  { "$lookup": {
    "from": "books",
    "let": {
      "addr": "$_id"
    },
    "pipeline": [
      { "$match": { 
        "$expr": { "$eq": [ "$addr", "$$addr"] }
      }},
      { "$group": {
        "_id": "$book",
        "count": { "$sum": 1 }
      }},
      { "$sort": { "count": -1  } },
      { "$limit": 2 }
    ],
    "as": "books"
  }}
])

The other addition here is of course the ability to interpolate the variable through $exprusing $matchto select the matching items in the "join", but the general premise is a "pipeline within a pipeline" where the inner content can be filtered by matches from the parent. Since they are both "pipelines" themselves we can $limiteach result separately.

这里的另一个添加当然是通过$expr使用$match选择“连接”中的匹配项来插入变量的能力，但一般前提是“管道内的管道”，其中可以通过来自父级的匹配来过滤内部内容. 由于它们本身都是“管道”，因此我们可以$limit分别得出各自的结果。

This would be the next best option to running parallel queries, and actually would be better if the $matchwere allowed and able to use an index in the "sub-pipeline" processing. So which is does not use the "limit to $push" as the referenced issue asks, it actually delivers something that should work better.

这将是运行并行查询的下一个最佳选择，如果$match允许并能够在“子管道”处理中使用索引，实际上会更好。因此，$push正如所引用的问题所要求的那样，它没有使用“限制到”，它实际上提供了应该更好地工作的东西。

Original Content

原创内容

You seem have stumbled upon the top "N" problem. In a way your problem is fairly easy to solve though not with the exact limiting that you ask for:

您似乎偶然发现了最重要的“N”问题。在某种程度上，您的问题很容易解决，尽管没有您要求的确切限制：

db.books.aggregate([
    { "$group": {
        "_id": {
            "addr": "$addr",
            "book": "$book"
        },
        "bookCount": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.addr",
        "books": { 
            "$push": { 
                "book": "$_id.book",
                "count": "$bookCount"
            },
        },
        "count": { "$sum": "$bookCount" }
    }},
    { "$sort": { "count": -1 } },
    { "$limit": 2 }
])

Now that will give you a result like this:

现在这会给你一个这样的结果：

{
    "result" : [
            {
                    "_id" : "address1",
                    "books" : [
                            {
                                    "book" : "book4",
                                    "count" : 1
                            },
                            {
                                    "book" : "book5",
                                    "count" : 1
                            },
                            {
                                    "book" : "book1",
                                    "count" : 3
                            }
                    ],
                    "count" : 5
            },
            {
                    "_id" : "address2",
                    "books" : [
                            {
                                    "book" : "book5",
                                    "count" : 1
                            },
                            {
                                    "book" : "book1",
                                    "count" : 2
                            }
                    ],
                    "count" : 3
            }
    ],
    "ok" : 1
}

So this differs from what you are asking in that, while we do get the top results for the address values the underlying "books" selection is not limited to only a required amount of results.

因此，这与您所要求的不同，虽然我们确实获得了地址值的最高结果，但基础“书籍”选择不仅限于所需的结果数量。

This turns out to be very difficult to do, but it can be done though the complexity just increases with the number of items you need to match. To keep it simple we can keep this at 2 matches at most:

事实证明这很难做到，但可以做到，尽管复杂性会随着您需要匹配的项目数量而增加。为简单起见，我们最多可以将其保留为 2 场比赛：

db.books.aggregate([
    { "$group": {
        "_id": {
            "addr": "$addr",
            "book": "$book"
        },
        "bookCount": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.addr",
        "books": { 
            "$push": { 
                "book": "$_id.book",
                "count": "$bookCount"
            },
        },
        "count": { "$sum": "$bookCount" }
    }},
    { "$sort": { "count": -1 } },
    { "$limit": 2 },
    { "$unwind": "$books" },
    { "$sort": { "count": 1, "books.count": -1 } },
    { "$group": {
        "_id": "$_id",
        "books": { "$push": "$books" },
        "count": { "$first": "$count" }
    }},
    { "$project": {
        "_id": {
            "_id": "$_id",
            "books": "$books",
            "count": "$count"
        },
        "newBooks": "$books"
    }},
    { "$unwind": "$newBooks" },
    { "$group": {
      "_id": "$_id",
      "num1": { "$first": "$newBooks" }
    }},
    { "$project": {
        "_id": "$_id",
        "newBooks": "$_id.books",
        "num1": 1
    }},
    { "$unwind": "$newBooks" },
    { "$project": {
        "_id": "$_id",
        "num1": 1,
        "newBooks": 1,
        "seen": { "$eq": [
            "$num1",
            "$newBooks"
        ]}
    }},
    { "$match": { "seen": false } },
    { "$group":{
        "_id": "$_id._id",
        "num1": { "$first": "$num1" },
        "num2": { "$first": "$newBooks" },
        "count": { "$first": "$_id.count" }
    }},
    { "$project": {
        "num1": 1,
        "num2": 1,
        "count": 1,
        "type": { "$cond": [ 1, [true,false],0 ] }
    }},
    { "$unwind": "$type" },
    { "$project": {
        "books": { "$cond": [
            "$type",
            "$num1",
            "$num2"
        ]},
        "count": 1
    }},
    { "$group": {
        "_id": "$_id",
        "count": { "$first": "$count" },
        "books": { "$push": "$books" }
    }},
    { "$sort": { "count": -1 } }
])

So that will actually give you the top 2 "books" from the top two "address" entries.

因此，这实际上会为您提供前两个“地址”条目中的前 2 个“书籍”。

But for my money, stay with the first form and then simply "slice" the elements of the array that are returned to take the first "N" elements.

但是为了我的钱，保持第一种形式，然后简单地“切片”返回的数组元素以获取前“N”个元素。

Demonstration Code

演示代码

The demonstration code is appropriate for usage with current LTS versions of NodeJS from v8.x and v10.x releases. That's mostly for the async/awaitsyntax, but there is nothing really within the general flow that has any such restriction, and adapts with little alteration to plain promises or even back to plain callback implementation.

演示代码适用于 v8.x 和 v10.x 版本的当前 LTS 版本的 NodeJS。这主要是针对async/await语法的，但在一般流程中没有任何真正有任何此类限制的内容，并且几乎不改变普通承诺甚至回到普通回调实现。

index.js

索引.js

const { MongoClient } = require('mongodb');
const fs = require('mz/fs');

const uri = 'mongodb://localhost:27017';

const log = data => console.log(JSON.stringify(data, undefined, 2));

(async function() {

  try {
    const client = await MongoClient.connect(uri);

    const db = client.db('bookDemo');
    const books = db.collection('books');

    let { version } = await db.command({ buildInfo: 1 });
    version = parseFloat(version.match(new RegExp(/(?:(?!-).)*/))[0]);

    // Clear and load books
    await books.deleteMany({});

    await books.insertMany(
      (await fs.readFile('books.json'))
        .toString()
        .replace(/\n$/,"")
        .split("\n")
        .map(JSON.parse)
    );

    if ( version >= 3.6 ) {

    // Non-correlated pipeline with limits
      let result = await books.aggregate([
        { "$group": {
          "_id": "$addr",
          "count": { "$sum": 1 }
        }},
        { "$sort": { "count": -1 } },
        { "$limit": 2 },
        { "$lookup": {
          "from": "books",
          "as": "books",
          "let": { "addr": "$_id" },
          "pipeline": [
            { "$match": {
              "$expr": { "$eq": [ "$addr", "$$addr" ] }
            }},
            { "$group": {
              "_id": "$book",
              "count": { "$sum": 1 },
            }},
            { "$sort": { "count": -1 } },
            { "$limit": 2 }
          ]
        }}
      ]).toArray();

      log({ result });
    }

    // Serial result procesing with parallel fetch

    // First get top addr items
    let topaddr = await books.aggregate([
      { "$group": {
        "_id": "$addr",
        "count": { "$sum": 1 }
      }},
      { "$sort": { "count": -1 } },
      { "$limit": 2 }
    ]).toArray();

    // Run parallel top books for each addr
    let topbooks = await Promise.all(
      topaddr.map(({ _id: addr }) =>
        books.aggregate([
          { "$match": { addr } },
          { "$group": {
            "_id": "$book",
            "count": { "$sum": 1 }
          }},
          { "$sort": { "count": -1 } },
          { "$limit": 2 }
        ]).toArray()
      )
    );

    // Merge output
    topaddr = topaddr.map((d,i) => ({ ...d, books: topbooks[i] }));
    log({ topaddr });

    client.close();

  } catch(e) {
    console.error(e)
  } finally {
    process.exit()
  }

})()

books.json

书籍.json

{ "addr": "address1",  "book": "book1"  }
{ "addr": "address2",  "book": "book1"  }
{ "addr": "address1",  "book": "book5"  }
{ "addr": "address3",  "book": "book9"  }
{ "addr": "address2",  "book": "book5"  }
{ "addr": "address2",  "book": "book1"  }
{ "addr": "address1",  "book": "book1"  }
{ "addr": "address15", "book": "book1"  }
{ "addr": "address9",  "book": "book99" }
{ "addr": "address90", "book": "book33" }
{ "addr": "address4",  "book": "book3"  }
{ "addr": "address5",  "book": "book1"  }
{ "addr": "address77", "book": "book11" }
{ "addr": "address1",  "book": "book1"  }

Answer 2

回答by Sikorski

Using aggregate function like below :

使用如下聚合函数：

[
{$group: {_id : {book : '$book',address:'$addr'}, total:{$sum :1}}},
{$project : {book : '$_id.book', address : '$_id.address', total : '$total', _id : 0}}
]

it will give you result like following :

它会给你如下结果：

        {
            "total" : 1,
            "book" : "book33",
            "address" : "address90"
        }, 
        {
            "total" : 1,
            "book" : "book5",
            "address" : "address1"
        }, 
        {
            "total" : 1,
            "book" : "book99",
            "address" : "address9"
        }, 
        {
            "total" : 1,
            "book" : "book1",
            "address" : "address5"
        }, 
        {
            "total" : 1,
            "book" : "book5",
            "address" : "address2"
        }, 
        {
            "total" : 1,
            "book" : "book3",
            "address" : "address4"
        }, 
        {
            "total" : 1,
            "book" : "book11",
            "address" : "address77"
        }, 
        {
            "total" : 1,
            "book" : "book9",
            "address" : "address3"
        }, 
        {
            "total" : 1,
            "book" : "book1",
            "address" : "address15"
        }, 
        {
            "total" : 2,
            "book" : "book1",
            "address" : "address2"
        }, 
        {
            "total" : 3,
            "book" : "book1",
            "address" : "address1"
        }

I didn't quite get your expected result format, so feel free to modify this to one you need.

我没有完全得到您预期的结果格式，因此请随时将其修改为您需要的格式。

Answer 3

回答by Jitendra

Below query will provide exactly the same result as given in the desired response:

以下查询将提供与所需响应中给出的结果完全相同的结果：

db.books.aggregate([
    {
        $group: {
            _id: { addresses: "$addr", books: "$book" },
            num: { $sum :1 }
        }
    },
    {
        $group: {
            _id: "$_id.addresses",
            bookCounts: { $push: { bookName: "$_id.books",count: "$num" } }
        }
    },
    {
        $project: {
            _id: 1,
            bookCounts:1,
            "totalBookAtAddress": {
                "$sum": "$bookCounts.count"
            }
        }
    }

])

The response will be looking like below:

响应将如下所示：

/* 1 */
{
    "_id" : "address4",
    "bookCounts" : [
        {
            "bookName" : "book3",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
},

/* 2 */
{
    "_id" : "address90",
    "bookCounts" : [
        {
            "bookName" : "book33",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
},

/* 3 */
{
    "_id" : "address15",
    "bookCounts" : [
        {
            "bookName" : "book1",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
},

/* 4 */
{
    "_id" : "address3",
    "bookCounts" : [
        {
            "bookName" : "book9",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
},

/* 5 */
{
    "_id" : "address5",
    "bookCounts" : [
        {
            "bookName" : "book1",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
},

/* 6 */
{
    "_id" : "address1",
    "bookCounts" : [
        {
            "bookName" : "book1",
            "count" : 3
        },
        {
            "bookName" : "book5",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 4
},

/* 7 */
{
    "_id" : "address2",
    "bookCounts" : [
        {
            "bookName" : "book1",
            "count" : 2
        },
        {
            "bookName" : "book5",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 3
},

/* 8 */
{
    "_id" : "address77",
    "bookCounts" : [
        {
            "bookName" : "book11",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
},

/* 9 */
{
    "_id" : "address9",
    "bookCounts" : [
        {
            "bookName" : "book99",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
}

mongodb 按多个字段对值进行分组

提问by fervid

回答by Neil Lunn

TLDR Summary

TLDR 摘要

MongoDB 3.6 Preview

MongoDB 3.6 预览版

Original Content

原创内容

Demonstration Code

演示代码

回答by Sikorski

回答by Jitendra

相关推荐

最近更新

标签

mongodb 按多个字段对值进行分组

提问by fervid

回答by Neil Lunn

TLDR Summary

TLDR 摘要

MongoDB 3.6 Preview

MongoDB 3.6 预览版

Original Content

原创内容

Demonstration Code

演示代码

回答by Sikorski

回答by Jitendra

相关推荐

mongodb 如何将查询结果（单个文档）存储到变量中？

将 mongodb 数据库从本地主机迁移到远程服务器

mongodb 如果存在，如何更新否则插入新文档？

MongoDB：计算数组中的项目数

相关推荐

最近更新

标签