mongodb提供了更加高级的查询功能：聚合操作。聚合操作是指，对于输入的一批数据，进行中间的各种操作，比如过滤、分组、排序，然后再输出。通过聚合操作，主要是满足了用户对于数据统计和更高级查询的需求。
mongodb提供的聚合操作分为三种：聚合管道（aggregation pipeline）、map-reduce和单目聚合操作（single purpose aggregation operation）。下面分别进行介绍。

1 聚合管道框架

mongodb的聚合管道框架基于数据处理管道的概念。文档经过多个stage的管道来处理，然后得到一个聚合的结果。如下面的图所示。

聚合管道的操作符有如下这些：

操作符	描述
`$project`	数据投影，指定要处理的字段，可以用于重命名、添加或者删除字段
`$group`	根据指定的key进行分组
`$match`	根据条件匹配或者筛选文档
`$limit`	限制传递给下一个stage的文档数量
`$skip`	跳过前面的指定数量的文档
`$unwind`	用于将数组里的多个元素拆分成多个，每个元素都被放入一个单独的文档中
`$sort`	对文档进行排序
`$geoNear`	将文档按照从近到远的排序输出
`$out`	用于将流中的元素输出到某个集合中

下面看看这些操作符的使用。

`$match`

根据条件匹配或者筛选文档。
下面的例子过滤出页数小于900的书籍，只返回页数大于等于900的书籍。

> db.books.aggregate({
    "$match": {
        "pageCount": {
            "$gte": 900
        }
    }
})

`$project`

$project用于对文档的键进行选择或者重命名等操作。

下面的例子过滤文档之后，只选择_id、title等键返回。

> db.books.aggregate(
    {
        "$match": {
            "pageCount": {
                "$gte": 900
            }
        }
    },
    {
        "$project": {
            "_id": 1,
            "title": 1,
            "pageCount": 1,
            "authors": 1
        }
    }
)

键名后面的1表示返回该键，0表示不返回该键。

返回结果：

{
	"_id" : 70,
	"title" : "Essential Guide to Peoplesoft Development and Customization",
	"pageCount" : 1101,
	"authors" : [
		"Tony DeLia",
		"Galina Landres",
		"Isidor Rivera",
		"Prakash Sankaran"
	]
}
...

如果$project操作符里的键为一个不存在的键名，且后面紧跟一个$和一个存在的键名，则表示重命名并返回。如下面的例子所示。将title重命名为bookTitle。

> db.books.aggregate(
    {
        "$match": {
            "pageCount": {
                "$gte": 900
            }
        }
    },
    {
        "$project": {
            "_id": 1,
            "bookTitle": "$title",
            "pageCount": 1,
            "authors": 1
        }
    }
)

`$group`

$group表示根据指定的键对文档进行分组。

_id键表示用于分组的键，用于分组的键前面需要加上$。其他的键名可以自定义，比如下面代码中的groupSize

> db.books.aggregate(
    {
        "$match": {
            "pageCount": {
                "$gte": 800
            }
        }
    },
    {
        "$group": {
            "_id": "$pageCount",
            "groupSize": {
                "$sum":1
            }
        }
    }
)

返回的结果如下

{
	"_id" : 860,
	"groupSize" : 1
},
{
	"_id" : 848,
	"groupSize" : 3
}

$group可以与$sum、$min、$max、$avg等操作符一起使用，用于对分组里面的多个文档进行统计。
下面的例子，先根据category进行分组，然后统计每个分组下的书籍的价格总和。

> db.col_book.aggregate(
    {
        "$group": {
            "_id": "$category",
            "totalPrice": {
                "$sum": "$price"
            }
        }
    }
)

`$sort`、`$skip`和`$limit`

下面的代码，先按价格从高到低排序，然后跳过最高价，最后选择两个文档返回。

> db.col_book.aggregate(
    {
        "$sort": {
            "price": -1
        }
    },
    {
        "$skip": 1
    },
    {
        "$limit": 2
    }
)

`$unwind`

$unwind操作符用于将数组里的多个元素拆分成多个，每个元素都被放入一个单独的文档中。

假设集合中有文档如下

{
	"_id" : ObjectId("5dc17dd26d34c41de7dda410"),
	"title" : "Head First Java",
	"price" : 10,
	"category" : [
		"computer",
		"java"
	],
	"date" : ISODate("2019-11-10T00:00:00.000+08:00")
}

$unwind操作如下

> db.col_book.aggregate(
    {
        "$match": {
            "_id" : ObjectId("5dc17dd26d34c41de7dda410")
        }
    },
    {
        "$unwind": "$category"
    }
)

得到的结果如下

{
	"_id" : ObjectId("5dc17dd26d34c41de7dda410"),
	"title" : "Head First Java",
	"price" : 10,
	"category" : "computer",
	"date" : ISODate("2019-11-10T00:00:00.000+08:00")
},
{
	"_id" : ObjectId("5dc17dd26d34c41de7dda410"),
	"title" : "Head First Java",
	"price" : 10,
	"category" : "java",
	"date" : ISODate("2019-11-10T00:00:00.000+08:00")
}

可以看出，$unwind操作符的作用就像是将某个元素进行了“展开”。

`$out`

$out操作符用于将流中的元素输出到某个集合中。

下面的例子筛选出价格大于220的书籍，然后保存到集合tmpCollection中。之后，从集合tmpCollection读取文档。

> db.col_book.aggregate(
    {
        "$match": {
            "price" : {
                "$gt": 220
            }
        }
    },
    {
        "$out": "tmpCollection"
    }
)
> db.tmpCollection.find({})

注意：
1.因为结果要输出到集合中，因此，结果中的文档的_id不能重复，否则会抛出异常。
2.如果存在$out操作符，则其必须为管道中最后一个操作符。
3.会在相同的数据库下面新建一个指定名字的集合。

2 map-reduce

map-reduce是一个通用的数据处理模型，用于将大量数据处理为我们所需要的聚合之后的结果。mongodb提供mapReduce命令用于支持map-reduce操作。以下是来自mongodb官网的一个例子。

下面一段话是官网对于map-reduce流程的解释。写的很好，就不翻译了。

In this map-reduce operation, MongoDB applies the map phase to each input document (i.e. the documents in the collection that match the query condition). The map function emits key-value pairs. For those keys that have multiple values, MongoDB applies the reduce phase, which collects and condenses the aggregated data. MongoDB then stores the results in a collection. Optionally, the output of the reduce function may pass through a finalize function to further condense or process the results of the aggregation.
All map-reduce functions in MongoDB are JavaScript and run within the mongod process. Map-reduce operations take the documents of a single collection as the input and can perform any arbitrary sorting and limiting before beginning the map stage. mapReduce can return the results of a map-reduce operation as a document, or may write the results to collections.

注意：与聚合管道相比，map-reduce操作会很灵活，用户可以实现自己想要的操作。缺点是性能没有聚合管道好，且没有方便使用的操作符。

3 单目聚合操作(Single Purpose Aggregation Operations)

mongodb提供了单目聚合操作，比如count()和distinct()，用于对某个集合做简单的聚合操作。
下图的示例中，distinct的作用是取出唯一的cust_id。distinct()的返回结果是一个数组。

单目聚合操作使用起来简单，但是，仅仅可以做一些简单的操作。相比聚合管道和map-reduce操作而言，缺乏灵活性，功能也不够强大。

4 总结

下表总结了三种类型的聚合操作的优缺点

项目	聚合管道	map-reduce	单目聚合操作
性能	较好	最差	好
灵活性	较灵活，功能较强大	最灵活，可以实现的功能最强大	不够灵活，可以提供的功能太弱
使用便利性	较便利，有直接使用的API	不够便利，需要使用js函数	便利，有直接使用的API

参考

MongoDB聚合
 Aggregation

mongodb学习系列4：聚合操作

mongodb,nosql,聚合

1 聚合管道框架

`$match`

`$project`

`$group`

`$sort`、`$skip`和`$limit`

`$unwind`

`$out`

2 map-reduce

3 单目聚合操作(Single Purpose Aggregation Operations)

4 总结

参考

CATALOG

FEATURED TAGS

FRIENDS

1 聚合管道框架

$match

$project

$group

$sort、$skip和$limit

$unwind

$out

2 map-reduce

3 单目聚合操作(Single Purpose Aggregation Operations)

4 总结

参考

CATALOG

FEATURED TAGS

FRIENDS

`$match`

`$project`

`$group`

`$sort`、`$skip`和`$limit`

`$unwind`

`$out`