Powerful Aggregation Framework of MongoDB

Introduction

As it’s known, MongoDB is a document-oriented database with lots of powerful features and advantages. One of those powerful features is the aggregation framework of MongoDB.

The main purpose of the aggregation framework is to process data records and return computed results. When using aggregation pipelines, it’s possible to group documents, perform a variety of arithmetical and other operations, perform left join within a database, merge collections, and a bunch of similar cool things.

The structure of the aggregation pipeline

For performing aggregations, it’s required to have a pipeline, which consists of many stages. Documents sequentially pass through the stages, transforming the document with each stage and passing it onto the next stage. Stages can contain various pipeline operators for working with arrays, strings, dates, etc.

When initiating an aggregation, it’s enough to run the following mongo shell command:

db.getCollection(collectionName).aggregate(pipeline)

where collectionName is the name of the collection on which needs to be run aggregation pipeline and pipeline is an aggregation pipeline(array of stages).

Left join using $lookup

$lookup stage lets “join” a collection by performing an equality match between fields of current and “joined” collections. This stage requires specifying some key options such as collection and field names.

In this case, we need to add a new books array to the documents of the current collection by selecting documents from the books collections based on the bookIds array.

where:

from : ” joined” collection name

localField : local field name which will be used for an equality match

foreignField : field name in “joined” collection which will be used for an equality match

as : output field name

Categorizing documents using $group and $unwind

There are lots of cases when we need to categorize documents based on some criteria and one of the ways to do it is by using $unwind and $group stages.

For example, let’s say that there are users who have specified the list of their favorite books. We need a pipeline that will create separate documents for each book where each document will contain a list of users who like that book.

Since user documents only contain object IDs of their favorite books, in the pipeline above we used $lookup stage to bring full objects of books into the books array.

By using $unwind stage we deconstruct user documents and reconstruct them by using $group stage. Inside $group stage we accumulate names of users into the users array.

Categorizing documents using $bucketAuto

Another stage that lets to categorize documents is $bucketAuto. It automatically defines buckets based on given parameters and group documents based on min/max ranges.

In this case, we want to categorize users based on their book count. As you can see, there are three buckets defined in the output section where each bucket has min and max ranges:

There are two stages used in the pipeline above, $addFields which adds book counts for each user document and $bucketAuto which groups documents by booksCount field.

In the $bucketAuto stage specified the following fields:

groupBy : the field name by which documents should be grouped (can be an expression as well)

buckets : maximum count of buckets

output : an output document specification

Multiple pipelines in a single stage using $facet

For this case, we want to combine multiple pipelines into a single stage. As an output of aggregation, we need to have two arrays of grouped users, one grouped by book count and another grouped by book IDs.

In the pipeline above, in $facet stage, we defined two pipelines named groupedByBooksCount and groupedByBooks. After executing, the output of each pipeline will be stored in the appropriate key.

Location queries using $geoNear

As it’s said in the documentation, $geoNear outputs documents in order of nearest to farthest from a specified point.

In this example, we have a collection called cities where each document contains a name and the coordinates of a specific city. One of the cities in that collection is Yerevan. We need to add a new field to all the documents that will tell the distance between Yerevan and the current city location.

In the pipeline above the coordinates array contains geographic coordinates of Yerevan. Value of distanceField is a field name that will contain distance between two cities and distanceMultiplier is for converting distance from meters to kilometers.

Note: To use $geoNear stage, on the coordinates field must be created 2d or 2dsphere index.

Recursive search using $graphLookup

Using $graphLookup it’s possible to perform a recursive search on a collection. It recursively populates all the connected documents based on specified fields.

In this example $graphLookup operation recursively matches on the reportsTo and name fields in the users collection, returning the reporting hierarchy for each user:

In the pipeline above there are three stages:

$graphLookup — recursively finds related documents and output them into reportingHierarchy array

$unwind — deconstruct added reportingHierarchy array

$group — reconstruct documents and overrides existing reportingHierarchy array leaving only user names

Storing pipeline output to a collection using $out

This stage takes the documents returned by the aggregation pipeline and writes them to a specified collection.

Note: The $out stage must be the last stage in the pipeline

In this specific example, we added $out stage at the end of the pipeline that will save the output of the last stage into distances collection in the same database.

Conclusion

In this article, we covered very few features of the aggregation framework of MongoDB. For deep learning, I suggest having a look at the official documentation and do some practice work using a test database.