
$rankFusion (aggregation)

$rankFusion

$rankFusion first executes all input pipelines independently, then de-duplicates and combines the input pipeline results into a final ranked result set.

$rankFusion outputs a ranked set of documents based on the rank at which each document appears in its input pipelines and on the pipeline weights. This stage uses the Reciprocal Rank Fusion algorithm to rank the combined results of the input pipelines.

Use $rankFusion to search for documents in a single collection based on multiple criteria and retrieve a final ranked result set that factors in all specified criteria.

The stage has the following syntax:

{ $rankFusion: {
    input: {
      pipelines: {
        <myPipeline1>: <expression>,
        <myPipeline2>: <expression>,
        ...
      }
    },
    combination: {
      weights: {
        <myPipeline1>: <numeric expression>,
        <myPipeline2>: <numeric expression>,
        ...
      }
    },
    scoreDetails: <bool>
} }

$rankFusion takes the following fields:

Field | Type | Description
--- | --- | ---
input | Object | Defines the input that $rankFusion ranks.
input.pipelines | Object | Contains a map of pipeline names to the aggregation stages that define that pipeline. input.pipelines must contain at least one pipeline. All pipelines must operate on the same collection and must have a unique name. For more information on input pipeline restrictions, see Input Pipelines and Input Pipeline Names.
combination | Object | Optional. Defines how to combine the input pipeline results.
combination.weights | Object | Optional. Contains a map from input pipeline names to their weights relative to other pipelines. Each weight value must be a non-negative number (whole or decimal). If you do not specify a weight, the default value is 1.
scoreDetails | Boolean | Default is false. Specifies whether $rankFusion computes and populates the scoreDetails metadata field for each output document. See scoreDetails for more information on this field.
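
The weight rules above (non-negative numbers, default of 1) can be sketched as a small client-side check. The helper name resolveWeights is hypothetical and not part of MongoDB; it only mirrors the documented rules so that a weights map can be inspected before it is sent to the server.

```javascript
// Hypothetical helper (not a MongoDB API): applies the documented rules for
// combination.weights to a list of input pipeline names.
function resolveWeights(pipelineNames, weights = {}) {
  const resolved = {};
  for (const name of pipelineNames) {
    // A pipeline without an explicit weight uses the default weight of 1.
    const weight = weights[name] === undefined ? 1 : weights[name];
    // Each weight must be a non-negative number (whole or decimal).
    if (typeof weight !== "number" || !Number.isFinite(weight) || weight < 0) {
      throw new RangeError(`Weight for "${name}" must be a non-negative number`);
    }
    resolved[name] = weight;
  }
  return resolved;
}

const resolvedWeights = resolveWeights(
  ["searchOne", "match", "searchTwo"],
  { searchOne: 2, match: 0.5 }
);
console.log(resolvedWeights); // { searchOne: 2, match: 0.5, searchTwo: 1 }
```

Here searchTwo has no explicit weight, so it falls back to 1, while a negative or non-numeric weight raises an error.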

You can only use $rankFusion with a single collection. You cannot use this aggregation stage at a database scope.

$rankFusion de-duplicates the results across multiple input pipelines in the final output. Each unique input document appears at most once in the $rankFusion output, regardless of the number of times that the document appears in input pipeline outputs.

Each input pipeline must be either a Selection Pipeline or a Ranked Pipeline.

A Selection Pipeline retrieves a set of documents from a collection without modifying them after retrieval. $rankFusion compares documents across different input pipelines, which requires that all input pipelines output the same unmodified documents.

Note

If you want to modify the documents that you search for with $rankFusion, perform those modifications after the $rankFusion stage.

A selection pipeline can contain only the following stages:

Type
Stages

Search Stages

  • $match, including $match with legacy text search

  • $search

  • $vectorSearch

  • $sample

  • $geoNear

    Note

    If you use $geoNear in a selection pipeline, you cannot specify includeLocs or distanceField because those fields modify documents.

Ordering Stages

  • $sort

Pagination Stages

  • $skip

  • $limit

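A ranked pipeline sorts or orders documents. $rankFusion uses the order of ranked pipeline results to influence the output ranking. Ranked pipelines must meet one of the following criteria:

  • Begin with one of the following ordered stages: $search, $vectorSearch, or $geoNear

  • Contain an explicit $sort stage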

Pipeline names in input must meet the following restrictions:

  • Must not be an empty string

  • Must not start with a $

  • Must not contain the ASCII null character delimiter \0 anywhere in the string

  • Must not contain a .
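
The four naming restrictions above can be expressed as a short predicate. The function name isValidPipelineName is hypothetical; this is a client-side sketch of the listed rules, not the server's own validation code.

```javascript
// Hypothetical helper (not a MongoDB API): returns true only when a name
// satisfies all four documented restrictions on input pipeline names.
function isValidPipelineName(name) {
  return (
    typeof name === "string" &&
    name.length > 0 &&         // must not be an empty string
    !name.startsWith("$") &&   // must not start with a $
    !name.includes("\0") &&    // must not contain the ASCII null character
    !name.includes(".")        // must not contain a .
  );
}

console.log(isValidPipelineName("searchOne"));  // true
console.log(isValidPipelineName("$search"));    // false
console.log(isValidPipelineName("search.one")); // false
console.log(isValidPipelineName(""));           // false
```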

$rankFusion orders results according to the Reciprocal Rank Fusion (RRF) Formula. This stage places the RRF score for each document in the score metadata field of the output results. The RRF formula ranks documents with a combination of the following factors:

  • The placement of documents in input pipeline results

  • The number of times that a document appears in different input pipelines

  • The weights of input pipelines

For example, a document that ranks highly in multiple pipeline result sets receives a higher RRF score than it would if it had the same ranking in some input pipelines but was absent from (or ranked lower in) the others.

The Reciprocal Rank Fusion (RRF) Formula is equivalent to the following algebraic operation:

\[ \mathrm{RRFscore}(d) = \sum_{r \in R} w \cdot \frac{1}{60 + r(d)} \]

Note

In this formula, 60 is a sensitivity parameter that MongoDB determined.

The following table contains the variables that the RRF formula uses:

Variable | Description
--- | ---
D | The set of result documents for the whole operation
d | The document that the RRF score is being computed for
R | The set of ranks for input pipelines that d appears in
r(d) | The rank of document d in this input pipeline
w | The weight of the input pipeline that d appears in

Each term in the summation represents the appearance of a document d in one of the input pipelines. The total RRF score for d is the summation of each of these terms across all the input pipelines that d appears in.

Consider a $rankFusion pipeline stage with one $search and one $vectorSearch input pipeline.

Both input pipelines output the same three documents: Document1, Document2, and Document3.

The $search pipeline ranks the documents in the following order:

  1. Document3

  2. Document2

  3. Document1

The $vectorSearch pipeline ranks the documents in the following order:

  1. Document1

  2. Document2

  3. Document3

Neither pipeline specifies a weight, so both use the default weight of 1. $rankFusion computes the RRF score for Document1 through the following operation:

RRFscore(Document1) = 1/(60 + search_rank_of_Document1) + 1/(60 + vectorSearch_rank_of_Document1)
RRFscore(Document1) = 1/63 + 1/61
RRFscore(Document1) = 0.0322664585

The score metadata field for Document1 is 0.0322664585.
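
The calculation above can be reproduced with a small in-memory sketch. The names rrfScores and RRF_CONSTANT are hypothetical, and the function illustrates the formula only; it is not how the server executes $rankFusion. It assumes each pipeline result is an array of document identifiers ordered best-first, and that a missing weight defaults to 1.

```javascript
const RRF_CONSTANT = 60; // the constant 60 from the RRF formula

// Hypothetical helper: rankedLists maps a pipeline name to an array of
// document identifiers, ordered from rank 1 downward.
function rrfScores(rankedLists, weights = {}) {
  const scores = new Map(); // one entry per unique document (de-duplication)
  for (const [pipelineName, ids] of Object.entries(rankedLists)) {
    const weight = weights[pipelineName] ?? 1;
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks start at 1
      const term = weight * (1 / (RRF_CONSTANT + rank));
      scores.set(id, (scores.get(id) ?? 0) + term);
    });
  }
  // Highest score first. The order of tied documents is arbitrary here.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

const fused = new Map(rrfScores({
  search: ["Document3", "Document2", "Document1"],
  vectorSearch: ["Document1", "Document2", "Document3"],
}));

console.log(fused.get("Document1").toFixed(10)); // 0.0322664585
console.log(fused.get("Document2").toFixed(10)); // 0.0322580645
```

In this data set Document1 and Document3 receive the same score, because each is ranked first by one pipeline and third by the other, while Document2 (ranked second by both) scores slightly lower.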

If you set scoreDetails to true, $rankFusion creates a scoreDetails metadata field for each document. The scoreDetails field contains information about the final ranking.

Note

When you set scoreDetails to true, $rankFusion sets the scoreDetails metadata field for each document but does not automatically output the scoreDetails metadata field.

To view the scoreDetails metadata field, you must either:

  • use a $project stage after $rankFusion to project the scoreDetails field

  • use a $addFields stage after $rankFusion to add the scoreDetails field to your pipeline output
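
As a minimal sketch of the second option, the pipeline below enables scoreDetails and then surfaces the metadata with $addFields and a $meta expression. The collection name rentals, the field names, and the pipeline names byReviews and byScore are assumptions chosen for illustration; substitute your own.

```javascript
// Assumed names: the "rentals" collection and its number_of_reviews and
// review_score fields are placeholders for illustration.
const pipeline = [
  {
    $rankFusion: {
      input: {
        pipelines: {
          // Each input pipeline selects documents and sorts them explicitly.
          byReviews: [
            { $match: { number_of_reviews: { $gte: 25 } } },
            { $sort: { number_of_reviews: -1 } },
            { $limit: 20 }
          ],
          byScore: [
            { $match: { review_score: { $gte: 90 } } },
            { $sort: { review_score: -1 } },
            { $limit: 20 }
          ]
        }
      },
      scoreDetails: true // compute the scoreDetails metadata field
    }
  },
  // Copy the metadata into a regular field so it appears in the output.
  { $addFields: { scoreDetails: { $meta: "scoreDetails" } } }
];

// In mongosh, run the pipeline with: db.rentals.aggregate(pipeline)
console.log(JSON.stringify(pipeline[1]));
```

The $addFields stage comes after $rankFusion, so it does not conflict with the rule that input pipelines must not modify documents.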

The scoreDetails field contains the following subfields:

Field | Description
--- | ---
value | The numerical value of the RRF score for this document
description | A description of how $rankFusion computed the RRF score
details | An array where each array entry contains information about the input pipelines that output this document

Each array entry in the details field contains the following subfields:

Field | Description
--- | ---
inputPipelineName | The name of the input pipeline that output this document
rank | The rank of this document in the input pipeline
weight | The weight of the input pipeline
value | Optional. If the input pipeline output a { $meta: 'score' } for this document, value contains { $meta: 'score' }.
description | Optional. If the input pipeline output a { $meta: 'score' } for this document, description contains information on how the pipeline calculated the score.
details | Optional. If the input pipeline output a details field, details contains that value.

For example, the following code block shows the scoreDetails field for a $rankFusion operation with $search, $vectorSearch, and $match input pipelines:

{
  value: 0.030621785881252923,
  description: "value output by reciprocal rank fusion algorithm, computed as sum of weight * (1 / (60 + rank)) across input pipelines from which this document is output, from:",
  details: [
    {
      inputPipelineName: 'search',
      rank: 2,
      weight: 1,
      value: 0.3876491287,
      description: "sum of:",
      details: [... omitted for brevity in this example ...]
    },
    {
      inputPipelineName: 'vector',
      rank: 9,
      weight: 3,
      value: 0.7793490886688232,
      details: [ ]
    },
    {
      inputPipelineName: 'match',
      rank: 10,
      weight: 1,
      details: []
    }
  ]
}

Before query execution, MongoDB converts a $rankFusion operation into a set of existing aggregation stages that, in combination, compute the output result. The Explain Results for a $rankFusion operation show the full execution of the underlying aggregation stages that $rankFusion uses to compose the final result.

This example uses a rentals collection that contains data with the following format:

db.rentals.insertOne(
  {
    name: "Private Room in Bushwick",
    summary:
      "Here exists a very cozy room for rent in a shared 4-bedroom apartment. It is located one block off of the JMZ at Myrtle Broadway. The neighborhood is diverse and appeals to a variety of people.",
    space: "",
    description:
      "Here exists a very cozy room for rent in a shared 4-bedroom apartment. It is located one block off of the JMZ at Myrtle Broadway. The neighborhood is diverse and appeals to a variety of people.",
    neighborhood_overview: "",
    number_of_reviews: 1,
    address: {
      street: "Brooklyn, NY, United States",
      suburb: "Brooklyn",
      government_area: "Bushwick",
      market: "New York",
      country: "United States",
      country_code: "US",
      location: {
        type: "Point",
        coordinates: [-73.93615, 40.69791],
        is_location_exact: true
      }
    },
    _id: 0,
    review_score: 100
  }
)

Create a search index on the rentals collection:

db.rentals.createSearchIndex(
"search_rental",
{
mappings: { dynamic: true }
}
)

The following aggregation pipeline uses $rankFusion with the following input pipelines:

Pipeline | Number of Documents Returned | Description
--- | --- | ---
searchOne | 20 | Runs a text search for the term brooklyn on the name, summary, description, and neighborhood_overview fields
match | 20 | Finds documents with 25 or more reviews and sorts them from highest review_score to lowest
searchTwo | 20 | Runs a text search for the term kitchen on the space and description fields

db.rentals.aggregate( [
  {
    $rankFusion: {
      input: {
        pipelines: {
          searchOne: [
            {
              $search: {
                index: "search_rental",
                text: {
                  query: "brooklyn",
                  path: [
                    "name",
                    "summary",
                    "description",
                    "neighborhood_overview",
                  ],
                },
              }
            },
            { $limit: 20 }
          ],
          match: [
            {
              $match: {
                number_of_reviews: {
                  $gte: 25,
                },
              }
            },
            { $sort: { "review_score": -1 } },
            { $limit: 20 }
          ],
          searchTwo: [
            {
              $search: {
                index: "search_rental",
                text: {
                  query: "kitchen",
                  path: [ "space", "description" ],
                },
              }
            },
            { $limit: 20 }
          ],
        }
      }
    },
  },
  { $limit: 20 }
] )

This operation performs the following actions:

  • Executes the input pipelines

  • Combines and ranks the returned results

  • Outputs the first 20 documents, which are the top 20 ranked results of the $rankFusion pipeline
