Duplicate Detection

Duplicate detection removes redundant records from your project before screening begins. When you import references from multiple databases, the same study often appears more than once, with slightly different titles, author orderings, or journal abbreviations. The Medusa algorithm identifies these matches so you can review and remove them.

How Medusa works

Medusa uses embedding-based similarity matching rather than simple text comparison. Each study record is converted into a vector embedding that captures its semantic meaning. Medusa then computes a similarity score for every pair of records and sorts the matching pairs into two confidence levels:

True duplicates - high-confidence matches where Medusa is certain the two records refer to the same study
Possible duplicates - lower-confidence matches that share significant similarity but require a human decision

This approach catches duplicates even when the title or author list differs between database exports, because records describing the same study produce similar embeddings even when their surface text differs.
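The pipeline described above (embed each record, score every pair, bucket pairs by confidence) can be sketched in Python. This is not Medusa's implementation: the page does not document its embedding model or cut-offs, so the sketch swaps in character-trigram count vectors as a stand-in for a learned embedding (a lexical rather than semantic representation), and `TRUE_THRESHOLD` and `POSSIBLE_THRESHOLD` are hypothetical values. The names `embed`, `cosine`, and `classify_pairs` are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical cut-offs: Medusa's real thresholds are not documented here.
TRUE_THRESHOLD = 0.90
POSSIBLE_THRESHOLD = 0.60


def embed(text: str) -> Counter:
    """Stand-in for a learned embedding: a bag of character trigrams.

    Trigram counts are lexical, not semantic; they only illustrate the
    "turn each record into a vector" step.
    """
    normalized = " ".join(text.lower().split())  # ignore case and extra whitespace
    padded = f"  {normalized}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_pairs(records: dict[str, str]) -> dict[str, list[tuple[str, str, float]]]:
    """Score every pair of records and bucket it by similarity."""
    vectors = {rid: embed(text) for rid, text in records.items()}
    buckets: dict[str, list[tuple[str, str, float]]] = {"true": [], "possible": []}
    for a, b in combinations(sorted(records), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= TRUE_THRESHOLD:
            buckets["true"].append((a, b, score))
        elif score >= POSSIBLE_THRESHOLD:
            buckets["possible"].append((a, b, score))
        # Pairs below POSSIBLE_THRESHOLD are treated as distinct studies.
    return buckets


if __name__ == "__main__":
    sample = {
        "pubmed-1": "Effects of exercise on depression in older adults",
        "embase-7": "EFFECTS OF EXERCISE ON  DEPRESSION IN OLDER ADULTS",
        "scopus-3": "Machine learning for protein folding",
    }
    for bucket, pairs in classify_pairs(sample).items():
        for a, b, score in pairs:
            print(f"{bucket}: {a} ~ {b} ({score:.2f})")
```

Swapping a real embedding model in for `embed` leaves the rest of the pipeline unchanged; only the vectors, and therefore the scores, differ.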

Figure: Duplicate detection dashboard showing counts for true duplicates, possible duplicates, and discarded pairs

Running deduplication

Deduplication runs after you upload your reference files and before screening starts. From the project dashboard, navigate to Deduplication and click Run Medusa. Processing time depends on the number of records in your project.
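Because Medusa computes a similarity score for every pair of records (see How Medusa works), the number of pairs grows quadratically with project size. A worked example with illustrative project sizes follows; actual processing time also depends on computing the embeddings and on Medusa internals that this page does not describe.

```latex
\text{pairs}(N) = \binom{N}{2} = \frac{N(N-1)}{2}
\qquad
\text{pairs}(2000) = \frac{2000 \times 1999}{2} = 1{,}999{,}000
\qquad
\text{pairs}(4000) = \frac{4000 \times 3999}{2} = 7{,}998{,}000
```

Doubling the project from 2,000 to 4,000 records roughly quadruples the number of pairs to score.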

Info

You can re-run deduplication after uploading additional reference files. Medusa processes all records in the project, including ones from previous uploads.
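Re-running over all records matters because a study can appear once in an earlier upload and again in a new one. If a project holds n records and you upload k more, a full re-run (assuming, as How Medusa works states, that every pair is scored) covers three kinds of pair. The nk old-to-new pairs are the ones a run over only the new file would never compare. With illustrative sizes n = 1000 and k = 500:

```latex
\binom{n+k}{2} = \binom{n}{2} + nk + \binom{k}{2}
\qquad
\binom{1500}{2} = 499{,}500 + 500{,}000 + 124{,}750 = 1{,}124{,}250
```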

Duplicate counts

The deduplication dashboard shows three counts:

Count	Meaning
True duplicates	Pairs Medusa classified as definite duplicates
Possible duplicates	Pairs requiring human review
Discarded	Pairs you have dismissed as not being duplicates

After you have reviewed and acted on all pairs, the remaining unique records proceed to screening.
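One way to picture how reviewed pairs reduce to the unique records that proceed to screening is to treat each confirmed duplicate pair as a link and keep one record per linked group, so that if A matches B and B matches C, only one of the three survives. The sketch below does this with a union-find structure. It assumes that true duplicates and any possible duplicates you confirm are merged, that discarded pairs are not, and that the record with the smallest ID is kept; all three rules, and the `unique_records` name, are hypothetical rather than documented Medusa behaviour.

```python
def unique_records(record_ids: list[str], confirmed_pairs: list[tuple[str, str]]) -> list[str]:
    """Return one representative ID per group of confirmed duplicates.

    Records linked directly or transitively (A~B, B~C) form one group.
    Hypothetical keep rule: the smallest ID in each group survives.
    """
    parent = {rid: rid for rid in record_ids}

    def find(x: str) -> str:
        # Walk to the group's root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in confirmed_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep  # each root stays the smallest ID in its group

    return sorted({find(rid) for rid in record_ids})


if __name__ == "__main__":
    records = ["r1", "r2", "r3", "r4", "r5"]
    true_duplicates = [("r1", "r2")]
    confirmed_possible = [("r2", "r3")]  # possible duplicates a reviewer accepted
    # ("r4", "r5") was discarded, so it is not passed in and never merged.
    print(unique_records(records, true_duplicates + confirmed_possible))
    # → ['r1', 'r4', 'r5']
```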

Guides in this section

True Duplicates

Possible Duplicates

Managing Discarded
