Duplicate Detection
Duplicate detection removes redundant records from your project before screening begins. When you import references from multiple databases, the same study often appears more than once, with slightly different titles, author orderings, or journal abbreviations. The Medusa algorithm identifies these matches so you can review and remove them.
How Medusa works
Medusa uses embedding-based similarity matching rather than simple text comparison. Each study record is converted into a vector embedding that captures its semantic meaning. Medusa then computes a similarity score for every pair of records and sorts the matching pairs into two confidence levels:
- True duplicates - high-confidence matches where Medusa is certain the two records refer to the same study
- Possible duplicates - lower-confidence matches that share significant similarity but require a human decision
This approach catches duplicates even when the title or author list differs between database exports, because the underlying semantic content is the same.
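The matching described above can be illustrated with a minimal sketch. It is not Medusa's actual implementation: the cosine similarity measure, the two threshold values, the function names, and the hand-written toy vectors are all assumptions chosen for illustration. A real system would obtain the vectors from an embedding model.

```python
from itertools import combinations
from math import sqrt

# Hypothetical cutoffs; Medusa's real thresholds are not documented here.
TRUE_THRESHOLD = 0.95
POSSIBLE_THRESHOLD = 0.80


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_pairs(embeddings):
    """Score every pair of records and bucket the pairs by confidence."""
    true_dups, possible_dups = [], []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine(vec_a, vec_b)
        if score >= TRUE_THRESHOLD:
            true_dups.append((id_a, id_b))
        elif score >= POSSIBLE_THRESHOLD:
            possible_dups.append((id_a, id_b))
        # Pairs below both thresholds are not reported at all.
    return true_dups, possible_dups


# Toy embeddings standing in for records exported from different databases.
toy = {
    "pubmed-1": [1.0, 0.0, 0.0],
    "embase-1": [1.0, 0.05, 0.0],  # nearly identical to pubmed-1
    "scopus-7": [0.8, 0.5, 0.0],   # similar, but less certain
    "pubmed-9": [0.0, 0.0, 1.0],   # unrelated study
}
true_dups, possible_dups = classify_pairs(toy)
print("True duplicates:", true_dups)
print("Possible duplicates:", possible_dups)
```

In this toy data, the first two records land in the high-confidence group, the third forms lower-confidence pairs with each of them, and the unrelated record matches nothing.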
Running deduplication
Deduplication runs after you upload your reference files and before screening starts. From the project dashboard, navigate to Deduplication and click Run Medusa. Processing time depends on the number of records in your project.
You can re-run deduplication after uploading additional reference files. Medusa processes all records in the project, including ones from previous uploads.
Duplicate counts
The deduplication dashboard shows three counts:
| Count | Meaning |
|---|---|
| True duplicates | Pairs Medusa classified as definite duplicates |
| Possible duplicates | Pairs requiring human review |
| Discarded | Pairs you have dismissed as not being duplicates |
After you have reviewed and acted on all pairs, the remaining unique records proceed to screening.
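As a rough sketch of how the three counts relate to the records that reach screening, the example below tallies pairs by status and collapses confirmed duplicates. The status labels, the function names, and the rule of keeping the earliest-imported record from each duplicate group are assumptions made for illustration; the product may choose the surviving record differently.

```python
from collections import Counter


def count_pairs(pairs):
    """Tally pairs by status: 'true', 'possible', or 'discarded' (hypothetical labels)."""
    return Counter(status for _, _, status in pairs)


def records_for_screening(record_ids, confirmed_pairs):
    """Collapse confirmed duplicate pairs, keeping the earliest record of each group."""
    parent = {r: r for r in record_ids}
    order = {r: i for i, r in enumerate(record_ids)}

    def find(r):
        # Follow parent links to the group's representative record.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in confirmed_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: the record imported first stays as the representative.
            if order[root_a] > order[root_b]:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    return [r for r in record_ids if find(r) == r]


records = ["r1", "r2", "r3", "r4"]
pairs = [
    ("r1", "r2", "true"),
    ("r2", "r3", "possible"),
    ("r3", "r4", "discarded"),
]
counts = count_pairs(pairs)
# Only the confirmed pair removes a record; discarded pairs remove nothing.
remaining = records_for_screening(records, [("r1", "r2")])
print(dict(counts))
print(remaining)
```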
Guides in this section
True Duplicates
Review and confirm high-confidence duplicate pairs identified by Medusa.
Possible Duplicates
Manually review lower-confidence pairs and mark them as true duplicates or not duplicates.
Managing Discarded
View pairs you previously dismissed and undo decisions if needed.