Duplicate Detection in FACS Plots

Published on August 3, 2024

We just released a new AI-based algorithm that detects duplications in fluorescence-activated cell sorting (FACS) plots. FACS plots are a common target for inappropriate image duplication, and hundreds of posts about them have appeared on PubPeer. While some duplications are honest mistakes caused by sloppy work, others are more serious and are used to fabricate data deliberately. Imagetwin now detects duplicates in FACS plots accurately, helping to identify problematic manuscripts.

Detection within and across publications

We detect two categories of duplicates: those appearing within the scanned manuscript and those appearing across publications. After a manuscript is uploaded, panels showing FACS plots are automatically extracted and compared with one another to identify duplications within the manuscript. The panels are also compared against our image database, which initially contains 2.6 million FACS plots from previously published articles. We will continue to add new FACS plots to the database in future updates. The check against our database helps identify the reuse of FACS plots across publications.
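The two categories of comparison described above can be illustrated with a small sketch. This is not Imagetwin's algorithm, which is not described in this post. It assumes, purely for illustration, that each extracted panel has already been reduced to a numeric feature vector, and it uses plain cosine similarity with a hypothetical threshold. The names find_duplicates, manuscript, database, and threshold are all assumptions.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_duplicates(manuscript, database, threshold=0.99):
    """Return (within, across) lists of suspected duplicate pairs.

    manuscript and database map a panel ID to a feature vector.
    Both the vectors and the threshold are illustrative assumptions.
    """
    # Category 1: every pair of panels inside the uploaded manuscript.
    within = [
        (p, q)
        for (p, u), (q, v) in combinations(manuscript.items(), 2)
        if cosine(u, v) >= threshold
    ]
    # Category 2: each manuscript panel against previously published panels.
    across = [
        (p, d)
        for p, u in manuscript.items()
        for d, v in database.items()
        if cosine(u, v) >= threshold
    ]
    return within, across


# Toy data: fig1a and fig1b are identical; fig2 matches a database entry.
manuscript = {"fig1a": [1, 0, 2], "fig1b": [1, 0, 2], "fig2": [0, 3, 1]}
database = {"paperX_fig3": [0, 3, 1], "paperY_fig1": [5, 1, 0]}
within, across = find_duplicates(manuscript, database)
print(within)
print(across)
```

A brute-force scan like this would not scale to a database of millions of plots; a real system would need some form of indexed or approximate search, which is outside the scope of this sketch.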

Evaluation

We evaluated the FACS detection algorithm to measure accuracy, false positives, and runtime. We applied our algorithm to 152 FACS duplicates posted on PubPeer and correctly identified the duplication in 121 of the 152 cases (79.6%, or roughly 80% accuracy). The cases where we could not find the duplication (i.e., false negatives) involved blurry or low-resolution images, or duplications covering only a tiny fraction of the entire image. To measure the false positive rate, we applied the algorithm to 3,000 FACS plots randomly sampled from papers published from 2010 to 2023. Across these 3,000 images, we found 5 false positive duplicates caused by similar labelling (i.e., lines, arrows, text descriptions). Separately from these false positives, we detected more than 50 suspicious, probably inappropriate duplicates. The new algorithm is also efficient: we measured a runtime of 30 milliseconds per image, so a check of 100 FACS plots finishes in about 3 seconds.

Metric          | Result
Accuracy        | Detected 121 of 152 duplicates sampled from PubPeer
False Positives | 5 false positives in 3,000 checked FACS plots
Runtime         | 30 milliseconds per FACS plot

Evaluation results for FACS duplicate detection
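
The figures reported above can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers stated in this post. The variable names are illustrative, and expressing the false positives as a rate per checked plot is an assumption about how the 5-in-3,000 figure should be read.

```python
# Numbers taken from the evaluation reported in this post.
detected, known_duplicates = 121, 152
false_positives, plots_checked = 5, 3000
ms_per_plot = 30

# Share of known PubPeer duplicates that were found.
detection_rate = detected / known_duplicates

# Assumption: false positives expressed per checked plot.
fp_rate = false_positives / plots_checked


def runtime_seconds(n_plots):
    """Estimated runtime in seconds for n_plots at 30 ms each."""
    return n_plots * ms_per_plot / 1000


print(f"Detection rate: {detection_rate:.1%}")      # 79.6%
print(f"False positive rate: {fp_rate:.2%}")        # 0.17%
print(f"100 plots: {runtime_seconds(100)} seconds")  # 3.0 seconds
```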