Figure Reuse Detection

Published on August 24, 2024

Different figure types appearing in scientific articles

Imagetwin now detects figure reuse across articles for all figure types, including diagrams, flowcharts, box plots, photographs, and more. While our initial focus was on life science images such as FACS plots, microscopy images, and Western blots, this update extends Imagetwin to all scientific disciplines. Whether a manuscript comes from the humanities, computer science, mathematics, or physics, Imagetwin can now identify submissions that contain plagiarized figures or other signs of inappropriate scientific behaviour.

Figure extraction and reuse detection

The system automatically extracts figures from manuscripts and compares them against our image database to detect reuse across articles. If a figure or substantial portions of it have been reused in another publication, Imagetwin flags the duplicate. These flagged duplicates are displayed in our web application, with detailed information about the source paper, such as DOI, authors, and publication year. This enables users to quickly determine whether the duplication represents acceptable reuse or potential scientific misconduct.
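Imagetwin does not describe how its matching works. As a rough illustration only, the sketch below uses average hashing, a generic perceptual-hashing technique that is not necessarily the one Imagetwin uses. Each figure is reduced to an 8x8 grayscale thumbnail and turned into a 64-bit fingerprint. That fingerprint is then compared with stored fingerprints by Hamming distance, and any matches are returned with their source metadata. The `IndexedFigure` record, the function names, the placeholder DOIs, and the `max_distance` threshold of 5 are hypothetical choices for this example. Downscaling to the thumbnail is assumed to have happened already.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IndexedFigure:
    """Hypothetical database record; fields mirror the metadata shown to users."""
    doi: str
    authors: str
    year: int
    fingerprint: int


def average_hash(thumbnail: list[list[float]]) -> int:
    """Return a 64-bit average hash of an 8x8 grayscale thumbnail.

    Each bit records whether a pixel is brighter than the thumbnail's mean,
    so a uniform brightness shift leaves the hash unchanged.
    """
    pixels = [value for row in thumbnail for value in row]
    if len(pixels) != 64:
        raise ValueError("expected an 8x8 thumbnail")
    mean = sum(pixels) / len(pixels)
    fingerprint = 0
    for value in pixels:
        fingerprint = (fingerprint << 1) | (1 if value > mean else 0)
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def find_reuse(
    thumbnail: list[list[float]],
    database: list[IndexedFigure],
    max_distance: int = 5,
) -> list[IndexedFigure]:
    """Return records whose fingerprint is within max_distance bits of the query."""
    query = average_hash(thumbnail)
    return [
        record
        for record in database
        if hamming_distance(query, record.fingerprint) <= max_distance
    ]


# Usage: a brightened copy of a stored figure is still matched to its source.
original = [[row * 8 + col for col in range(8)] for row in range(8)]
brightened = [[value + 40 for value in row] for row in original]
database = [
    IndexedFigure("placeholder-doi-a", "A. Author", 2020, average_hash(original)),
    IndexedFigure("placeholder-doi-b", "B. Author", 2021, 0xAAAAAAAAAAAAAAAA),
]
for match in find_reuse(brightened, database):
    print(match.doi, match.authors, match.year)  # placeholder-doi-a A. Author 2020
```

A single global hash like this tolerates brightness changes and mild compression noise. Heavy cropping, however, changes the thumbnail and therefore the fingerprint. Detecting reuse of only part of a figure generally requires comparing regions or local features instead.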

For figure reuse detection, we have indexed 25 million figures from previously published articles in our image database, and we will continue to add new figures in future updates.
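Comparing each query one by one against 25 million stored fingerprints would be slow. The post does not say how Imagetwin's database is organised. One generic option for 64-bit binary fingerprints is multi-index hashing, shown below as a minimal in-memory sketch. The class name `BandedHashIndex`, the split into four 16-bit bands, and the figure IDs are hypothetical. The design also lets new figures be added incrementally without rebuilding the index.

```python
from __future__ import annotations

from collections import defaultdict


class BandedHashIndex:
    """Hypothetical in-memory index for 64-bit figure fingerprints.

    Each fingerprint is split into 4 bands of 16 bits. If two fingerprints
    differ in at most 3 bits, at least one band must be identical
    (pigeonhole principle), so a query only needs to compare against
    figures that share a band with it instead of scanning every entry.
    """

    BANDS = 4
    BAND_BITS = 16

    def __init__(self) -> None:
        # One lookup table per band: band value -> list of figure IDs.
        self._tables = [defaultdict(list) for _ in range(self.BANDS)]
        self._fingerprints: dict[str, int] = {}

    def _bands(self, fingerprint: int) -> list[int]:
        mask = (1 << self.BAND_BITS) - 1
        return [(fingerprint >> (i * self.BAND_BITS)) & mask for i in range(self.BANDS)]

    def add(self, figure_id: str, fingerprint: int) -> None:
        """Insert one figure; new figures can be added without a rebuild."""
        self._fingerprints[figure_id] = fingerprint
        for table, band in zip(self._tables, self._bands(fingerprint)):
            table[band].append(figure_id)

    def query(self, fingerprint: int, max_distance: int = 3) -> list[str]:
        """Return IDs of all indexed figures within max_distance bits."""
        if max_distance >= self.BANDS:
            # Beyond this, the one-identical-band guarantee no longer holds.
            raise ValueError("max_distance must be smaller than the number of bands")
        candidates: set[str] = set()
        for table, band in zip(self._tables, self._bands(fingerprint)):
            candidates.update(table.get(band, ()))
        return sorted(
            figure_id
            for figure_id in candidates
            if bin(self._fingerprints[figure_id] ^ fingerprint).count("1") <= max_distance
        )


index = BandedHashIndex()
index.add("fig-A", 0xFFFF0000FFFF0000)
index.add("fig-B", 0x0123456789ABCDEF)
print(index.query(0xFFFF0000FFFF0000 ^ 0b101))  # ['fig-A']
```

Only figures that share at least one exact band with the query are compared bit by bit. This is what keeps lookups cheap as the database grows.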

Evaluation

We evaluated the figure reuse detection algorithm by measuring its accuracy and false positive rate on 216 figures extracted from 50 randomly selected papers. We applied various alterations to the figures, including cropping, changes to image quality and resolution, JPEG compression, and adjustments to brightness and contrast. We then used the altered images to measure how reliably the algorithm traces each figure back to its original source.

The algorithm identified the source of 197 of the 216 figures, an accuracy of 91%. The cases where duplication was not detected (false negatives) involved small figures without distinctive features, which are difficult to identify uniquely among millions of images. We found 3 false positives among the 216 evaluated figures, a false positive rate of about 1%. These false positives involved maps and highly similar (but not identical) bar plots. The results indicate that the algorithm detects figure reuse robustly, even in altered figures, while keeping false positives rare.
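The post does not define its metrics formally. Judging from the counts, "accuracy" is the share of altered figures whose original source was found (what information retrieval would call recall). The false positive rate appears to be the number of wrong matches divided by the number of figures checked. Under that reading, the reported percentages follow directly from the counts. The helper and constant names below are illustrative only.

```python
def rate(count: int, total: int) -> float:
    """Fraction of `total` items that fall into the counted category."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


FIGURES_CHECKED = 216  # figures from 50 papers, each altered before matching
DETECTED = 197         # figures whose original source was found
FALSE_POSITIVES = 3    # wrong matches reported

accuracy = rate(DETECTED, FIGURES_CHECKED)
false_positive_rate = rate(FALSE_POSITIVES, FIGURES_CHECKED)
missed = FIGURES_CHECKED - DETECTED

print(f"accuracy: {accuracy:.1%}")                        # accuracy: 91.2%
print(f"false positive rate: {false_positive_rate:.1%}")  # false positive rate: 1.4%
print(f"false negatives: {missed}")                       # false negatives: 19
```

The false positive rate works out to 1.4%, which the post rounds to about 1%.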

Metric                 Result
Accuracy               197 of 216 figure duplicates detected (91%)
False positive rate    3 false positives in 216 checked figures (about 1%)

Evaluation results for figure reuse detection on 216 figures from 50 papers