Figure Reuse Detection

Imagetwin now detects figure reuse across articles for all figure types, including diagrams, flowcharts, box plots, and photographs. While our initial focus was on life science images such as FACS plots, microscopy images, and Western blots, this update extends Imagetwin’s capabilities to all scientific disciplines. Whether in the humanities, computer science, mathematics, or physics, Imagetwin can now identify problematic manuscripts that contain plagiarized figures or other forms of scientific misconduct.

Figure extraction and reuse detection

The system automatically extracts figures from manuscripts and compares them against our image database to detect reuse across articles. If a figure, or a substantial portion of it, has been reused in another publication, Imagetwin flags the duplicate. Flagged duplicates are displayed in our web application together with details of the source paper, such as DOI, authors, and publication year. This lets users quickly determine whether the duplication is acceptable reuse or potential scientific misconduct.
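To make the idea of comparing an extracted figure against an indexed database concrete, here is a minimal sketch. It uses a simple average hash and Hamming distance, which is a generic perceptual-hashing technique chosen purely for illustration; it is not Imagetwin's algorithm. The function names, the distance threshold, and the DOIs are hypothetical, and all images are assumed to be small grayscale grids of the same size.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale pixels in 0..255, stored row by row


def average_hash(img: Image) -> int:
    """Encode the image as bits: 1 where a pixel is brighter than the image mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def find_matches(query: Image, index: Dict[str, int],
                 max_distance: int = 2) -> List[Tuple[str, int]]:
    """Return (source, distance) pairs within the threshold, closest first."""
    q = average_hash(query)
    hits = [(source, hamming(q, h)) for source, h in index.items()]
    return sorted((hit for hit in hits if hit[1] <= max_distance),
                  key=lambda hit: hit[1])


# Two hypothetical indexed figures, keyed by placeholder DOIs.
published = [[10, 200, 10, 200], [200, 10, 200, 10],
             [10, 10, 200, 200], [200, 200, 10, 10]]
other = [[10, 10, 10, 10], [10, 10, 10, 10],
         [200, 200, 200, 200], [200, 200, 200, 200]]
index = {
    "10.0000/example.a": average_hash(published),
    "10.0000/example.b": average_hash(other),
}

# A brightened copy of the first figure still maps to the same hash.
brightened = [[min(255, p + 30) for p in row] for row in published]
matches = find_matches(brightened, index)
print(matches)  # [('10.0000/example.a', 0)]
```

A linear scan like this would not scale to millions of figures; a production system would need an index structure for approximate nearest-neighbour search, which is outside the scope of this sketch.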

For figure reuse detection, we have indexed 25 million figures from previously published articles in our image database. We will continuously add new figures in future updates.

Detection accuracy

We evaluated the figure reuse detection algorithm by measuring its accuracy and false positive rate. We used 216 figures extracted from 50 randomly selected papers and applied various alterations to them, including cropping, changes in image quality and resolution, JPEG compression, and adjustments to brightness and contrast. We then used the altered images to measure how effectively the algorithm detects each figure's original source. The algorithm successfully identified 197 of the 216 figures, achieving a 91% accuracy rate.
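The kind of alterations described above can be sketched in a few lines. This is an illustrative toy, not the evaluation pipeline itself: images are assumed to be small grayscale grids, the parameter values are arbitrary, all function names are hypothetical, and JPEG compression is left out because it would require an imaging library.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale pixels in 0..255, stored row by row


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Keep only a rectangular region of the figure."""
    return [row[left:left + width] for row in img[top:top + height]]


def downscale(img: Image, factor: int) -> Image:
    """Lower the resolution by keeping every `factor`-th pixel."""
    return [row[::factor] for row in img[::factor]]


def adjust(img: Image, brightness: int = 0, contrast: float = 1.0) -> Image:
    """Scale contrast around mid-grey (128), shift brightness, clamp to 0..255."""
    def clamp(value: float) -> int:
        return max(0, min(255, round(value)))
    return [[clamp((p - 128) * contrast + 128 + brightness) for p in row]
            for row in img]


def make_variants(img: Image) -> Dict[str, Image]:
    """Produce one altered copy per alteration type (arbitrary settings)."""
    height, width = len(img), len(img[0])
    return {
        "cropped": crop(img, 0, 0, max(1, height // 2), max(1, width // 2)),
        "low_resolution": downscale(img, 2),
        "brighter": adjust(img, brightness=30),
        "higher_contrast": adjust(img, contrast=1.5),
    }


figure = [[0, 64, 128, 255], [32, 96, 160, 224],
          [64, 128, 192, 255], [16, 80, 144, 208]]
variants = make_variants(figure)
for name, altered in variants.items():
    print(name, altered[0])
```

Each altered copy would then be submitted as a query, and a detection counts as correct when the original figure is returned as its source.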

Cases where duplication was not detected (false negatives) involved small figures without distinctive features, which makes unique identification across millions of images difficult. We identified 3 false positives among the 216 evaluated figures, resulting in a false positive rate of about 1%. The false positives involved maps and highly similar (but not identical) bar plots. These results indicate that the algorithm detects figure reuse reliably under common image alterations while keeping the false positive rate low.

Metric | Result
Accuracy | 197 of 216 figure duplicates detected (91% accuracy)
False positive rate | 3 false positives in 216 checked figures (about 1% false positive rate)

Evaluation results for figure reuse detection applied to 216 figures from 50 papers
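The percentages reported above follow directly from the counts. The short sketch below reproduces the arithmetic; the helper name `rate` is ours, and the counts are the ones stated in the evaluation.

```python
def rate(count: int, total: int) -> float:
    """Fraction of `total` items that fall into the counted category."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


evaluated = 216        # figures in the evaluation set
detected = 197         # altered figures traced back to their source
false_positives = 3    # figures wrongly flagged as duplicates

missed = evaluated - detected
detection_rate = rate(detected, evaluated)
false_positive_rate = rate(false_positives, evaluated)

print(f"missed: {missed}")                                 # missed: 19
print(f"detection rate: {detection_rate:.1%}")             # detection rate: 91.2%
print(f"false positive rate: {false_positive_rate:.1%}")   # false positive rate: 1.4%
```

With one decimal place the rates come out at 91.2% and 1.4%, which round to the 91% and roughly 1% quoted in the text.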

Duplicate Detection in FACS Plots

We just released a new AI-based algorithm that detects duplications in fluorescence-activated cell sorting (FACS) plots. FACS plots are a common target for inappropriate image duplication, and hundreds of posts about them have appeared on PubPeer. While some duplications are honest mistakes caused by carelessness, others are more serious and are deliberate attempts to fabricate data. Imagetwin now detects duplicates in FACS plots accurately, helping to identify problematic manuscripts.

Detection within and across publications

We detect two categories of duplicates: those appearing within the scanned manuscript and those appearing across publications. After a manuscript is uploaded, panels showing FACS plots are automatically extracted and compared with each other to identify duplications. In addition, the panels are compared against our image database, which initially contains 2.6 million FACS plots from previously published articles. We will continue to add new FACS plots to the database in future updates. The check against our database helps identify the reuse of FACS plots across publications.
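The two categories of checks can be sketched as follows. This is only an illustration of the workflow, not Imagetwin's implementation: panels are represented by hypothetical integer fingerprints, the matching rule is passed in as a function, and the panel labels and DOI are placeholders.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def find_duplicates(
    panels: Dict[str, int],
    database: Dict[str, int],
    is_match: Callable[[int, int], bool],
) -> Tuple[List[Pair], List[Pair]]:
    """Return (within-manuscript pairs, manuscript-to-database pairs)."""
    # Category 1: compare every pair of panels extracted from the manuscript.
    within = [
        (a, b)
        for (a, fp_a), (b, fp_b) in combinations(panels.items(), 2)
        if is_match(fp_a, fp_b)
    ]
    # Category 2: compare each panel against previously published plots.
    across = [
        (name, source)
        for name, fp in panels.items()
        for source, ref_fp in database.items()
        if is_match(fp, ref_fp)
    ]
    return within, across


# Hypothetical fingerprints for three extracted panels and one indexed plot.
panels = {"Fig. 2A": 0b1011, "Fig. 2C": 0b1011, "Fig. 3B": 0b0100}
database = {"10.0000/example.c, Fig. 1": 0b0100}

within, across = find_duplicates(panels, database, lambda a, b: a == b)
print(within)  # [('Fig. 2A', 'Fig. 2C')]
print(across)  # [('Fig. 3B', '10.0000/example.c, Fig. 1')]
```

Exact equality stands in for the matching rule here; a real system would use a similarity measure that tolerates alterations, and an index rather than a full scan for the database lookup.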

Detection accuracy

We evaluated the FACS detection algorithm to measure accuracy, false positives, and runtime. We applied our algorithm to 152 FACS duplicates posted on PubPeer and correctly identified the duplication in 121 of the 152 cases (80% accuracy).

Cases where we could not find the duplication (i.e., false negatives) involved blurry or low-resolution images, or duplications covering only a tiny fraction of the image. To measure the false positive rate, we applied the algorithm to 3,000 FACS plots randomly sampled from papers published from 2010 to 2023. Across these 3,000 images, we found 5 false positive duplicates (about 0.17%), caused by similar labelling (i.e., lines, arrows, text descriptions). Apart from these false positives, the same run surfaced more than 50 suspicious, probably inappropriate duplicates.

The new algorithm to detect duplicates in FACS plots is efficient: we measured a runtime of 30 milliseconds per image, so for checking 100 FACS plots, the results will be ready in just 3 seconds.
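The runtime estimate above is a simple multiplication, shown here as a small helper. It assumes that plots are processed one after another and that the measured 30 milliseconds per image holds regardless of batch size; the function name is hypothetical.

```python
MS_PER_PLOT = 30  # measured runtime per FACS plot, as reported above


def estimated_seconds(n_plots: int, ms_per_plot: float = MS_PER_PLOT) -> float:
    """Estimate total checking time in seconds, assuming linear scaling."""
    if n_plots < 0:
        raise ValueError("n_plots must be non-negative")
    return n_plots * ms_per_plot / 1000


print(estimated_seconds(100))  # 3.0
```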

Metric | Result
Accuracy | Detected 121 of 152 duplicates sampled from PubPeer (80% accuracy)
False positives | 5 false positives in 3,000 checked FACS plots
Runtime | 30 milliseconds per FACS plot

Evaluation results for FACS duplicate detection