Launching Our Enhanced AI-Generated Image Detection

Imagetwin’s AI-generated image detection is now out of beta. After months of testing, development and invaluable feedback from our users, we are releasing a production-ready version.

Detecting AI-generated content in scientific papers is one of the most technically challenging areas of research integrity. With this release, we are delivering significantly higher accuracy, broader model coverage, and greater transparency for anyone reviewing flagged images.

Why This Matters

Since our beta launch, many of you have shared your experiences using this tool in real research workflows. One of the most consistent themes in your feedback was the need for more explainability: not just knowing whether an image was likely AI-generated, but understanding why it was flagged and what might have created it.

This need has become even clearer as evidence continues to show how difficult AI-generated content is to spot. A study published in Scientific Reports in November 2024 (Hartung et al., 2024) showed that even trained experts often fail to distinguish genuine histological images from AI-generated ones. With over 800 participants, the study found that although experts performed slightly better than untrained participants, they still could not reliably identify fabricated data. This highlights why scalable, automated detection is critical to maintaining trust in scientific publishing.

Your feedback directly shaped this release. Whether you are a researcher checking your own figures, an editor reviewing submissions, or an integrity officer conducting audits, we hope these improvements make it easier to interpret results and take informed next steps.

What's New

We built this update with two goals in mind: improving detection performance and making results more transparent and actionable.

Here are some of the most important changes:

  • Stronger Model Performance
    We switched to a more recent state-of-the-art vision model and trained it on much larger and more diverse datasets. This includes thousands of AI-generated images created through text-to-image and image-to-image workflows.

  • Realistic Training Pipeline
    Our team built a new pipeline that turns real images from published papers into modified AI versions designed to look realistic. This approach has significantly improved detection robustness across many scientific image types.

  • Expanded Model Coverage
    Detection now recognises images generated by a wider range of tools, including:

    • Firefly (Adobe’s AI system used in Photoshop and other Adobe products)

    • DALL·E 3 (previously used in ChatGPT and widely adopted over the past few years)

    • Stable Diffusion (versions 1.6, 3.5, and XL)

    • Internet-generated images (AI art found through search engines or stock libraries)

  • Model Attribution for More Transparency
    Previously, users only received a confidence score indicating the likelihood an image was AI-generated. Now, each flagged image includes an “AI assessment” showing a distribution of which generator the system believes was used. For example, you may see a score indicating higher probability of generation by DALL·E compared to Firefly or Stable Diffusion. This gives reviewers and editors more context to evaluate flagged cases.

  • Improved User Interface
    Clicking on any detected issue in the overview now opens the detail view displaying the confidence score and the likely distribution across AI models.
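
As a rough illustration of how such an "AI assessment" could be read, the sketch below turns per-generator scores into a distribution and picks the most likely generator. The function name `attribution_distribution`, the example scores, and the use of a softmax are assumptions made for illustration only; they are not taken from Imagetwin's implementation.

```python
import math

def attribution_distribution(raw_scores: dict[str, float]) -> dict[str, float]:
    """Convert hypothetical per-generator scores into probabilities that sum to 1 (softmax)."""
    peak = max(raw_scores.values())  # subtract the maximum for numerical stability
    exps = {name: math.exp(score - peak) for name, score in raw_scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}

# Hypothetical scores for one flagged image (illustrative values only).
scores = {"DALL·E 3": 2.0, "Firefly": 0.5, "Stable Diffusion": 0.1}
distribution = attribution_distribution(scores)
most_likely = max(distribution, key=distribution.get)

# Print the distribution from most to least likely generator.
for name, prob in sorted(distribution.items(), key=lambda item: -item[1]):
    print(f"{name}: {prob:.0%}")
```

A distribution like this is context for a reviewer rather than proof of origin: a flat distribution suggests the attribution is uncertain even when the overall confidence score is high.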

Now Fully Supported

With these improvements in place, we are confident this feature is ready to move out of beta. AI-generated image detection is now an established part of Imagetwin’s integrity toolkit, alongside Image Duplication, Plagiarism, and Manipulation detection. 

We will continue to refine the system, expand our training datasets, and improve model attribution as new generative tools emerge.

If you have questions, ideas, or suggestions, please get in touch with our team! Your input has been critical to getting us here, and we look forward to supporting you as this technology evolves.

Try It Today

Log in to your Imagetwin account to start scanning papers with AI assessments included, or contact our team.

Thank you for helping us preserve image integrity in research.

Private Repositories: Create your own Image Database

Have you ever wanted to create your own private publication database to check papers against? With our newest feature, Private Repository, you can do precisely that. We have just launched this feature and made it available to all customers.

Upload figures and documents to build your own publication database, and when you run a scan, the system will automatically compare images in the scanned paper to those in your repository. If duplicates are detected, they will appear highlighted on the results page. For more details, check out the instructions in the application.

Whether you are investigating retracted literature, screening incoming manuscripts, or tracking your own group’s submissions, private repositories offer a flexible way to monitor figure reuse over time. You can use it to create an archive of previously reviewed papers, check image reuse across submissions from the same lab, or screen against known publications. The feature supports both figures and documents, and integrates directly into your scanning workflow—no extra steps required.

All content in the private repository is bound to your account and cannot be seen or used by any other user.

We store your data securely using industry standards. Learn more in our data privacy policy.

Access your first Private Repository for free

All users now have access to one free private repository to begin building their own image database. This allows you to upload and scan documents against your own collection of figures, with up to 50 MB of storage included at no additional cost. If your needs grow, such as monitoring a larger set of publications or collaborating across a team, you can easily upgrade to a higher storage plan. 

  • One private repository will be available to all users at no cost
  • Includes 50 MB of secure storage
  • Upgrade options are available to both One-Off Package and Yearly Plan users
  • All data remains private to your account (or organization, if using shared repositories)

Beta Version for Detecting AI-Generated Images

We are excited to introduce a major new capability in Imagetwin: AI-generated image detection is now available. With the rapid advancements in generative models, detecting AI-generated images in research papers has become more challenging than ever. To assist you in differentiating between real and synthetic content, we have just launched a beta version for detecting AI-generated images in scientific articles.

Each paper checked by Imagetwin will now also be scanned using our AI detector for images to help researchers identify manipulated visuals. In this post, we focus on the capabilities, current progress, and roadmap for detecting AI-generated images with Imagetwin.

Detecting AI-Generated Images in Science

Existing AI detection models are not optimized for scientific images. We tested freely and commercially available APIs and observed low detection rates and high false-positive rates for domain-specific images, such as western blots and microscopic images. To address this, we trained our own detection model using images from research articles.

We generated thousands of AI images using image-to-image, text-to-image, and inpainting (a technique to modify specific parts of an image). We applied transformations like cropping, rotation, and scaling to these images during training to ensure robustness. Examples of AI-generated images used for training and testing are shown below.
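
The transformations mentioned above can be sketched in a few lines. This is a simplified illustration, assuming a grayscale image stored as nested lists, rotations limited to 90-degree steps, and integer upscaling; the helper names are hypothetical and a real training pipeline would typically rely on an image library.

```python
import random

Image = list[list[int]]  # grayscale image as rows of pixel values (assumed representation)

def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the height x width window starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]

def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def scale(img: Image, factor: int) -> Image:
    """Upscale by an integer factor using nearest-neighbour repetition."""
    return [[px for px in row for _ in range(factor)] for row in img for _ in range(factor)]

def augment(img: Image, rng: random.Random) -> Image:
    """Apply a random crop, a random number of 90-degree rotations, and a random scale."""
    h, w = len(img), len(img[0])
    ch, cw = rng.randint(h // 2, h), rng.randint(w // 2, w)
    top, left = rng.randint(0, h - ch), rng.randint(0, w - cw)
    out = crop(img, top, left, ch, cw)
    for _ in range(rng.randint(0, 3)):
        out = rotate90(out)
    return scale(out, rng.choice([1, 2]))

sample = [[r * 4 + c for c in range(4)] for r in range(4)]
augmented = augment(sample, random.Random(0))  # seeded so the run is repeatable
```

Training on randomly transformed copies encourages a detector to rely on generation artefacts rather than on the exact framing or size of an image.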

Examples of AI-generated images detected by Imagetwin

Beta Version of Our AI Image Checker

While detecting AI-generated images in science remains technically challenging, this beta release offers a promising first step. The feature is an early beta version that can be disabled in your account settings. Below, we outline key areas of improvement and our roadmap for future updates.

Coverage of Generative Models

Our detection method is optimized for a widely used model for generating scientific images. However, many AI models are currently available, with new ones emerging monthly. As we expand our dataset, we are prioritizing the most common generative models to improve detection accuracy and adaptability.

Image Types

The detection model is optimized for domain-specific images commonly targeted for data manipulation, including microscopy images, western blots, histology/pathology slides, cell cultures, and spot images. It is not, however, trained to detect non-scientific photos (e.g., cats, dogs, or unicorns). Future versions will expand training to include a broader spectrum of scientific imagery, such as plots, graphs, and light photography.

Explainability

One major challenge with flagged images is distinguishing false positives from actual AI-generated content. We are actively exploring solutions to enhance explainability, making it easier to interpret flagged cases. In the meantime, consider the following checks to evaluate detected cases:

The beta release for detecting AI-generated images marks a pivotal step in strengthening research integrity. We are continuously improving our detection models and welcome your feedback as we work towards full-scale, reliable detection of AI-generated scientific imagery.

Introducing: Confidence Scores

Not every duplicate is problematic: experts differentiate between appropriate and inappropriate cases. Appropriate duplicates are not research integrity issues, while inappropriate ones are and can lead to a paper’s rejection, correction, or retraction. With our latest update, we display a confidence score from 0% to 100% for each detected duplicate. A high score (e.g., 99% in the example above) means that the finding is likely inappropriate. In contrast, a low score (e.g., 2% in the “merge” duplicate example) means the case is likely appropriate. The confidence score helps you quickly separate relevant from irrelevant cases.

New user interface

In the updated user interface, each duplicate is accompanied by a confidence score. A slider can be used to filter duplicates based on a specific confidence threshold. We differentiate between three ranges: low (0%-32%), fair (33%-65%), and high (66%-100%). By default, the slider starts at 33%, displaying fair and high cases. We recommend reviewing all fair and high cases. To deepen your investigation, you can examine duplicates in the low range by adjusting the slider. Based on the current filters, we summarize how many findings are currently shown/hidden. All currently shown findings are included in the PDF report.
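
The ranges and the slider behaviour described above can be expressed as a small sketch. It assumes integer percentage scores and a threshold that is inclusive (a finding at exactly 33% is shown); the function names are hypothetical and do not reflect the application's code.

```python
def confidence_band(score: int) -> str:
    """Map a confidence score (0-100, in percent) to the band named in the text."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 32:
        return "low"
    if score <= 65:
        return "fair"
    return "high"

def apply_slider(scores: list[int], threshold: int = 33) -> tuple[list[int], int]:
    """Return the findings shown at this threshold and the number hidden."""
    shown = [s for s in scores if s >= threshold]
    return shown, len(scores) - len(shown)

# Hypothetical confidence scores for the duplicates found in one paper.
findings = [2, 18, 33, 47, 66, 99]
shown, hidden = apply_slider(findings)
print(f"{len(shown)} shown, {hidden} hidden")  # prints "4 shown, 2 hidden"
```

Lowering the threshold to 0 brings the low-range findings back into view, which corresponds to moving the slider to deepen an investigation.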

Appropriate vs. inappropriate

We compute the confidence/relevancy of a duplicate using several features. Features include image similarity, image class (e.g., radiology, spot image, or western blot), duplicate type (e.g., duplicate inside the same paper or across papers), and many others. Based on these features, we derive a confidence score for each duplicate, representing how likely the duplicate is to be inappropriate. We measured the effectiveness of our algorithm on inappropriate duplicates posted on PubPeer mixed with appropriate ones from randomly sampled publications.

Appropriate examples are varied, such as two views of the same microscopy image at different zoom factors, images with different color channel overlays (merge), or radiology images showing a brain scan with different color injections. We analyzed 5068 duplicates, consisting of 1797 inappropriate and 3271 appropriate duplicates. Of the 1797 inappropriate duplicates, we correctly classified 1733 cases (i.e., we predicted a confidence within the range of 33%-100%). Of the 3271 appropriate duplicates, we correctly classified 2936 cases (i.e., confidence in 0%-32%).
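
To put these counts into rates, the short calculation below derives the share of each group that was classified correctly. The terms "sensitivity" and "specificity" are standard labels added here for illustration; they do not appear in the original evaluation.

```python
# Counts reported in the text.
inappropriate_total, inappropriate_flagged = 1797, 1733  # scored fair or high
appropriate_total, appropriate_cleared = 3271, 2936      # scored low

sensitivity = inappropriate_flagged / inappropriate_total  # inappropriate duplicates caught
specificity = appropriate_cleared / appropriate_total      # appropriate duplicates kept out of view
missed = inappropriate_total - inappropriate_flagged       # 64 inappropriate cases scored low
false_alarms = appropriate_total - appropriate_cleared     # 335 appropriate cases scored fair or high
overall = (inappropriate_flagged + appropriate_cleared) / (inappropriate_total + appropriate_total)

print(f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}, overall {overall:.1%}")
```

With the default slider position, roughly 96% of the inappropriate duplicates in this set would be shown and roughly 90% of the appropriate ones would be hidden.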

Bulk processing

The differentiation between low-, fair-, and high-confidence duplicates helps to bulk process large volumes of papers efficiently. By knowing which papers contain duplicates with a fair or high chance of being inappropriate, users can investigate these duplicates specifically. Our API allows bulk processing of hundreds of papers, and during the next week, we will update our API to include information on whether low-, fair-, or high-confidence findings were detected in the scanned documents. Even when scanning hundreds of papers, only a small manual effort is necessary to detect inappropriate duplicates at scale.
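
A triage step along these lines might look like the sketch below. It does not call the Imagetwin API: the `scan_results` dictionary, the file names, and the helper `highest_band` are hypothetical stand-ins for whatever a bulk scan returns.

```python
def highest_band(scores: list[int]) -> str:
    """Return the highest confidence band among a paper's findings ('none' if empty)."""
    if not scores:
        return "none"
    top = max(scores)
    return "high" if top >= 66 else "fair" if top >= 33 else "low"

# Hypothetical scan results: paper identifier -> confidence scores of its duplicates.
scan_results = {
    "paper-a.pdf": [2, 11],
    "paper-b.pdf": [47],
    "paper-c.pdf": [],
    "paper-d.pdf": [5, 99],
}

# Keep only papers with fair or high findings, most suspicious first.
needs_review = sorted(
    (name for name, scores in scan_results.items() if highest_band(scores) in {"fair", "high"}),
    key=lambda name: -max(scan_results[name]),
)
print(needs_review)  # ['paper-d.pdf', 'paper-b.pdf']
```

In this example, two of the four papers need manual attention, which is the kind of reduction in effort the confidence bands are meant to provide.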

Figure Reuse Detection

Imagetwin now detects figure reuse across articles for all figure types, including diagrams, flowcharts, box plots, photographs, and more. While our initial focus was on life science images such as FACS plots, microscopy images, or Western blots, this update expands Imagetwin’s capabilities to cover all scientific disciplines. Whether in the humanities, computer science, mathematics, or physics, Imagetwin can now effectively identify problematic manuscripts containing plagiarized figures or other inappropriate scientific behaviour.

Figure extraction and reuse detection

The system automatically extracts figures from manuscripts and compares them against our image database to detect reuse across articles. If a figure or substantial portions of it have been reused in another publication, Imagetwin flags the duplicate. These flagged duplicates are displayed in our web application, with detailed information about the source paper, such as DOI, authors, and publication year. This enables users to quickly determine whether the duplication represents acceptable reuse or potential scientific misconduct.

For figure reuse detection, we have indexed 25 million figures from previously published articles in our image database. We will continuously add new figures in future updates.
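
The matching method itself is not described in this post. As a generic illustration of how a figure can be compared against an index, the sketch below uses an average hash and a Hamming distance. This is a swapped-in, textbook technique and not Imagetwin's algorithm; the tiny 2x2 "images", the one-entry index, and the distance threshold are assumptions chosen to keep the example readable.

```python
Image = list[list[int]]  # grayscale pixels 0-255, assumed already downscaled to a small grid

def average_hash(img: Image) -> int:
    """Set one bit per pixel: 1 if the pixel is brighter than the image mean."""
    pixels = [px for row in img for px in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for px in pixels:
        bits = (bits << 1) | (px > mean)
    return bits

def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")

original = [[10, 200], [220, 30]]
brighter = [[px + 20 for px in row] for row in original]  # brightness-adjusted copy
different = [[200, 10], [30, 220]]                        # unrelated image

index = {"figure-from-paper-x": average_hash(original)}   # hypothetical one-entry database
query = average_hash(brighter)
matches = [name for name, h in index.items() if hamming(query, h) <= 1]
```

Because the hash depends on each pixel's relation to the image mean, a uniform brightness change leaves it unchanged, so the adjusted copy still matches its source while the unrelated image does not.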

Detection accuracy

We evaluated the figure reuse detection algorithm by measuring its accuracy and false positive rate. We used 216 figures extracted from 50 randomly selected papers. We applied various alterations to the figures, including cropping, image quality/resolution changes, JPEG compression, and adjustments to brightness and contrast. We then used the altered images to measure how effectively the algorithm detects the figures’ original sources. The algorithm successfully identified 197 of the 216 figures, achieving a 91% accuracy rate.

Cases where duplication was not detected (false negatives) involved small figures without distinctive features, making unique identification across millions of images difficult. We identified 3 false positives among the 216 evaluated figures, resulting in a false positive rate of about 1%. The false positives involved maps and highly similar (but not identical) bar plots. The evaluation results demonstrate the robustness of our algorithm in accurately detecting figure reuse while maintaining a minimal false positive rate.

Metric               | Result
Accuracy             | 197 of 216 figure duplicates detected (91% accuracy)
False positive rate  | 3 false positives in 216 checked figures (1% false positive rate)
Evaluation results for figure reuse detection applied to 216 figures from 50 papers
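
For readers who want the unrounded figures, the percentages follow directly from the counts in the evaluation. The variable names below are illustrative.

```python
# Counts reported in the figure reuse evaluation.
evaluated, detected, false_positives = 216, 197, 3

detection_rate = detected / evaluated  # share of altered figures traced back to their source
missed = evaluated - detected          # false negatives
fp_rate = false_positives / evaluated  # false positives per evaluated figure

print(f"detected {detection_rate:.1%}, missed {missed}, false positives {fp_rate:.1%}")
```

This gives 91.2% detection with 19 missed figures and a false positive rate of 1.4%, which the summary rounds to 91% and about 1%.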

Duplicate Detection in FACS Plots

We just released a new AI-based algorithm to effectively detect duplications in fluorescence-activated cell sorting (FACS) plots. FACS plots are a common target for inappropriate image duplication. Hundreds of posts regarding FACS plots have emerged on PubPeer. While some duplications are honest mistakes caused by sloppy researchers, others are more severe and are used to fabricate data deliberately. Imagetwin now detects duplicates in FACS plots accurately, helping to identify problematic manuscripts.

Detection within and across publications

We detect two categories of duplicates: those appearing within the scanned manuscript and those appearing across publications. After uploading a manuscript, panels showing FACS plots are automatically extracted and compared to identify duplications. In addition to finding duplicates within the uploaded manuscript, the panels are compared against our image database, which initially contains 2.6 million FACS plots from previously published articles. We will constantly add new FACS plots to the database in future updates. The check against our database helps identify the reuse of FACS plots across publications.

Detection accuracy

We evaluated the FACS detection algorithm to measure accuracy, false positives, and runtime. We applied our algorithm to 152 FACS duplicates posted on PubPeer and correctly identified the duplication in 121 of the 152 cases (80% accuracy).

Cases where we could not find the duplication (i.e., false negatives) were blurry, low-resolution images, or cases where only a tiny fraction of the entire image was duplicated. To measure the false positive rate, we applied the algorithm to 3,000 FACS plots randomly sampled from papers published from 2010 to 2023. Across these 3,000 images, we found 5 false positive duplicates caused by similar labelling (i.e., lines, arrows, text descriptions). In addition to false positives, we detected more than 50 suspicious, probably inappropriate duplicates.

The new algorithm to detect duplicates in FACS plots is efficient: we measured a runtime of 30 milliseconds per image, so for checking 100 FACS plots, the results will be ready in just 3 seconds.

Metric           | Result
Accuracy         | Detected 121 of 152 duplicates sampled from PubPeer
False positives  | 5 false positives in 3,000 checked FACS plots
Runtime          | 30 milliseconds per FACS plot
Evaluation results for FACS duplicate detection
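
The runtime figure can be turned into a quick estimate for larger batches. The sketch assumes plots are processed one after another at the measured 30 milliseconds each, with no per-request overhead; `estimated_seconds` is a hypothetical helper.

```python
MS_PER_PLOT = 30  # measured runtime per FACS plot, from the evaluation

def estimated_seconds(num_plots: int, ms_per_plot: int = MS_PER_PLOT) -> float:
    """Estimate total checking time, assuming plots are processed sequentially."""
    return num_plots * ms_per_plot / 1000

detection_rate = 121 / 152          # duplicates found among the PubPeer cases
fp_per_thousand = 5 / 3000 * 1000   # false positives per 1,000 checked plots

print(f"100 plots: {estimated_seconds(100)} s")
print(f"detection rate {detection_rate:.1%}, {fp_per_thousand:.1f} false positives per 1,000 plots")
```

Under the same assumption, a batch of 1,000 plots would take about 30 seconds.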

New Algorithm for Splicing Detection

Imagetwin now detects splicing seams in gel band images, such as western blots. A western blot consists of several lanes, and the procedure of removing, inserting, or re-assembling individual lanes is known as splicing. Splicing is not necessarily inappropriate behaviour, especially when declared. However, in some cases, splicing is used to manipulate the original image data, and it is critical to detect these cases. The splicing detection is accessible through our web application and the API, helping you to identify splicing seams quickly and effectively.

Forensics toolbar

Whenever a gel band image is input into our software, the image is screened for potential splicing seams. Findings are then presented in the web application with a score indicating how confident our machine learning model is in its prediction. A finding can be further investigated in the detail view using the new forensics toolbar. The forensics toolbar allows you to alter the image, for example by changing the brightness or the contrast. This helps to improve the visibility of details that are difficult to see by eye. The forensics toolbar might also be useful for other findings, such as improving visualizations of duplicates.

A splicing seam is shown in the detail view with the forensics toolbar on the right.
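
To show why a contrast adjustment can make a seam easier to see, the sketch below applies a simple linear brightness/contrast transform to a few pixel values. The formula, the mid-grey pivot of 128, and the function name `adjust` are assumptions for illustration; the toolbar's actual processing is not described here.

```python
def adjust(img, brightness: float = 0.0, contrast: float = 1.0):
    """Apply out = contrast * (pixel - 128) + 128 + brightness, clamped to 0-255."""
    def clamp(value: float) -> int:
        return max(0, min(255, round(value)))
    return [[clamp(contrast * (px - 128) + 128 + brightness) for px in row] for row in img]

# Hypothetical patch with a faint vertical step between the second and third column.
faint = [[120, 122, 130], [121, 123, 131]]
enhanced = adjust(faint, contrast=4.0)
print(enhanced)  # [[96, 104, 136], [100, 108, 140]]
```

The step between the second and third column grows from 8 to 32 grey levels, which is the kind of amplification that helps a faint boundary stand out.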

Detection accuracy

We evaluated the splicing detection algorithm in terms of accuracy and false positives. To measure accuracy, we tested the algorithm on 157 spliced western blots recently posted on PubPeer. The software correctly identified and located the splicing seams in 127 of the 157 images (81% accuracy). To measure the false positive rate, we applied the algorithm to 500 western blots randomly sampled from papers published from 2010 to 2023. In total, 15 of the 500 images were flagged, which we further analyzed. Of the 15 flagged images, 7 were actually spliced, 2 were incorrectly flagged as spliced, and 6 were low-quality images difficult to assess because of pixelated areas and compression artefacts (e.g., JPG).

Data                                                       | Number of western blots
Spliced western blots sampled from PubPeer                 | 157
Western blots sampled from publications from 2010 to 2023  | 500
Data used for evaluation
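
The breakdown of the 15 flagged images can be summarised with a short calculation. Treating the 6 low-quality images as unassessable and leaving them out of the precision figure is an assumption made here, not a choice stated in the evaluation.

```python
sampled = 500  # western blots randomly sampled from publications (2010 to 2023)
flagged = {
    "actually spliced": 7,
    "incorrectly flagged": 2,
    "too low quality to assess": 6,
}

total_flagged = sum(flagged.values())
flag_rate = total_flagged / sampled                                    # share of sampled blots flagged
confirmed_false_positive_rate = flagged["incorrectly flagged"] / sampled
assessable = flagged["actually spliced"] + flagged["incorrectly flagged"]
precision_assessable = flagged["actually spliced"] / assessable        # among flags that could be judged
detection_rate = 127 / 157                                             # spliced PubPeer blots located

print(f"flagged {flag_rate:.1%}, confirmed false positives {confirmed_false_positive_rate:.1%}")
print(f"precision among assessable flags {precision_assessable:.1%}, detection rate {detection_rate:.1%}")
```

Under this reading, 3% of the sampled blots were flagged, 0.4% were confirmed false positives, and 7 of the 9 assessable flags (about 78%) pointed to real splicing.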