What does it actually take to catch research fraud, and what happens after you do? In our recent webinar, Sholto David, one of the most active research integrity sleuths in the field, joined Imagetwin Co-Founder and CEO Patrick Starke for an honest conversation about how image manipulation gets found, what the Dana-Farber investigation looked like from the inside, and what the scientific community can learn from it.
Sholto’s Journey to Research Integrity Sleuthing
As a biologist working in biotech, Sholto noticed that studies on alternative medicine treatments, such as acupuncture and herbal remedies, consistently produced positive results despite seeming scientifically implausible. That skepticism led him to look more closely at the data behind those papers.
His early work focused on statistical and numerical errors, which he reported through letters to the editor, a frustrating process. One letter critiquing a paper was sent for peer review by the very authors he was criticizing, then rejected. That experience pushed him toward PubPeer, a public platform for commenting on academic research, where he discovered a community already identifying image problems in papers.
Image manipulation, he realized, had a key advantage over statistical errors: it’s immediately communicable. You can show someone two identical images and the problem is self-evident. You can’t do that with a p-value.
How He Actually Does It
Sholto described two main modes of investigation. The first is broad: searching Google Scholar using terms likely to surface image-heavy papers in fields with known integrity problems (toxicology, for instance). The second is narrow: focusing on a specific researcher after receiving a tip or spotting something suspicious.
His toolkit combines manual reading with automated tools. He's emphatic that reading and understanding papers is foundational: every comment he posts, across nearly 8,000 PubPeer entries, has been written and verified by hand.
For automated screening, he uses Imagetwin, which he described as particularly valuable for one thing he simply cannot do manually: checking whether an image has been published before in another paper. “If someone’s taking images from other papers around, that can only be done with technology,” he said.
The Dana-Farber Case
Sholto began examining Dana-Farber papers at the end of 2023, following co-authorship connections from researchers at Memorial Sloan Kettering and the NCI. His early 2024 blog post documented image problems across roughly 60 papers: Western blots that had been cut, rotated, or stolen outright from unrelated publications. Dana-Farber responded quickly, committing to correct around 30 papers and retract five or six, an unusual show of transparency; most institutions would simply stay quiet.
The case then took a legal turn. Attorney Eugenie Reich approached Sholto about filing under the False Claims Act: if the NIH had known about the manipulated data, it wouldn't have funded those grants in the first place, meaning Dana-Farber had effectively received money under false pretenses. The DOJ reached out independently too, leaving Sholto a straightforward choice: serve as a witness in their case, or file as a relator and receive a share of any settlement. He filed with Reich.
After 18 months building the case, Dana-Farber agreed to pay back $15 million. Sholto and Reich received 17.5%. The research had focused on targeted blood cancer treatments, some of which proceeded to clinical trials that failed, exposing real patients to side effects from treatments built on manipulated data.
Advice for Editors and Integrity Officers
The most useful shift, Sholto said, is attitudinal: approach every paper assuming there might be a mistake. Once you look for problems, you start finding them. For images specifically, he offered a few practical signals to watch for:
- Gut instinct on similarity: Biological and material science images should vary because conditions vary. Two images that look suspiciously similar in texture, density, or lighting are often duplicates.
- Obscured corners: Labels or letters hidden in image corners can indicate the image was taken from another paper and relabeled.
- Low image quality: If researchers took images in a lab, they should have high-resolution originals. Heavily compressed JPEGs are a reason to request the original file.
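To make the "gut instinct on similarity" signal concrete, here is a minimal sketch of perceptual hashing, one common technique automated screeners use to flag near-duplicate images. This is an illustrative average-hash (aHash) over toy pixel grids, not Imagetwin's actual algorithm; the variable names and sample "images" are invented for the example.

```python
# Illustrative average-hash (aHash) sketch: threshold each pixel against the
# image's mean brightness, producing a bit string that survives uniform
# brightness/contrast shifts such as re-saving or re-compressing a figure.

def average_hash(pixels):
    """pixels: 2D list of grayscale values (0-255). Returns a bit string."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)

def hamming(a, b):
    """Number of differing bits; small distance suggests a duplicate."""
    return sum(x != y for x, y in zip(a, b))

# Toy 3x3 "blots": blot_b is blot_a uniformly brightened (a re-saved copy),
# blot_c is an unrelated image.
blot_a = [[10, 200, 30], [220, 15, 210], [25, 190, 20]]
blot_b = [[p + 20 for p in row] for row in blot_a]
blot_c = [[200, 10, 220], [15, 230, 20], [210, 25, 240]]

ha, hb, hc = map(average_hash, (blot_a, blot_b, blot_c))
print(hamming(ha, hb))  # 0: the brightened copy hashes identically
print(hamming(ha, hc))  # 9: the unrelated image is maximally distant
```

Real screeners work on downscaled full-resolution images and must also handle rotation, cropping, and relabeling, which is why Sholto's point stands: cross-paper duplicate detection at scale is only feasible with tooling.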
For systematic screening, he recommended tools like Imagetwin, particularly its cross-database matching feature, alongside plagiarism detection and, increasingly, tools that flag AI-generated citations.
The Value of Catching Problems Early
A theme running through the conversation was the cost of finding problems late: clinical trials that don't work, grants spent on science that can't be reproduced, reputational damage that could have been avoided. Every stage of the publication process (the lab, the institution, peer review, the publisher) had an opportunity to catch what happened at Dana-Farber earlier.
Tools like Imagetwin exist precisely to move that detection earlier in the process. The goal is to make manipulation harder to sustain in the first place.