AI-Generated Deepfake X-Rays Fool Both Radiologists and AI

The boundary between authentic medical diagnostics and artificial intelligence is blurring, creating a new frontier of risk for the global healthcare system. A recent study has revealed that even seasoned radiologists struggle to distinguish between genuine X-ray images and “deepfake” medical imagery generated by AI, raising urgent questions about the future of diagnostic integrity.

This vulnerability is not limited to human practitioners. The research indicates that the same large language models (LLMs) used to create these deceptive images are often unable to identify their own synthetic outputs. As these generative tools become more sophisticated, the potential for deepfake medical imaging to be misused for insurance fraud, medical malpractice lawsuits, or the delivery of incorrect diagnoses becomes a pressing concern for public health officials.

The study, conducted by a research team at the Icahn School of Medicine at Mount Sinai, involved 17 radiologists from 12 different institutions across six countries, including the United States, France, Germany, Turkey, the United Kingdom, and the United Arab Emirates. By testing a diverse group of experts—ranging from new graduates to veterans with 40 years of experience—the team sought to determine if professional expertise could serve as a bulwark against AI-generated deception.

The results were sobering. When radiologists were unaware that synthetic images were present in the dataset, they correctly identified AI-generated X-rays only 41% of the time, according to reports on the study. This suggests that deepfake medical imaging is now realistic enough to deceive the human eye, even when that eye is trained by decades of clinical practice.

The Limits of Human and Artificial Detection

The research team utilized a total of 264 images, comprising both real X-rays and AI-generated counterparts, to test the discernment capabilities of the participants. Interestingly, the study found that prior knowledge of the “trick” significantly altered the results. When the radiologists were informed beforehand that synthetic images were included in the set, the accuracy of their identification rose to an average of 75%.

However, individual performance varied widely. For images generated by ChatGPT, accuracy among specialists ranged from as low as 58% to as high as 92%. Crucially, the researchers found no clear correlation between a radiologist’s years of experience and the ability to spot a fake. A doctor with 40 years of experience was not necessarily better at detecting a deepfake than a recent graduate.

There was one notable exception: musculoskeletal specialists demonstrated a relatively higher accuracy in identifying synthetic images compared to other sub-specialties. This suggests that the specific anatomical nuances of the musculoskeletal system may currently be harder for AI to replicate perfectly than other bodily structures.

AI’s Struggle to Recognize Its Own Creations

One of the most startling findings of the research is that AI is not a foolproof solution for detecting AI-generated fraud. The study tested several multimodal LLMs, including GPT-4o, GPT-5, Gemini 2.5 Pro, and Llama 4 Maverick. The accuracy of these models in identifying deepfakes ranged from 57% to 85%, as detailed in the research findings.

Even GPT-4o, the model used to generate some of the images, failed to identify all of its own creations. This “blind spot” in generative AI underscores a critical systemic risk: if the creators of the technology cannot reliably detect the fakes, the medical community is left without a definitive digital “fingerprint” to verify the authenticity of a patient’s imaging.

Potential Risks and the “Next Step” in Deception

The implications of this technology extend far beyond a controlled research environment. Dr. Michael Tordzman, the lead author of the study from the Icahn School of Medicine at Mount Sinai, warned that these images are realistic enough to deceive experts even when they are on high alert. The danger lies in the potential for “deepfake medical imaging” to be used maliciously.

Medical journalists and ethics experts warn that such images could be used to fabricate medical histories for fraudulent insurance claims or to create false evidence in medical malpractice litigation. More dangerously, if a synthetic image is mistakenly introduced into a patient’s real medical record, it could lead to unnecessary surgeries or the failure to treat a real condition, directly impacting patient safety.

The evolution of this technology is moving rapidly. Dr. Tordzman noted that the next stage of development could see these deepfake capabilities expand from simple X-rays to more complex imaging modalities, such as CT scans and MRIs. Because CT and MRI scans provide three-dimensional data and higher resolution, the potential for highly convincing yet entirely fabricated pathologies increases.

Identifying the “Tells” of AI Imaging

Despite the high success rate of the fakes, the research team did identify certain characteristics that can help distinguish AI images from real ones. These “artifacts” often include the following (a toy detection heuristic is sketched after the list):

  • Unnatural textures: Subtle blurring or “smudging” in areas that should have sharp anatomical edges.
  • Anatomical inconsistencies: Structural anomalies that a human doctor might recognize as biologically impossible, even if the overall image looks “correct.”
  • Digital “smudges” or blotches: Specific patterns of noise or artifacts that are characteristic of current generative AI models.
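
As a loose illustration of that last category, the snippet below computes a naive frequency-domain “artifact score.” It is not a method from the study: the radial cutoff, the energy-ratio heuristic, and the idea of calibrating against verified X-rays are all illustrative assumptions.

```python
# Toy heuristic only: score the high-frequency "noise fingerprint" of an
# image. The cutoff and the energy-ratio idea are illustrative assumptions,
# not a method from the study.
import numpy as np

def high_frequency_energy_ratio(image: np.ndarray, cutoff: float = 0.25) -> float:
    """Return the fraction of spectral energy above a radial cutoff.

    Some generative models leave characteristic high-frequency noise;
    a ratio far outside the range seen in verified authentic X-rays
    would be a weak signal that the image deserves closer review.
    """
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image.astype(float)))) ** 2
    h, w = spectrum.shape
    yy, xx = np.ogrid[:h, :w]
    # Normalized radial distance of each frequency bin from the center.
    radius = np.hypot((yy - h / 2) / (h / 2), (xx - w / 2) / (w / 2))
    return float(spectrum[radius > cutoff].sum() / spectrum.sum())

# Example with random pixels; a real pipeline would calibrate the expected
# ratio on a corpus of verified authentic X-rays.
print(high_frequency_energy_ratio(np.random.rand(256, 256)))
```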

The study, which was published in the journal Radiology, emphasizes that as AI continues to learn from these failures, these tells will likely disappear, making the need for specialized detection training even more urgent, as reported by Kormedi.

The Path Forward: Education and Safeguards

The medical community is now facing a race between the generation of deepfakes and the development of detection tools. Experts argue that relying solely on human intuition is no longer viable. Instead, a dual approach involving both human education and AI-driven verification is required.

Medical professionals must be trained to recognize the specific artifacts of generative AI, while developers must create “watermarking” or cryptographic verification systems for medical imaging. If every legitimate X-ray were digitally signed at the point of capture, the introduction of a synthetic image would be immediately apparent, as it would lack the necessary security credentials.
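
As a minimal sketch of that idea, the snippet below signs raw image bytes with an Ed25519 key and verifies them before display. It assumes the third-party Python cryptography package, and the key names and byte payload are placeholders; a production scheme would use certified hardware keys and DICOM-aware digital-signature profiles rather than this toy flow.

```python
# Minimal sketch: sign image bytes at the point of capture and verify them
# before display. Uses the third-party "cryptography" package; key names and
# the byte payload are illustrative assumptions.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# In practice the private key would live in the X-ray machine's secure module.
device_key = Ed25519PrivateKey.generate()
device_public_key = device_key.public_key()

pixel_data = b"raw detector bytes for one exposure"  # placeholder payload

# 1. The modality signs the image the moment it is captured.
signature = device_key.sign(pixel_data)

# 2. The PACS or workstation verifies the signature before showing the image.
def is_authentic(image_bytes: bytes, sig: bytes) -> bool:
    try:
        device_public_key.verify(sig, image_bytes)
        return True
    except InvalidSignature:
        return False  # Missing or invalid signature: treat the image as suspect.

print(is_authentic(pixel_data, signature))               # True
print(is_authentic(b"synthetic substitute", signature))  # False
```

A deployed scheme would also need to bind each signature to the device’s certificate chain and the study’s metadata, so that an attacker could not simply re-sign a synthetic image with a key of their own.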

The rise of deepfakes in healthcare serves as a cautionary tale about the rapid adoption of generative AI. While AI offers immense potential for improving diagnostic speed and accuracy, its ability to create “perfect” lies introduces a level of systemic risk that the healthcare industry is not yet fully equipped to handle.

Key Takeaways for Healthcare Providers

  • Expertise is not a shield: Years of experience do not necessarily correlate with a higher ability to detect AI-generated X-rays.
  • AI cannot self-police: Even the most advanced LLMs cannot perfectly identify their own synthetic images.
  • High-risk modalities: While X-rays are the current focus, CT and MRI scans are likely the next targets for deepfake generation.
  • Skepticism is key: Knowing that a fake might be present significantly increases a clinician’s detection accuracy.

As the medical community continues to integrate AI into clinical workflows, the focus must shift toward establishing rigorous verification protocols to ensure that the images guiding surgical and treatment decisions are grounded in biological reality, not algorithmic imagination.

The next critical step for the industry will be the development and implementation of standardized AI-detection software integrated directly into Picture Archiving and Communication Systems (PACS) to flag potential synthetic images before they reach the radiologist’s desk.
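
No such standardized detector exists yet, so the sketch below is purely hypothetical: an ingest hook that scores each incoming study with a stand-in classifier and flags suspect images before they reach the reading worklist. The threshold, names, and routing here are all assumptions for illustration.

```python
# Hypothetical ingest hook of the kind described above; the detector is a
# stand-in, and the threshold, names, and routing are all assumptions.
from dataclasses import dataclass

SYNTHETIC_THRESHOLD = 0.5  # Calibrating this is itself an open problem.

@dataclass
class IngestResult:
    accession_number: str
    synthetic_probability: float
    flagged: bool

def score_image(image_bytes: bytes) -> float:
    """Stand-in for a trained synthetic-image classifier."""
    # Dummy rule so the sketch runs end to end; replace with a real model.
    return 0.9 if image_bytes.startswith(b"FAKE") else 0.1

def on_pacs_ingest(accession_number: str, image_bytes: bytes) -> IngestResult:
    """Score a newly received study and flag it before it hits the worklist."""
    p = score_image(image_bytes)
    flagged = p >= SYNTHETIC_THRESHOLD
    if flagged:
        # A real system would route the study to a verification queue.
        print(f"[ALERT] {accession_number}: possible synthetic image (p={p:.2f})")
    return IngestResult(accession_number, p, flagged)

print(on_pacs_ingest("ACC-001", b"FAKE-bytes"))  # flagged for review
print(on_pacs_ingest("ACC-002", b"real-bytes"))  # passes through
```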

Do you believe AI-generated imagery poses a significant threat to medical diagnostic integrity? Share your thoughts in the comments below or share this article with your colleagues to start the conversation.
