Artificial intelligence is rapidly transforming healthcare operations. From automating administrative workflows to assisting in patient education, tools like ChatGPT and other large language models (LLMs) promise efficiency and innovation. But in a highly regulated industry like healthcare, not all AI is created equal. When healthcare professionals use non-HIPAA compliant AI tools, they expose their organizations to compliance, ethical, and security risks.
As David Holt, Owner, Holt Law LLC, says, “ChatGPT definitely has the potential to make a big difference in healthcare by speeding up administrative work, helping staff, and making patient education more engaging. There are some important limitations to keep in mind. First, it doesn’t actually ‘understand’ medicine—it can sound confident even when it gives incorrect or misleading information, which could be risky in a clinical setting. It’s also not up to date with the latest medical guidelines or treatments if you're using versions trained on older data. Another issue is bias—since ChatGPT was trained on large sets of data from the internet, it can reflect gaps and inequalities that already exist in healthcare, especially for underrepresented communities. Plus, as of today, it can only work with text, so it’s not helpful for anything that involves images, like X-rays or visual diagnoses. Sometimes the answers it gives can be too general or surface-level, missing the detail you’d need in complex medical situations. And maybe most importantly, the public versions aren’t HIPAA-compliant, which means using them with any patient data could lead to privacy risks or security breaches.”
David Holt’s insights reflect the growing realization that data protection and patient privacy cannot be outsourced to general-purpose AI. Public versions of LLMs like ChatGPT, Claude, or Gemini were never designed to handle protected health information (PHI). Using them in clinical, administrative, or even educational workflows that involve PHI may violate HIPAA, risking severe penalties and reputational harm.
Public LLMs are trained on static datasets that stop at a certain date. That means guidance about treatments, drug approvals, or clinical protocols can be out of date. “Large Language Models (LLMs) are often paired with a reported cutoff date, the time at which training data was gathered. Such information is crucial for applications where the LLM must provide up-to-date information,” states the study Dated Data: Tracing Knowledge Cutoffs in Large Language Models. With healthcare guidelines continuously changing and new evidence appearing frequently, relying on a model that doesn’t automatically reference the latest literature risks recommending obsolete or unsafe actions.
Even when models are updated more frequently, they are not substitutes for curated, peer-reviewed clinical guidance or local formularies. For use cases such as patient education or administrative drafting, LLMs can help with tone and structure, but they must be paired with verified, up-to-date clinical checks before the content is used with patients.
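As a rough illustration of that pairing, a simple workflow guard can keep LLM-drafted patient education text from being released until a clinician has verified it against current guidance. The sketch below is hypothetical and not tied to any particular product; the draft object and review states are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical review gate: LLM-drafted patient education content
# cannot be published until a clinician checks it and signs off.
@dataclass
class PatientEducationDraft:
    topic: str
    body: str                          # text drafted by the LLM
    guideline_checked: bool = False    # verified against current clinical guidance
    reviewer: Optional[str] = None     # clinician who approved the draft
    reviewed_on: Optional[date] = None

    def approve(self, reviewer: str) -> None:
        """Record clinician sign-off; requires a prior guideline check."""
        if not self.guideline_checked:
            raise ValueError("Draft must be checked against current guidelines first.")
        self.reviewer = reviewer
        self.reviewed_on = date.today()

    @property
    def publishable(self) -> bool:
        return self.guideline_checked and self.reviewer is not None


draft = PatientEducationDraft(topic="Managing type 2 diabetes", body="...LLM draft...")
draft.guideline_checked = True        # set only after a clinician verifies the content
draft.approve(reviewer="Dr. Rivera")
print(draft.publishable)              # True only after verification and sign-off
```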
LLMs frequently hallucinate, generating information that is false, fabricated, or misleading. Even a low rate of hallucination is problematic in clinical settings, because outputs are delivered in formal, authoritative language that can mislead busy clinicians and administrators.
The authors of the study Developing and evaluating large language model–generated emergency medicine handoff notes compared LLM-generated clinical notes with physician-written notes and found higher rates of incorrectness in the model outputs: a 9.6% error rate for LLM notes versus 2.0% for physician notes. Though many errors in that study were not catastrophic, the phenomenon is real and measurable. When hallucinations affect patient-facing or decision-influencing content, patient safety can quickly be jeopardized.
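One lightweight mitigation is a grounding check: flag anything in an LLM-drafted note that cannot be traced back to the structured source record. The sketch below is an illustrative assumption, not the method used in the cited study, and the tiny drug vocabulary is a placeholder for a real terminology.

```python
import re

# Hypothetical mini-vocabulary; a real system would use a full drug terminology set.
DRUG_VOCAB = {"metformin", "lisinopril", "warfarin", "apixaban", "insulin"}

def unverified_medications(draft_note: str, source_meds: list[str]) -> list[str]:
    """Return drug names mentioned in an LLM-drafted note that are absent from
    the structured medication list, so a clinician can review them."""
    known = {m.lower() for m in source_meds}
    mentioned = {w.lower() for w in re.findall(r"[A-Za-z]+", draft_note)}
    return sorted((mentioned & DRUG_VOCAB) - known)

note = "Patient handed off on metformin and warfarin; continue current doses."
print(unverified_medications(note, source_meds=["Metformin", "Lisinopril"]))
# ['warfarin']  -> flagged for review: not in the source medication list
```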
Real-world examples highlight the stakes: a high-profile error in a Google research write-up, the invented term “basilar ganglia” (an apparent conflation of the basal ganglia and the basilar artery), showed how model-generated mistakes can slip into clinical materials and be missed by reviewers, raising alarms about automation bias.
LLMs reflect the vast quantities of online content they were trained on, and that content often carries systemic biases, preconceptions, and gaps. Algorithmic bias in healthcare can cause models to under-recognize symptoms in specific populations, recommend culturally inappropriate interventions, or widen disparities by favoring the language and norms of the majority group.
Research shows how bias can enter AI systems at various stages, including data collection, labeling, model construction, and deployment, and how, if not actively mitigated, it can replicate or worsen health disparities. For this reason, any plan for implementing AI in healthcare must include diverse datasets, fairness testing, and stakeholder engagement.
David Holt noted that “as of today, it can only work with text”—an important operational limitation for many consumer LLMs. Clinical work often depends on multimodal data: imaging (X-rays, CTs), waveforms (ECGs), scans, and photos. While specialized multimodal models are emerging, generic public LLMs are not designed to parse or interpret clinical images, nor to integrate them meaningfully into diagnostic reasoning.
Even in pure-text tasks, LLMs tend to produce generalist answers. They may miss the nuance required for complex cases: differential diagnosis subtleties, drug interactions in polypharmacy, dose adjustments for renal impairment, or contraindications tied to comorbidities. Those gaps make them unsuitable to replace clinical judgment.
Perhaps the most immediate operational risk for providers is privacy and regulatory compliance. Public versions of consumer LLM platforms do not sign business associate agreements (BAAs) with covered entities, and data sent to those services can be retained and used to improve the models. Putting PHI into a public LLM can therefore constitute a HIPAA violation or a reportable data breach.
Privacy experts are unambiguous on this point: clinicians and staff should not paste PHI into non-medical LLMs unless a HIPAA-compliant contractual and technical arrangement is in place, such as a signed BAA, zero-data-retention endpoints, and enterprise tooling with appropriate access controls.
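As a defensive layer (not a substitute for a BAA or for de-identification performed under the HIPAA Privacy Rule), teams sometimes add an automated check that blocks obviously identifying text before it ever reaches a public LLM endpoint. The patterns below are illustrative and intentionally incomplete; real PHI detection requires far broader coverage.

```python
import re

# Illustrative patterns only: SSNs, phone numbers, MRN-style identifiers, emails.
# Names, dates, addresses, and other HIPAA identifiers would also need coverage.
PHI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,}\b", re.IGNORECASE),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def assert_no_obvious_phi(text: str) -> None:
    """Raise before any call to a non-BAA-covered LLM if text matches PHI patterns."""
    hits = [label for label, pattern in PHI_PATTERNS.items() if pattern.search(text)]
    if hits:
        raise ValueError(f"Possible PHI detected ({', '.join(hits)}); do not send to a public LLM.")

assert_no_obvious_phi("Draft a generic appointment-reminder template.")  # passes
# assert_no_obvious_phi("Patient John Doe, MRN: 00123456, call 555-123-4567.")  # raises
```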
ECRI has identified the use of poorly governed AI in healthcare as a major health-technology hazard. Misleading AI outputs can cause disproportionate harm in emergency and acute settings, where time is short and decisions carry urgent consequences. Left unchecked, the combination of clinician automation bias, time pressure, and confident-sounding language becomes a hazardous mix.
Read more: Dangers of AI tops health tech hazards list for 2025
Adversarial inputs and transcription errors also pose practical dangers beyond accidental hallucinations. In the study Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support, researchers show that LLMs can be manipulated or can mistranscribe content, at times inserting fabricated sentences or misattributing statements in medical conversations. Even a small percentage of errors in clinical transcripts can have outsized consequences in legal or clinical documentation.
Read also: Hospitals use a transcription tool powered by an error-prone OpenAI model
The systematic review Implementing large language models in healthcare while balancing control, collaboration, costs and security emphasizes stakeholder engagement, continuous monitoring, workflow alignment, and ethical governance. From those recommendations, healthcare organizations can derive a robust set of safe-use practices for LLMs in clinical settings, sketched below.
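Continuous monitoring, in particular, can start with something as simple as logging every model interaction with enough metadata to audit it later. The sketch below is a minimal illustration, assuming an arbitrary `call_model` function rather than any specific vendor SDK, and logs only sizes and metadata rather than content so no PHI lands in the audit trail.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("llm_audit")

def audited_llm_call(call_model, prompt: str, user: str, purpose: str) -> str:
    """Wrap any LLM call with an audit record: who asked, why, when, and
    whether the output still needs human review."""
    response = call_model(prompt)
    audit_log.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "purpose": purpose,              # e.g., "administrative drafting"
        "prompt_chars": len(prompt),     # log sizes and metadata, not PHI
        "response_chars": len(response),
        "human_review_required": True,   # default to review until cleared
    }))
    return response

# Example with a stand-in model function:
reply = audited_llm_call(
    lambda p: "Draft reply...",          # placeholder for a real model client
    prompt="Write a billing FAQ for patients.",
    user="staff_042",
    purpose="administrative drafting",
)
```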
See also: HIPAA Compliant Email: The Definitive Guide (2025 Update)
Non-medical LLMs are large language models like ChatGPT or Gemini that were trained on general internet data rather than healthcare-specific, peer-reviewed medical datasets. They can write or summarize text effectively, but were not designed for clinical accuracy, safety, or compliance with healthcare regulations such as HIPAA.
Some enterprise-grade platforms, such as Microsoft Azure OpenAI Service, can offer HIPAA compliance if a business associate agreement (BAA) is in place and appropriate data-handling safeguards are configured. Always confirm this directly with the vendor before using PHI.
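As a hedged illustration, an organization that has signed a BAA with Microsoft and configured the appropriate safeguards might reach its own Azure OpenAI deployment roughly as sketched below. The endpoint, deployment name, and API version are placeholders; confirm the current SDK usage and your contractual terms with the vendor before sending any PHI.

```python
import os
from openai import AzureOpenAI  # assumes the openai Python SDK v1+ is installed

# Placeholders: your organization's BAA-covered Azure resource and deployment.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g., https://<resource>.openai.azure.com
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",                            # check the currently supported version
)

response = client.chat.completions.create(
    model="your-deployment-name",  # the deployment created in your Azure resource
    messages=[{"role": "user", "content": "Draft a general appointment-reminder template."}],
)
print(response.choices[0].message.content)
```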
If PHI has already been shared with a public LLM, treat it as a potential HIPAA breach. Notify your compliance officer immediately, document the exposure, and follow your organization’s breach response plan. Evaluate whether the data can be contained and whether patient notification or HHS reporting is required.