Here is a comprehensive, clinically oriented AI glossary every medical professional should be fluent in:
AI Terminology Every Medical Professional Should Know
Organized from foundational concepts → clinical applications → risks & ethics.
🔷 FOUNDATIONAL CONCEPTS
Artificial Intelligence (AI)
A broad field of computer science where machines perform tasks that normally require human intelligence — reasoning, recognizing patterns, making decisions, understanding language.
Machine Learning (ML)
A subset of AI where algorithms learn patterns from data without being explicitly programmed. The more data they see, the better they perform. Most clinical AI tools are ML-based.
Deep Learning (DL)
A subset of ML using neural networks with many layers (hence "deep"). Excels at image recognition (radiology, pathology, dermatology), speech, and complex pattern detection. Powers most modern diagnostic AI.
Neural Network
A computational model loosely inspired by the brain — layers of interconnected "neurons" that transform input data into outputs. The backbone of deep learning.
Algorithm
A set of rules or instructions a computer follows to solve a problem or make a prediction. In medicine, an AI algorithm might predict sepsis risk from vital signs or classify a chest X-ray.
Training Data
The dataset used to teach an AI model. Garbage in, garbage out — biased or unrepresentative training data produces biased models.
Model
The output of the training process — the AI system that takes new inputs and makes predictions. "The model" is what clinicians interact with.
Parameters / Weights
Internal numbers in a model adjusted during training to improve accuracy. A model with billions of parameters (like GPT-4) can handle complex language tasks.
🔷 AI MODEL TYPES
Supervised Learning
The model learns from labeled examples (e.g., chest X-rays labeled "pneumonia" or "normal"). Most diagnostic AI uses this approach.
Unsupervised Learning
The model finds patterns in unlabeled data on its own (e.g., clustering patients by similar symptom profiles without predefined categories). Used in phenotyping and discovery research.
Reinforcement Learning
The model learns by trial and error, receiving rewards for correct actions. Used in robotic surgery training and treatment optimization.
Large Language Model (LLM)
An AI model trained on vast amounts of text to understand and generate human language. ChatGPT, GPT-4, Claude, Gemini are LLMs. In medicine, used for documentation, summarization, clinical reasoning support, and patient communication.
Generative AI
AI that creates new content — text, images, audio, or video. LLMs are generative AI for text. Radiology AI that synthesizes missing imaging modalities is another example.
Foundation Model
A very large model trained on broad data that can be fine-tuned for specific tasks (e.g., a general LLM fine-tuned for psychiatry notes or radiology reports).
Fine-Tuning
Taking a pre-trained foundation model and further training it on a specific, smaller dataset to specialize it (e.g., fine-tuning GPT on psychiatric interview transcripts).
🔷 CLINICAL AI APPLICATIONS
Natural Language Processing (NLP)
AI that understands, interprets, and generates human language. In medicine: extracting diagnoses from unstructured notes, analyzing discharge summaries, powering AI scribes and chatbots.
Computer Vision
AI that interprets visual data. In medicine: reading X-rays, CT scans, MRIs, pathology slides, retinal images, skin lesions. The most mature area of clinical AI.
Clinical Decision Support System (CDSS)
Software that assists clinicians in making decisions — drug interaction alerts, sepsis alerts, diagnostic suggestions. AI-powered CDSS goes beyond rule-based systems to learn from data.
Ambient AI / AI Scribe
AI that listens to a clinical encounter and automatically generates documentation (notes, summaries). Currently the fastest-growing clinical AI category. Examples: Nuance DAX Copilot, Suki.
Digital Phenotyping
Using passively collected smartphone data (GPS, accelerometer, call logs, screen time) to infer mental health status. Relevant to psychiatry for mood disorder monitoring and relapse prediction.
Predictive Analytics
Using historical data to forecast future events — patient deterioration, readmission risk, suicide risk, sepsis onset. Distinct from diagnosis; it's about what will happen next.
Precision Medicine / Precision Psychiatry
Using individual patient data (genomics, biomarkers, imaging, behavior) with AI to tailor treatment to the specific person rather than applying population-level guidelines.
Digital Therapeutics (DTx)
Software-based treatments — including AI-driven apps — that deliver evidence-based therapeutic interventions. Some are FDA-authorized (e.g., Rejoyn for MDD, Freespira for PTSD). Different from wellness apps.
🔷 PERFORMANCE METRICS (Critical for Evaluating AI)
Sensitivity (Recall)
Proportion of actual positive cases the model correctly identifies. A high-sensitivity AI misses few real cases. Critical for screening tools (e.g., sepsis alerts, cancer detection).
Specificity
Proportion of actual negative cases the model correctly identifies. High specificity = few false alarms. Critical to avoid alert fatigue.
PPV (Positive Predictive Value)
The probability that a positive AI result is truly positive. Depends heavily on disease prevalence — even a good AI has low PPV in rare conditions.
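The prevalence effect can be made concrete with a quick Bayes calculation. A minimal sketch in Python, where the sensitivity, specificity, and prevalence values are invented for illustration, not drawn from any real tool:

```python
# Illustrative: how PPV collapses at low prevalence for a fixed test.
# Sensitivity, specificity, and prevalence values are assumed examples.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' theorem: P(disease | positive result)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# The same "90% sensitive, 90% specific" AI at two prevalences:
print(round(ppv(0.90, 0.90, 0.20), 3))   # common condition (20%) → 0.692
print(round(ppv(0.90, 0.90, 0.001), 3))  # rare condition (0.1%) → 0.009
```

The same model goes from "positive result is probably real" to "positive result is almost certainly a false alarm" purely because the condition is rare.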
NPV (Negative Predictive Value)
The probability that a negative AI result is truly negative. High NPV is essential for "rule out" tools.
AUC-ROC (Area Under the Receiver Operating Characteristic Curve)
A measure of overall diagnostic discrimination. AUC of 1.0 = perfect; 0.5 = no better than chance. Commonly reported in clinical AI studies. A useful single-number summary.
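AUC also has an intuitive probabilistic reading: it equals the chance that a randomly chosen positive case gets a higher model score than a randomly chosen negative one. A sketch with invented scores:

```python
# Sketch: AUC as the probability that a randomly chosen diseased case
# scores higher than a randomly chosen healthy one. Scores are invented.

pos_scores = [0.9, 0.8, 0.65, 0.4]   # model scores, patients with disease
neg_scores = [0.7, 0.3, 0.2, 0.1]    # model scores, patients without

pairs = [(p, n) for p in pos_scores for n in neg_scores]
auc = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)
print(auc)  # → 0.875
```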
F1 Score
Harmonic mean of sensitivity and PPV. Useful when both false positives and false negatives matter equally.
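All of the metrics above fall out of the same four confusion-matrix counts. A minimal sketch with invented counts:

```python
# Minimal sketch: core screening metrics from confusion-matrix counts.
# The TP/FP/TN/FN values below are invented for illustration.

tp, fp, tn, fn = 90, 40, 860, 10  # e.g., 1000 screened, 100 truly diseased

sensitivity = tp / (tp + fn)   # recall: fraction of true cases caught
specificity = tn / (tn + fp)   # fraction of healthy correctly cleared
ppv = tp / (tp + fp)           # how trustworthy a positive flag is
npv = tn / (tn + fn)           # how trustworthy a negative result is
f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean

print(f"Sens {sensitivity:.2f}  Spec {specificity:.2f}  "
      f"PPV {ppv:.2f}  NPV {npv:.2f}  F1 {f1:.2f}")
```

Note how this (deliberately chosen) example pairs high sensitivity with mediocre PPV: the tool catches 90% of cases but a flagged patient has only a ~69% chance of being truly positive.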
Calibration
Whether an AI's predicted probabilities match real-world frequencies — e.g., does a "70% risk" prediction actually come true ~70% of the time? Often neglected in medical AI papers.
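A rough calibration check simply bins predictions and compares each bin's mean predicted risk to its observed event rate. The predicted risks and outcomes below are invented:

```python
# Sketch of a calibration check: bin predicted risks, then compare mean
# prediction to observed event rate per bin. Data are invented examples.
from collections import defaultdict

preds = [0.1, 0.15, 0.2, 0.65, 0.7, 0.72, 0.9, 0.95]  # predicted risks
events = [0,   0,    1,   1,    0,   1,    1,   1]     # observed outcomes

bins = defaultdict(list)
for p, y in zip(preds, events):
    bins[int(p * 10) // 2].append((p, y))  # five bins of width 0.2

for b in sorted(bins):
    pairs = bins[b]
    mean_pred = sum(p for p, _ in pairs) / len(pairs)
    obs_rate = sum(y for _, y in pairs) / len(pairs)
    print(f"bin {b}: predicted {mean_pred:.2f} vs observed {obs_rate:.2f}")
```

A well-calibrated model shows predicted and observed values tracking each other across bins; large gaps mean the probabilities cannot be taken at face value even if discrimination (AUC) is good.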
Generalizability / External Validation
Whether a model performs well on data from a different hospital or population than it was trained on. Many clinical AI models fail at external validation — a major limitation.
🔷 BIAS, SAFETY & ETHICS
Algorithmic Bias
Systematic errors arising from biased training data or model design, causing AI to perform worse for certain groups (racial minorities, women, elderly, low-income populations). A major concern in clinical AI.
Hallucination
When an LLM generates plausible-sounding but factually incorrect information. A doctor asking ChatGPT about a medication dose could receive a confident but wrong answer. Critical safety risk.
Black Box
An AI model whose internal reasoning is opaque — it gives a result but cannot explain why. Deep learning models are often black boxes. Problematic for clinical accountability.
Explainability / Interpretability (XAI)
The degree to which an AI's reasoning can be understood by humans. Regulators and clinicians increasingly require explainable AI — "why did it flag this patient?"
SHAP Values (SHapley Additive exPlanations)
A common method for explaining which features drove an AI's prediction for a specific case. Helps turn black-box outputs into interpretable clinical insights.
Overfitting
When a model learns the training data too well, including its noise, and performs poorly on new data. A model with 99% accuracy in training but 60% in real use is overfit.
Data Leakage
When information that would not be available at prediction time — e.g., from the test set or from the patient's future — leaks into training, artificially inflating performance metrics. A common methodological error in medical AI studies.
Federated Learning
Training AI models across multiple institutions without sharing raw patient data — the model learns locally, only model updates (not patient records) are shared. Key for privacy-preserving healthcare AI.
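In its simplest form, the "share updates, not records" idea reduces to averaging locally trained weights (federated averaging). The hospital names and weight vectors below are invented toy values:

```python
# Sketch of federated averaging (FedAvg): each site trains locally, and
# only weight vectors (never patient records) are sent and averaged.
# Hospital names and weights are invented for illustration.

local_updates = {
    "hospital_a": [0.10, 0.30, -0.20],
    "hospital_b": [0.12, 0.28, -0.25],
    "hospital_c": [0.08, 0.35, -0.15],
}

n_sites = len(local_updates)
n_weights = len(next(iter(local_updates.values())))
global_weights = [
    sum(w[i] for w in local_updates.values()) / n_sites
    for i in range(n_weights)
]
print(global_weights)  # the only data that ever left each hospital
```

Real systems weight sites by sample size and add secure aggregation, but the privacy property is visible even here: the server only ever sees model parameters.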
Differential Privacy
A mathematical framework for adding controlled noise to datasets so AI can learn patterns without being able to identify individual patients.
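The "controlled noise" idea can be sketched with the classic Laplace mechanism applied to a count query; the count and the privacy budget epsilon below are invented example values:

```python
# Illustrative Laplace mechanism for differential privacy: release a
# patient count with noise scaled to 1/epsilon (a count query changes
# by at most 1 per patient). Count and epsilon are made-up examples.
import math
import random

def private_count(true_count: int, epsilon: float) -> float:
    """Return the count plus Laplace(0, 1/epsilon) noise."""
    scale = 1.0 / epsilon
    u = random.random() - 0.5                      # uniform on (-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    noise = -scale * sign * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

random.seed(0)  # seeded only so the sketch is reproducible
print(private_count(437, epsilon=0.5))  # close to, but not exactly, 437
```

Smaller epsilon means more noise and stronger privacy; the aggregate statistic stays useful while any single patient's presence is masked.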
🔷 REGULATORY & DEPLOYMENT TERMS
FDA-Cleared vs. FDA-Authorized
- 510(k) cleared: AI device is substantially equivalent to an existing device
- De Novo authorized: Novel device with no predicate, low-to-moderate risk
- PMA (Premarket Approval): Highest-risk AI devices requiring clinical trial evidence
SaMD (Software as a Medical Device)
Regulatory term for software intended to perform medical functions — diagnosis, treatment planning, monitoring. AI clinical tools are typically classified as SaMD and require regulatory oversight.
Clinical Validation
Evidence that an AI tool performs safely and effectively in a real clinical population, not just on benchmark datasets.
Ambient Intelligence
The use of AI that operates seamlessly in the background of a clinical environment — sensors, cameras, microphones — to monitor patients, detect falls, or capture interactions without active input.
Interoperability / FHIR
The ability of AI tools to exchange data across EHR systems. FHIR (Fast Healthcare Interoperability Resources) is the standard enabling AI apps to plug into hospital systems.
Human-in-the-Loop (HITL)
An AI design principle where a human must review and approve AI decisions before they take effect. Most safe clinical AI operates this way — AI suggests, clinician decides.
🔷 EMERGING TERMS (2025–2026)
| Term | Meaning |
|---|---|
| Agentic AI | AI that can take multi-step actions autonomously (e.g., order a lab, look up results, draft a referral) — increasingly relevant in clinical workflows |
| RAG (Retrieval-Augmented Generation) | LLM that retrieves real-time information from a database before generating a response — reduces hallucination; used in clinical decision support |
| Multimodal AI | AI that processes multiple data types simultaneously — text, images, audio, genomics — for richer clinical insights |
| Digital Twin | A virtual model of a patient updated in real time with their physiological data, allowing simulation of disease progression or treatment response |
| AI Psychosis | Emerging clinical phenomenon — delusional thinking or pathological over-identification arising from chatbot interactions (2025) |
| Prompt Engineering | The practice of crafting inputs to LLMs to get accurate, relevant, safe outputs — increasingly a clinical skill |
| GPT / Gemini / Claude | Specific LLM families from OpenAI, Google, and Anthropic respectively — the "brand names" of generative AI |
Quick Reference Card
AI → ML → Deep Learning → Neural Networks (hierarchy)
Training data → Model → Prediction → Validation
Sensitivity = catches true cases
Specificity = avoids false alarms
AUC = overall discrimination
Calibration = are probabilities realistic?
Hallucination = AI confidently lies
Bias = unfair performance across groups
Black box = can't explain reasoning
Overfitting = works in training, fails in real world
SaMD = software regulated as a medical device
HITL = human must approve AI decisions
FHIR = data interoperability standard
Fluency in these terms allows medical professionals to critically evaluate AI tools, participate in procurement decisions, understand published AI research, and explain AI to patients and colleagues — rather than simply being passive users of systems they don't understand.