The Complete AI Terminology Guide

175 Essential Terms Across 12 Key Categories

A comprehensive reference for understanding artificial intelligence in 2026

1. Foundational AI & Machine Learning Concepts

Algorithm — A set of mathematical rules or instructions a model follows to learn from data and make predictions.

Artificial Intelligence (AI) — The broad field of computer science focused on building systems that can perform tasks requiring human-like intelligence, such as reasoning, perception, and language.

Cognitive Architecture — A framework that models the underlying structure of an intelligent agent's mind, including memory, perception, and decision-making.

Deep Learning — A subset of ML using multi-layered neural networks to learn hierarchical representations from large datasets.

Hyperparameter — Configuration settings (e.g., learning rate, batch size) set before training that control the learning process itself.

Inference — The process of using a trained model to generate outputs or predictions on new, unseen inputs.

Machine Learning (ML) — A subfield of AI where algorithms learn patterns from data without being explicitly programmed for each task.

Model — The trained artifact produced by running an algorithm on data; it encapsulates learned patterns for making predictions.

Neural Network — A computational model inspired by the human brain, composed of interconnected nodes (neurons) that process and transform data.

Parameters — The internal numerical weights of a model that are adjusted during training to encode learned knowledge.

Supervised Learning — Training where the model learns from labeled input-output pairs.

Training — The process of feeding data to a model and adjusting its parameters so it learns to perform a task correctly.

Transfer Learning — Reusing a model trained on one task as the starting point for a different but related task.

Unsupervised Learning — Training where the model finds patterns in unlabeled data without explicit guidance.

World Model — An internal representation an AI builds of how the environment works, used for planning and prediction.

2. Large Language Models & Architecture

Activation Function — A mathematical function (e.g., ReLU, GELU, SiLU) that introduces non-linearity into a neural network, enabling it to learn complex patterns.

Attention Mechanism — A technique allowing a model to focus on the most relevant parts of an input when producing each output token.

Context Window — The maximum number of tokens a model can "see" and process at once during a single forward pass.

Decoder — The Transformer component that generates output tokens one at a time using encoder representations and prior outputs.

Embedding — A dense numerical vector representation of a token or concept in a continuous high-dimensional space.
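For intuition, embeddings are usually compared by direction (cosine similarity) rather than by raw values. A minimal sketch; the three four-dimensional vectors are invented toy values, not real model outputs:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings"; real models use hundreds or thousands of dimensions.
cat = [0.9, 0.1, 0.3, 0.0]
kitten = [0.8, 0.2, 0.4, 0.1]
car = [0.0, 0.9, 0.0, 0.8]

print(cosine_similarity(cat, kitten))  # high: semantically close
print(cosine_similarity(cat, car))     # low: semantically distant
```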

Encoder — The Transformer component that converts input text into rich contextual representations.

Feed-Forward Network (FFN) — The component within each Transformer block that applies non-linear transformations to each token's representation independently.

Flash Attention — A memory-efficient attention algorithm that reorders computations to reduce GPU memory overhead and increase speed.

Large Language Model (LLM) — A Transformer-based neural network trained on massive text datasets to understand and generate human language.

Layer Normalization — A technique that normalizes activations within a layer to stabilize training of deep Transformer networks.

Logits — The raw, unnormalized scores a model outputs before being converted to probabilities via softmax.

Multi-Head Attention — Running multiple self-attention operations in parallel, each learning different aspects of token relationships.

Positional Encoding — A technique that adds information about token order to embeddings, since Transformers have no inherent sense of sequence.
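The original 2017 Transformer used sinusoidal positional encodings; a minimal sketch of that scheme (a model dimension of 8 is a toy size, and many modern models use learned or rotary encodings instead):

```python
import math

def sinusoidal_position(pos, d_model):
    """Sinusoidal positional encoding vector for one token position."""
    vec = []
    for i in range(d_model // 2):
        freq = 1.0 / (10000 ** (2 * i / d_model))  # each pair gets a lower frequency
        vec.append(math.sin(pos * freq))
        vec.append(math.cos(pos * freq))
    return vec

pe0 = sinusoidal_position(0, 8)   # position 0: alternating 0, 1 pattern
pe5 = sinusoidal_position(5, 8)   # a different, position-specific vector
```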

Residual Connection (Skip Connection) — A technique where a layer's input is added directly to its output, helping gradients flow during deep network training.

Self-Attention — A form of attention where each token in a sequence attends to all other tokens to build contextual representations.

Softmax — A mathematical function that converts logits into a probability distribution over possible next tokens.
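A minimal softmax in plain Python, using the standard max-subtraction trick for numerical stability:

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)                           # subtract the max to avoid overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # probabilities sum to 1; largest logit wins
```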

Sparse Attention — A variant of attention that only computes relationships between a subset of token pairs, enabling longer context windows at lower cost.

Token — The basic unit of text an LLM processes — typically a word, sub-word, or character.

Tokenization — The process of splitting raw text into tokens (using algorithms like BPE or WordPiece) before feeding it to a model.

Transformer — The dominant neural network architecture, introduced in 2017, that uses attention mechanisms instead of recurrence to process sequences in parallel.

Vocabulary — The complete set of tokens a model knows; words or sub-words outside it are unknown.

3. Generative AI

Code Generation — AI's ability to write, complete, or debug source code from natural language instructions.

Deepfake — Synthetic media (video, audio, image) generated by AI that realistically depicts real people saying or doing things they did not.

Diffusion Model — A generative model that learns to create data (e.g., images) by reversing a process of gradually adding noise.

Foundation Model — A large model trained on broad data that can be adapted to many downstream tasks (e.g., GPT-4, Claude, Gemini).

GAN (Generative Adversarial Network) — A framework where a generator and discriminator network compete, causing the generator to produce increasingly realistic outputs.

Generative AI (GenAI) — AI systems capable of producing new content — text, images, audio, code, or video — rather than just classifying or predicting.

GPT (Generative Pre-trained Transformer) — OpenAI's family of decoder-only Transformer models trained via next-token prediction on large text corpora.

Multimodal Model — An AI model that processes and generates multiple types of data (e.g., text and images together).

Text-to-Image — A generative AI capability where a model creates images from natural language text descriptions.

Text-to-Speech (TTS) — Technology that converts written text into synthesized spoken audio.

VAE (Variational Autoencoder) — A generative model that encodes inputs into a compressed latent space and decodes them back to reconstruct or generate data.

Vision-Language Model (VLM) — A multimodal AI model trained to understand and reason over both images and text simultaneously.

4. Training Methods & Optimization

Backpropagation — The algorithm that calculates gradients of the loss with respect to each parameter by propagating error backward through the network.

Batch Size — The number of training examples processed together in one forward/backward pass during training.

Constitutional AI — Anthropic's method of training AI to be helpful and harmless by having the model self-critique against a set of principles.

Contrastive Learning — A training technique that teaches a model to pull similar examples together and push dissimilar ones apart in embedding space.

Cross-Entropy Loss — The most common loss function for classification and next-token prediction tasks in LLMs.
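A sketch of per-token cross-entropy, assuming the model has already produced a probability distribution over the vocabulary: the loss is simply the negative log-probability assigned to the correct token.

```python
import math

def cross_entropy(predicted_probs, true_index):
    """Negative log-probability the model assigned to the correct class/token."""
    return -math.log(predicted_probs[true_index])

# Confident and correct -> low loss; uncertain -> higher loss.
print(cross_entropy([0.9, 0.05, 0.05], 0))
print(cross_entropy([0.4, 0.3, 0.3], 0))
```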

Curriculum Learning — A training strategy that presents examples in order from easy to hard, mimicking how humans learn.

Data Augmentation — Techniques for expanding a training dataset by creating modified versions of existing data (e.g., paraphrasing text).

Data Flywheel — A self-reinforcing loop where a product generates user interaction data that improves the model, which attracts more users, generating more data.

Direct Preference Optimization (DPO) — A fine-tuning alternative to RLHF that directly trains a model on human preference data without a separate reward model.

Dropout — A regularization technique that randomly disables a fraction of neurons during training to prevent overfitting.
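A sketch of "inverted" dropout, the variant most frameworks use: surviving activations are scaled up at training time so no rescaling is needed at inference.

```python
import random

def dropout(activations, p, training=True, rng=None):
    """Zero each activation with probability p; scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random(0)   # fixed seed here only for reproducibility
    keep = 1.0 - p
    return [0.0 if rng.random() < p else x / keep for x in activations]

out = dropout([1.0, 1.0, 1.0, 1.0], p=0.5)   # each value is either 0.0 or 2.0
```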

Epoch — One complete pass through the entire training dataset during model training.

Fine-tuning — Adapting a pre-trained model on a smaller, task-specific dataset to improve performance on that task.

Gradient Descent — The core optimization algorithm that iteratively adjusts model weights in the direction that reduces the loss function.
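A toy illustration on a one-dimensional function; the function, learning rate, and step count are chosen only for the example:

```python
def gradient_descent(grad_fn, x0, learning_rate=0.1, steps=100):
    """Repeatedly step opposite the gradient to minimize a function."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad_fn(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3); the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```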

Instruction Tuning — Fine-tuning a model on datasets of instructions paired with ideal responses to make it better at following user directions.

Learning Rate — A hyperparameter controlling how much model weights are adjusted with each training update.

Loss Function — A mathematical measure of how wrong a model's predictions are; training aims to minimize it.

Overfitting — When a model learns training data too precisely and performs poorly on new, unseen data.

Pre-training — The initial large-scale training phase where a model learns general language patterns from vast datasets.

Regularization — Techniques (e.g., dropout, weight decay) used to prevent overfitting by constraining model complexity.

Reinforcement Learning (RL) — A training paradigm where an agent learns by receiving rewards or penalties based on its actions in an environment.

RLAIF (Reinforcement Learning from AI Feedback) — A variant of RLHF where an AI model (rather than humans) provides the feedback signal.

RLHF (Reinforcement Learning from Human Feedback) — A technique where human raters score model outputs and those scores guide further training to align behavior with human preferences.

Self-Supervised Learning — A form of unsupervised learning where the model creates its own labels from raw data (e.g., predicting the next token).

Synthetic Data — Artificially generated training data used when real data is scarce, sensitive, or expensive to label.

Underfitting — When a model is too simple to capture the underlying patterns in training data.

5. Model Architecture Variants

Dense Model — A model where all parameters are used for every input, in contrast to sparse/MoE models.

Mixture of Agents (MoA) — An architecture where multiple LLM agents collaborate in layers, each refining the outputs of the previous layer.

Mixture of Experts (MoE) — A model architecture where different sub-networks ("experts") specialize in different inputs, and a router activates only the relevant ones per token, improving efficiency.

Multimodal Embedding — A shared vector space where text, images, audio, and other modalities are embedded together, enabling cross-modal retrieval and reasoning.

Reasoning Model — An LLM specifically designed to decompose complex problems into multiple logical steps before producing an answer (e.g., OpenAI's o1/o3, DeepSeek R1).

Sparse Model — A model (often MoE-based) where only a fraction of parameters are active for any given input, reducing compute costs.

Speculative Decoding — An inference acceleration technique where a small draft model generates candidate tokens that a larger model then verifies in parallel.

6. Prompting & Interaction Design

Chain-of-Thought (CoT) — A prompting technique that encourages a model to reason step by step before producing a final answer.

Few-Shot Prompting — Providing a small number of examples within the prompt to help the model understand the desired task format.

Hallucination — When an AI model generates plausible-sounding but factually incorrect or fabricated information.

Prompt — The input text or instruction given to an AI model to elicit a desired response.

Prompt Engineering — The practice of carefully designing prompts to guide an LLM toward accurate, useful, and well-formatted outputs.

Structured Output — Configuring an LLM to produce outputs in a defined format (e.g., JSON schema) rather than free-form text, critical for production pipelines.

Sycophancy — A model failure mode where it tells users what they want to hear rather than what is accurate, prioritizing approval over truth.

System Prompt — Instructions given to a model at the start of a conversation to establish its persona, constraints, or behavior.

Temperature — A sampling parameter controlling output randomness; higher values produce more creative/varied responses, lower values more deterministic ones.

Top-k Sampling — A decoding strategy where the model samples only from the top k most probable next tokens.

Top-p (Nucleus) Sampling — A strategy where the model samples from the smallest set of tokens whose cumulative probability exceeds a threshold p.
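Temperature, top-k, and top-p compose into one sampling pipeline: scale the logits by temperature, optionally filter to the top k tokens or the nucleus, then sample from what remains. A sketch under those assumptions; the function name and defaults are invented for illustration:

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Temperature scaling, then optional top-k / top-p filtering, then sampling."""
    rng = rng or random.Random(0)
    probs = softmax([l / temperature for l in logits])
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]                 # keep only the k most probable tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in ranked:                        # smallest set whose mass reaches p
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept
    total = sum(probs[i] for i in ranked)       # renormalize over the kept tokens
    r, acc = rng.random() * total, 0.0
    for i in ranked:
        acc += probs[i]
        if acc >= r:
            return i
    return ranked[-1]
```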

Zero-Shot Prompting — Asking a model to perform a task without providing any examples in the prompt.

7. Retrieval, Memory & Grounding

Chunking — The process of splitting documents into smaller segments before embedding them for retrieval in a RAG pipeline.
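A character-based sketch of overlapping chunking; production pipelines usually split on token counts or sentence boundaries instead, but the sliding-window idea is the same:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping fixed-size chunks for embedding/retrieval."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap           # slide the window, keeping overlap
    return chunks

parts = chunk_text("word " * 100, chunk_size=100, overlap=20)
```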

Entity Resolution — The process of identifying and linking references to the same real-world entity across different data sources.

GraphRAG — An extension of RAG that uses knowledge graphs to provide structured relational context alongside retrieved documents.

Grounding — Connecting an AI model's outputs to verifiable real-world facts or data, reducing hallucination.

Hybrid Search — Combining keyword-based (BM25) and semantic (vector) search to improve retrieval quality.

Knowledge Graph — A structured representation of real-world entities and the relationships between them, often used to augment AI reasoning.

KV Cache (Key-Value Cache) — A performance optimization that stores intermediate attention computations to speed up token generation.

Long-Term Memory — Mechanisms that allow an AI agent to persist and retrieve information across multiple sessions.

RAG (Retrieval-Augmented Generation) — A technique that retrieves relevant external documents and injects them into a prompt so the model can ground answers in real evidence.

Reranking — A step in RAG pipelines where retrieved documents are scored and reordered by relevance before being passed to the model.

Semantic Search — Search that finds results based on meaning and context rather than exact keyword matches, typically using embeddings.

Vector Database — A database optimized to store and query high-dimensional embeddings for fast semantic similarity search, commonly used in RAG systems.

8. AI Agents & Agentic Systems

Agentic Loop — The repeating cycle of perceive → reason → act → observe that autonomous AI agents execute to accomplish goals.
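The loop can be sketched in a few lines of Python; the tool registry, policy function, and "finish" action are invented names for illustration, standing in for the LLM call a real agent would make:

```python
def agentic_loop(goal, tools, policy, max_steps=10):
    """Perceive -> reason -> act -> observe until the policy decides it is done."""
    observation = goal
    history = []
    for _ in range(max_steps):
        action, argument = policy(observation, history)   # reason
        if action == "finish":
            return argument
        observation = tools[action](argument)             # act, then observe result
        history.append((action, argument, observation))
    return None

# Toy run: one calculator tool and a hard-coded two-step policy.
tools = {"calc": lambda expr: eval(expr)}  # illustration only; never eval untrusted input
def policy(observation, history):
    if not history:
        return "calc", "6 * 7"
    return "finish", history[-1][2]

result = agentic_loop("compute 6 * 7", tools, policy)
```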

AI Agent — An AI system that can autonomously take actions, use tools, and pursue goals over multiple steps.

Function Calling — A feature in LLM APIs allowing the model to output structured calls to developer-defined functions or tools.

Memory-Augmented Agent — An agent equipped with explicit external memory stores (short-term and long-term) to maintain context across tasks.

Model Context Protocol (MCP) — An emerging open standard (popularized by Anthropic) that defines how AI models connect to external tools and data sources in a standardized way.

Multi-Agent System — A framework where multiple AI agents collaborate or compete to solve complex tasks.

Orchestration — The coordination layer that manages the flow of tasks between multiple AI agents or components.

Planning — An agent's ability to decompose a high-level goal into a sequence of concrete sub-tasks.

ReAct (Reason + Act) — A prompting paradigm where an agent alternates between reasoning about a situation and taking an action.

Sandbox — An isolated execution environment for an AI agent to safely run code or test actions without affecting production systems.

Scratchpad / Chain-of-Thought Buffer — A temporary working memory space where a model writes intermediate reasoning steps before producing a final output.

Tool Calling — The broader ability of an AI agent to invoke external tools (APIs, databases, code) during a generation step.

Tool Use — An agent's ability to invoke external tools (e.g., search engines, code interpreters, APIs) to complete tasks.

9. Model Efficiency & Deployment

API (Application Programming Interface) — An interface allowing developers to interact with an AI model or service programmatically.

Distillation (Knowledge Distillation) — Training a smaller "student" model to mimic the behavior of a larger "teacher" model.

Edge AI — Deploying AI models directly on local devices (phones, sensors, cars) rather than in cloud data centers, reducing latency and improving privacy.

GPU (Graphics Processing Unit) — The hardware accelerator most commonly used to train and run deep learning models due to its massive parallel processing capability.

Latency — The time it takes for a model to produce a response after receiving an input.

LoRA (Low-Rank Adaptation) — A parameter-efficient fine-tuning technique that adds small trainable matrices to a frozen model rather than retraining all weights.

Model Serving — The infrastructure layer responsible for deploying trained models and efficiently handling real-time prediction requests at scale.

NPU (Neural Processing Unit) — A processor integrated into consumer devices (phones, laptops) optimized for on-device AI inference tasks.

PEFT (Parameter-Efficient Fine-Tuning) — A family of techniques (including LoRA) for adapting large models with minimal compute and data.

Prompt Caching — An optimization where common prompt prefixes are cached server-side so repeated calls don't recompute them.

Pruning — Removing redundant or low-importance weights from a model to make it smaller and faster.

Quantization — Reducing the numerical precision of model weights (e.g., from 32-bit to 4-bit) to shrink model size and speed up inference.
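A sketch of symmetric int8 quantization in plain Python; real schemes add refinements like per-channel scales and calibration data, but the core round-and-rescale idea is this:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; rounding loses at most scale/2 per weight."""
    return [x * scale for x in q]

weights = [0.52, -1.30, 0.07, 0.91]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)   # close to the originals, with small rounding error
```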

Throughput — The number of tokens or requests a model can process per unit of time.

TPU (Tensor Processing Unit) — Google's custom AI accelerator chip, designed specifically for matrix operations in deep learning workloads.

10. Evaluation, Benchmarks & Failure Modes

Adversarial Example — An input crafted with subtle perturbations specifically designed to fool a model into making an incorrect prediction.

Benchmark — A standardized test or dataset used to measure and compare AI model performance (e.g., MMLU, HumanEval).

Benchmark Contamination — When training data includes examples from evaluation benchmarks, artificially inflating reported scores.

BLEU Score — A metric for evaluating machine translation quality by comparing outputs to human reference translations.

Catastrophic Forgetting — The tendency of a neural network to lose previously learned information when trained on new data.

Distribution Shift — When the statistical properties of real-world inputs differ from the training data, degrading model performance.

Eval (Evaluation Suite) — A structured set of tests used to assess a model's capabilities across specific tasks or safety dimensions.

F1 Score — A classification metric that balances precision (accuracy of positive predictions) and recall (coverage of true positives).
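The F1 score is the harmonic mean of precision and recall, computed from true/false positives and false negatives; a sketch for binary labels:

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

score = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])  # precision 2/3, recall 2/3
```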

Human-in-the-Loop (HITL) — A design pattern where humans are included at critical decision points in an otherwise automated AI pipeline.

Perplexity — A metric measuring how well a language model predicts a test dataset; lower perplexity means better prediction.
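Perplexity is the exponential of the average negative log-probability over a token sequence; a sketch assuming the per-token probabilities are already given:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability the model assigned to each token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

print(perplexity([0.5, 0.5, 0.5]))   # 2.0: as if choosing between 2 options each step
print(perplexity([0.9, 0.9, 0.9]))   # lower: the model is much more confident
```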

Prompt Injection — An attack where malicious instructions are embedded in external content (e.g., a webpage) to hijack an AI agent's behavior.

Red-Teaming — Deliberately attacking or adversarially probing an AI system to discover its failure modes, biases, or safety vulnerabilities.

ROUGE — A set of metrics measuring overlap between generated text summaries and reference summaries.

11. AI Ethics, Safety & Governance

AI Act — The European Union's landmark regulatory framework, in force from 2024, that classifies AI systems by risk level and imposes corresponding compliance requirements.

AI Safety — The field of research focused on preventing AI systems from causing unintended harm.

Alignment — The challenge of ensuring AI systems pursue goals and behave in ways that are consistent with human values and intentions.

Bias — Systematic errors or unfair skews in a model's outputs that stem from biased training data or objectives.

Constitutional AI Principles — A defined set of values or rules given to an AI system against which it self-evaluates its own outputs during training or inference.

Differential Privacy — A mathematical framework for training models in a way that guarantees individual data points cannot be reverse-engineered from model outputs.

Explainability (XAI) — The degree to which a model's decisions and reasoning can be understood by humans.

Fairness — The principle that an AI system should treat all groups equitably and not produce discriminatory outcomes.

Federated Learning — A distributed training approach where models are trained locally on user devices without raw data ever leaving those devices, preserving privacy.

Guardrails — Constraints and filters built into an AI system to prevent it from producing harmful, offensive, or off-policy outputs.

Interpretability — The ability to understand the internal mechanisms of a model — what features it uses and why.

Jailbreaking — Techniques used to bypass an AI model's safety restrictions and elicit prohibited outputs.

Model Card — A documentation artifact accompanying a released AI model that describes its intended use, limitations, training data, and performance metrics.

Responsible AI — A set of principles and practices (fairness, transparency, accountability, safety) guiding ethical AI development and deployment.

Watermarking — Embedding hidden signals in AI-generated content (text or images) to allow later detection of machine-generated material.

12. Emerging Concepts & Future Directions

AGI (Artificial General Intelligence) — A hypothetical AI system with human-like cognitive flexibility, capable of performing any intellectual task a human can do.

AI Slop — Low-quality, generic, or inaccurate AI-generated content produced at scale with little human review or editorial oversight.

ASI (Artificial Superintelligence) — A theoretical AI that surpasses human intelligence across all domains; as of 2025, the term has moved from science fiction into corporate mission statements at companies like Meta and Microsoft.

Compute Budget — The total computational resources (measured in FLOPs or GPU-hours) allocated to training a model, a key constraint in AI development.

FLOP (Floating Point Operation) — A basic unit for measuring computational workload; model training costs are often described in total FLOPs required.

GEO (Generative Engine Optimization) — The practice of optimizing content so it appears in and is cited by AI-generated search summaries, analogous to SEO for traditional search.

Hyperscaler — The largest cloud infrastructure providers (AWS, Google Cloud, Azure, etc.) that supply the compute power for training and serving frontier AI models.

Metacognition — An AI system's capacity to reflect on and evaluate its own reasoning or knowledge limits, sometimes called "knowing what you don't know."

Vibe Coding — A 2025 term for using natural language prompts to generate entire software projects with minimal manual code-writing.

Additional Philosophical & Ethical Concepts

Artificial Narrow Intelligence (ANI) — The current state of AI: systems that excel at one specific task but cannot generalize beyond it.

Autonomy Gradient — The spectrum from fully human-controlled to fully AI-autonomous systems; every point on this gradient carries different ethical and legal implications for accountability.

Bayesian Inference — A probabilistic framework where beliefs are updated based on new evidence; foundational to rationalist epistemology and AI reasoning systems.
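A worked Bayes update for a binary hypothesis; the test sensitivity, false-positive rate, and base rate below are invented numbers chosen to show the classic base-rate effect:

```python
def bayes_update(prior, likelihood, false_positive_rate):
    """P(H | E) = P(E | H) * P(H) / P(E) for a binary hypothesis H."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# A test that is 99% sensitive with a 5% false-positive rate,
# applied to a condition with a 1% base rate: the posterior is only ~17%.
posterior = bayes_update(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
```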

Chinese Room Argument — John Searle's thought experiment arguing that a system manipulating symbols according to rules (like an LLM) does not thereby "understand" anything — a direct philosophical challenge to claims of AI cognition.

Computer Vision (CV) — The AI field focused on enabling machines to interpret and understand visual information from images and video.

Emergent Intelligence — Intelligence or properties that arise from the complexity of a system without being explicitly programmed; raises questions about reduction and supervenience.

Epistemic Calibration — The degree to which a model's stated confidence matches its actual accuracy; a virtue epistemology concept applied to machine outputs.

Existential Risk (X-Risk) — The possibility that advanced AI could pose a civilizational or extinction-level threat; connects to longtermism and the moral weight of future persons.

Functionalism — The philosophical view that mental states are defined by their functional roles rather than their physical substrate, providing the primary philosophical justification for the possibility of machine minds.

Instrumental Convergence — The philosophical thesis that almost any sufficiently capable AI will converge on sub-goals like self-preservation and resource acquisition, regardless of terminal goals.

Knowledge Representation — How facts about the world are encoded inside an AI system; touches on philosophy of language and propositional vs. non-propositional knowledge.

Moral Agency — The capacity to act based on moral reasoning and bear responsibility; most philosophers currently deny this applies to AI.

Moral Patienthood — The philosophical question of whether an AI system can be an object of moral concern — something to which we owe duties.

Natural Language Processing (NLP) — The branch of AI focused on enabling machines to understand, interpret, and generate human language.

Orthogonality Thesis — Nick Bostrom's claim that intelligence and goals are independent dimensions — a superintelligent AI need not be benevolent simply by virtue of being intelligent.

Phenomenology of AI — The study of whether AI systems have anything analogous to first-person experience — perception, intentionality, or a "point of view."

Recommendation Engine — An AI system that analyzes user behavior to suggest relevant content, products, or services.

Sentiment Analysis — An NLP technique that classifies theemotional tone (positive, negative, neutral) of text.

Sentience — The capacity to have subjective sensory or emotional experiences; distinct from intelligence, and the key criterion in most ethical frameworks for moral consideration.

Turing Test — Alan Turing's behavioral criterion for machine intelligence: if a machine's conversational output is indistinguishable from a human's, it should be considered intelligent.

Value Alignment Problem — How to specify human values precisely enough to encode them in AI, given that humans disagree, are inconsistent, and often can't articulate their own values.