The only AI glossary you’ll need this year

Disclosure: Some links in this article are affiliate links. AI Maestro may earn a commission if you make a purchase, at no…

By AI Maestro July 3, 2026 8 min read


Artificial intelligence is inventing a new vocabulary to describe how it works, filling product meetings and podcasts with acronyms like LLMs, RAG, and RLHF that can make even seasoned technologists feel uneasy.

This glossary defines the terms you are most likely to encounter, whether you are building with these systems, investing in them, or simply trying to follow the news. We update this regularly as the field changes, so treat it as a living document rather than a static reference.

AGI

Artificial general intelligence, or AGI, remains a vague concept. It generally describes AI capable of performing many, if not most, tasks better than the average human. OpenAI CEO Sam Altman once defined AGI as the equivalent of a median human you could hire as a co-worker. OpenAI’s charter describes it as highly autonomous systems that outperform humans at most economically valuable work. Google DeepMind views it slightly differently, defining AGI as AI at least as capable as humans at most cognitive tasks. Experts at the forefront of research are often just as confused as the rest of us.

AI agent

An AI agent is a tool that uses AI technologies to perform a series of tasks on your behalf. It can do more than a basic chatbot, such as filing expenses, booking tickets, or writing and maintaining code. This emergent space has many moving parts, so the term might mean different things to different people. Infrastructure is still being built to deliver the envisaged capabilities. The basic concept implies an autonomous system that may draw on multiple AI systems to carry out multistep tasks.

API endpoints

Think of API endpoints as buttons on the back of software that other programs can press to make it do things. Developers use these interfaces to build integrations, such as allowing one application to pull data from another, or enabling an AI agent to control third-party services directly without a human manually operating each interface. Most smart home devices and connected platforms have these hidden buttons available, even if ordinary users never see them. As AI agents grow more capable, they are increasingly able to find and use these endpoints on their own, opening up powerful possibilities for automation.
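To make the "buttons" idea concrete, here is a minimal sketch with no real networking. The smart-lamp service, its paths, and the `handle_request` function are all hypothetical, invented for illustration; a real service would expose its endpoints over HTTP.

```python
import json

# Hypothetical smart-lamp service: each (method, path) pair is one endpoint.
lamp_state = {"on": False}

def get_status(_body):
    """Endpoint that reports the lamp's current state."""
    return 200, dict(lamp_state)

def set_power(body):
    """Endpoint that switches the lamp on or off."""
    lamp_state["on"] = bool(body.get("on", False))
    return 200, dict(lamp_state)

ENDPOINTS = {
    ("GET", "/lamp"): get_status,
    ("POST", "/lamp/power"): set_power,
}

def handle_request(method, path, body_json="{}"):
    """Look up the endpoint for this request and call it, or return 404."""
    handler = ENDPOINTS.get((method, path))
    if handler is None:
        return 404, {"error": "no such endpoint"}
    return handler(json.loads(body_json))

print(handle_request("POST", "/lamp/power", '{"on": true}'))  # (200, {'on': True})
print(handle_request("GET", "/lamp"))                          # (200, {'on': True})
print(handle_request("GET", "/toaster"))                       # (404, {'error': 'no such endpoint'})
```

Another program, or an AI agent, only needs to know the method, the path, and the expected body to press one of these buttons.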

Chain of thought

Given a simple question, such as which animal is taller, a giraffe or a cat, a human brain can answer without thinking much about it. In many other cases, you need pen and paper to reach the right answer because there are intermediate steps. If a farmer has chickens and cows that together have 40 heads and 120 legs, you might need to write down a simple equation to arrive at the answer: 20 chickens and 20 cows.
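The farmer puzzle can be worked through as explicit intermediate steps. The sketch below is illustrative only, and the function name `solve_heads_and_legs` is hypothetical.

```python
def solve_heads_and_legs(heads: int, legs: int) -> tuple[int, int]:
    """Return (chickens, cows) given the total number of heads and legs."""
    # Step 1: every animal has one head, so chickens + cows = heads.
    # Step 2: chickens have 2 legs and cows have 4, so 2*chickens + 4*cows = legs.
    # Step 3: subtract 2 legs per head; what remains is 2 extra legs per cow.
    extra_legs = legs - 2 * heads
    if extra_legs < 0 or extra_legs % 2 != 0:
        raise ValueError("no whole-number solution")
    cows = extra_legs // 2
    chickens = heads - cows
    if chickens < 0:
        raise ValueError("no whole-number solution")
    return chickens, cows

print(solve_heads_and_legs(40, 120))  # (20, 20)
```

Each comment is one link in the chain: skipping straight to the answer is possible, but writing the steps down makes mistakes easier to catch.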

In an AI context, chain-of-thought reasoning for large language models means breaking down a problem into smaller, intermediate steps to improve the quality of the end result. It usually takes longer to get an answer, but the answer is more likely to be correct, especially in a logic or coding context. Reasoning models are developed from traditional large language models and optimized for chain-of-thought thinking thanks to reinforcement learning.

(See: Large language model)

Coding agents

A coding agent is a more specific concept than an AI agent, which is a program that can take actions on its own, step by step, to complete a goal. It is a specialized version applied to software development. Rather than simply suggesting code for a human to review and paste in, a coding agent can write, test, and debug code autonomously, handling the kind of iterative, trial-and-error work that typically consumes a developer’s day. These agents can operate across entire codebases, spotting bugs, running tests, and pushing fixes with minimal human oversight. Think of it like hiring a very fast intern who never sleeps and never loses focus, though a human still needs to review the work.

Compute

Although it is a somewhat multivalent term, compute generally refers to the computational power that allows AI models to operate. This processing fuels the AI industry, giving it the ability to train and deploy its powerful models. The term is often shorthand for the kinds of hardware that provide the computational power, such as GPUs, CPUs, TPUs, and other forms of infrastructure that form the bedrock of the modern AI industry.

Deep learning

A subset of self-improving machine learning in which AI algorithms are designed with a multi-layered, artificial neural network structure. This allows them to make more complex correlations compared to simpler machine learning-based systems, such as linear models or decision trees. The structure of deep learning algorithms draws inspiration from the interconnected pathways of neurons in the human brain.

Deep learning AI models are able to identify important characteristics in data themselves, rather than requiring human engineers to define these features. The structure also supports algorithms that can learn from errors and, through a process of repetition and adjustment, improve their own outputs. However, deep learning systems require a lot of data points to yield good results, millions or more. They also typically take longer to train compared to simpler machine learning algorithms, so development costs tend to be higher.

(See: Neural network)

Diffusion

Diffusion is the tech at the heart of many art, music, and text-generating AI models. Inspired by physics, diffusion systems slowly destroy the structure of data, such as photos and songs, by adding noise until there is nothing left. In physics, diffusion is spontaneous and irreversible: sugar dissolved in coffee cannot be restored to cube form. But diffusion systems in AI aim to learn a sort of reverse diffusion process to restore the destroyed data, gaining the ability to recover the data from noise.
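The destructive half of the process can be sketched in a few lines. This is a simplified illustration, not any particular model's implementation: the noise schedule (a fixed `beta` per step), the function names, and the toy signal are all assumptions, and the learned reverse process is not shown.

```python
import math
import random

def add_noise_step(values, beta, rng):
    """One forward step: shrink the signal slightly and mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)
    mix = math.sqrt(beta)
    return [keep * v + mix * rng.gauss(0.0, 1.0) for v in values]

def forward_diffusion(values, steps, beta, seed=0):
    """Apply the noising step repeatedly, recording every intermediate state."""
    rng = random.Random(seed)
    history = [list(values)]
    for _ in range(steps):
        history.append(add_noise_step(history[-1], beta, rng))
    return history

def signal_fraction(steps, beta):
    """Share of the original signal's amplitude that survives after `steps` steps."""
    return (1.0 - beta) ** (steps / 2)

signal = [1.0, 0.5, -0.5, -1.0]  # toy stand-in for a photo or song
history = forward_diffusion(signal, steps=200, beta=0.05)
print(len(history))                          # 201
print(round(signal_fraction(200, 0.05), 4))  # 0.0059
```

After 200 steps, less than 1% of the original signal's amplitude remains and the rest is noise. A diffusion model is trained to undo steps like these one at a time.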

Distillation

Distillation is a technique used to extract knowledge from a large AI model with a teacher-student model. Developers send requests to a teacher model and record the outputs. Answers are sometimes compared with a dataset to see how accurate they are. These outputs are then used to train the student model, which is trained to approximate the teacher’s behavior.

Distillation can be used to create a smaller, more efficient model based on a larger model with a minimal distillation loss. This is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4.
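One common way to measure a distillation loss is to compare the teacher's and student's output distributions. The sketch below assumes that setup: the logits are made up, and the temperature-softened KL divergence is one widely used choice, not necessarily what any specific company uses.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature softens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.0, 0.0]        # hypothetical recorded teacher output
near_student = [3.5, 1.2, 0.1]   # student that roughly agrees
far_student = [0.0, 1.0, 4.0]    # student that disagrees

print(distillation_loss(teacher, near_student) < distillation_loss(teacher, far_student))  # True
```

Training the student then amounts to adjusting its weights so that this loss shrinks across many recorded teacher outputs.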

While all AI companies use distillation internally, it may have also been used by some AI companies to catch up with frontier models. Distillation from a competitor usually violates the terms of service of AI APIs and chat assistants.

Fine-tuning

This refers to the further training of an AI model to optimize performance for a more specific task or area than was previously a focal point of its training, typically by feeding in new, specialized data.

Many AI startups are taking large language models as a starting point to build a commercial product but are vying to amp up utility for a target sector or task by supplementing earlier training cycles with fine-tuning based on their own domain-specific knowledge and expertise.

(See: Large language model)

GAN

A GAN, or Generative Adversarial Network, is a type of machine learning framework that underpins some important developments in generative AI when it comes to producing realistic data, including but not only deepfake tools. GANs involve the use of a pair of neural networks, one of which draws on its training data to generate an output that is passed to the other model to evaluate.

The two models are essentially programmed to try to outdo each other. The generator is trying to get its output past the discriminator, while the discriminator is working to spot artificially generated data. This structured contest can optimize AI outputs to be more realistic without the need for additional human intervention. GANs work best, though, for narrower applications, such as producing realistic photos or videos, rather than general-purpose AI.

Hallucination

Hallucination is the AI industry’s preferred term for AI models making stuff up, literally generating information that is incorrect. Obviously, it is a huge problem for AI quality.

Hallucinations produce GenAI outputs that can be misleading and could even lead to real-life risks, with potentially dangerous consequences, such as a health query that returns harmful medical advice.

The problem of AIs fabricating information is thought to arise as a consequence of gaps in training data. Hallucinations are contributing to a push toward increasingly specialized and vertical AI models, i.e. domain-specific AIs that require narrower expertise, as a way to reduce the likelihood of knowledge gaps and shrink disinformation risks.

Inference

Inference is the process of running an AI model. It is setting a model loose to make predictions or draw conclusions from previously seen data. To be clear, inference cannot happen without training; a model must learn patterns in a set of data before it can effectively extrapolate from this training data.

Many types of hardware can perform inference, ranging from smartphone processors to beefy GPUs to custom-designed AI accelerators. But not all of them can run models equally well. Very large models would take ages to make predictions on, say, a laptop versus a cloud server with high-end AI chips.

(See: Training)

Large language model (LLM)

Large language models, or LLMs, are the AI models used by popular AI assistants, such as ChatGPT, Claude, Google’s Gemini, Meta’s Llama, Microsoft Copilot, or Mistral’s Le Chat. When you chat with an AI assistant, you interact with a large language model that processes your request directly or with the help of different available tools, such as web browsing or code interpreters.

LLMs are deep neural networks made of billions of numerical parameters, or weights, that learn the relationships between words and phrases and create a representation of language, a sort of multidimensional map of words.

These models are created by encoding the patterns they find in billions of books, articles, and transcripts. When you prompt an LLM, the model generates the most likely pattern that fits the prompt.
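A toy version of "generate the most likely pattern" is a model that counts which word follows which. This is a drastic simplification for illustration: real LLMs use neural networks with billions of parameters rather than a lookup table, and the corpus and function names here are made up.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat . the cat ate the fish . the cat slept"
model = train_bigram(corpus)
print(most_likely_next(model, "the"))    # cat
print(most_likely_next(model, "zebra"))  # None
```

Building the counts corresponds to training, and looking up the next word corresponds to answering a prompt.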

(See: Neural network)

Memory cache

Memory caching is an optimization technique that makes inference, the process by which a model generates a response to a user’s query, more efficient. AI is driven by high-octane mathematical calculations, and every calculation uses power. Caching cuts down on the number of calculations a model has to run by saving the results of particular calculations and reusing them for future queries and operations. There are different kinds of memory caching; one of the best known is KV, or key-value, caching. KV caching works in transformer-based models and delivers faster results by reducing the time and power required.
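The saving from KV caching can be seen in a stripped-down sketch of a single attention step. Everything here is a simplified assumption: `project` stands in for a learned projection, the token vectors are made up, and a real transformer has many heads and layers.

```python
import math

def project(vec, scale):
    """Stand-in for a learned key or value projection (hypothetical)."""
    return [scale * x for x in vec]

def attend(query, keys, values):
    """Softmax-weighted average of the values, scored by query-key dot products."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[i] for w, v in zip(weights, values)) / total
        for i in range(len(values[0]))
    ]

class KVCache:
    def __init__(self):
        self.keys, self.values, self.projections = [], [], 0

    def step(self, token_vec):
        # Only the newest token is projected; earlier keys and values are reused.
        self.keys.append(project(token_vec, 0.5))
        self.values.append(project(token_vec, 2.0))
        self.projections += 2
        return attend(token_vec, self.keys, self.values)

def step_without_cache(token_vecs):
    """Recompute every key and value for the whole sequence so far."""
    keys = [project(t, 0.5) for t in token_vecs]
    values = [project(t, 2.0) for t in token_vecs]
    return attend(token_vecs[-1], keys, values), 2 * len(token_vecs)

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cache = KVCache()
uncached_projections = 0
outputs_match = True
for i, token in enumerate(tokens):
    cached_out = cache.step(token)
    plain_out, cost = step_without_cache(tokens[: i + 1])
    uncached_projections += cost
    outputs_match = outputs_match and all(
        math.isclose(a, b) for a, b in zip(cached_out, plain_out)
    )

print(cache.projections, uncached_projections, outputs_match)  # 6 12 True
```

Both paths produce the same outputs, but the cached path does half the projection work on three tokens, and the gap widens as the sequence grows.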

What it means

For people building things, this vocabulary is not just academic. It describes the actual tools available. Coding agents can write and fix code without constant human oversight, while diffusion models allow for the creation of new art and audio by reversing noise. However, users must remember that hallucinations remain a risk, and the power to run these models depends entirely on the underlying compute infrastructure.
