Artificial intelligence (AI) is transforming our world, and in doing so, it’s creating its own language to describe these changes. Spend five minutes reading about AI, and you’ll encounter LLMs, RAG, RLHF, and a dozen other terms that can make even the most tech-savvy individuals feel insecure. This glossary aims to help fix that. We update it regularly as the field evolves, so consider this a living document, much like the AI systems themselves.
AGI
Artificial general intelligence (AGI) is an abstract term with varying definitions. Generally, AGI refers to AI that’s more capable than the average human at many tasks. OpenAI CEO Sam Altman once described AGI as “the equivalent of a median human you could hire as a co-worker.” Meanwhile, Google DeepMind views AGI as “AI that’s at least as capable as humans at most cognitive tasks.” Confused? So are experts in AI research.
AI agent
An AI agent is an intelligent tool that uses AI technologies to perform a series of tasks on your behalf, beyond what a basic chatbot can do. For example, it might be used for filing expenses, booking restaurant tables, or even writing and maintaining code. As we’ve explained before, the concept of an AI agent can vary depending on how its capabilities are envisioned. Infrastructure is still being built to deliver these advanced capabilities, but the basic idea implies an autonomous system that may use multiple AI systems to carry out complex tasks.
API endpoints
Think of API endpoints as “buttons” in a piece of software that other programs can press to make it do things. Developers leverage these interfaces to build integrations, such as allowing one application to pull data from another or enabling an AI agent to control third-party services directly without manual intervention. Most smart home devices and connected platforms have these hidden buttons available, even if ordinary users never see them. As AI agents become more capable, they are increasingly able to find and use these endpoints on their own, opening up powerful — and sometimes unexpected — possibilities for automation.
Chain of thought
For a simple question like “Which animal is taller: a giraffe or a cat?” a human brain can answer without much thinking. But in many cases, you might need to write down an equation using pen and paper because there are intermediate steps involved. For AI agents, chain-of-thought reasoning means breaking down a problem into smaller, intermediate steps to improve the quality of the end result. It usually takes longer but is more likely to be correct, especially in logic or coding contexts.
Coding agents
This concept is more specific than an “AI agent,” referring to a program that can take actions on its own, step by step, to complete a goal. A coding agent is a specialized version applied to software development. Unlike traditional AI chatbots, which merely suggest code for humans to review and paste in, coding agents can write, test, and debug code autonomously, handling the iterative work typically done by developers daily. These agents operate across entire codebases, spotting bugs, running tests, and pushing fixes with minimal human oversight.
Compute
Although somewhat ambiguous, compute generally refers to the vital computational power that allows AI models to operate. This type of processing fuels the AI industry, giving it the ability to train and deploy its powerful models. The term is often used as a shorthand for the kinds of hardware that provide this computational power — things like GPUs, CPUs, TPUs, and other forms of infrastructure that form the bedrock of the modern AI industry.
Deep learning
A subset of self-improving machine learning in which AI algorithms are designed with a multi-layered artificial neural network structure. This allows them to make more complex correlations compared to simpler machine learning-based systems, such as linear models or decision trees. The structure of deep learning algorithms draws inspiration from the interconnected pathways of neurons in the human brain.
Deep learning AI models can identify important characteristics in data themselves, rather than requiring human engineers to define these features. This structure also supports algorithms that can learn from errors and improve their outputs through repetition and adjustment. However, deep learning systems require a large amount of data points (millions or more) to yield good results; they are typically slower to train compared to simpler machine learning algorithms — so development costs tend to be higher.
Diffusion
Diffusion is the tech at the heart of many art-, music-, and text-generating AI models. Inspired by physics, diffusion systems slowly “destroy” the structure of data (e.g., photos, songs) by adding noise until nothing remains. In physics, diffusion is spontaneous and irreversible — sugar diffused in coffee can’t be restored to its cube form. However, diffusion systems in AI aim to learn a sort of “reverse diffusion” process to restore destroyed data, thereby gaining the ability to recover data from noise.
Distillation
Distillation is a technique used to extract knowledge from a large AI model with a “teacher-student” model. Developers send requests to a teacher model and record the outputs; answers are sometimes compared against a dataset to assess their accuracy. These outputs are then used to train a student model, which approximates the teacher’s behavior.
Distillation can be used to create smaller, more efficient models based on larger ones with minimal distillation loss. This is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4. While all AI companies use distillation internally, it may have also been used by some companies to catch up with frontier models. Distillation from a competitor usually violates the terms of service for AI APIs and chat assistants.
Fine-tuning
This refers to further training an AI model to optimize performance for a more specific task or area than was previously its main focus — typically by feeding in new, specialized (i.e., task-oriented) data. Many AI startups are starting with large language models as a foundation and then fine-tuning them based on their domain-specific knowledge and expertise to enhance utility for target sectors or tasks.
GAN
A GAN, or Generative Adversarial Network, is a type of machine learning framework that underpins important developments in generative AI, including deepfake tools. GANs involve the use of two neural networks: one draws on its training data to generate an output that is passed to the other model for evaluation.
The two models are programmed to try and outdo each other. The generator tries to get its output past the discriminator, while the discriminator works to spot artificially generated data. This structured contest can optimize AI outputs to be more realistic without requiring additional human intervention. Though GANs work best for narrower applications (such as producing realistic photos or videos), they are not suitable for general-purpose AI.
Hallucination
Hallucination is the preferred term in the AI industry for AI models generating information that is incorrect — literally making stuff up. Obviously, this is a huge problem for AI quality. Hallucinations can produce misleading outputs and even lead to real-life risks (think of health queries returning harmful medical advice). The problem arises from gaps in training data; hallucinations are contributing to the push toward increasingly specialized and/or vertical AI models — i.e., domain-specific AIs that require narrower expertise — as a way to reduce knowledge gaps and shrink disinformation risks.
Inference
Inference is the process of running an AI model. It involves setting a model loose to make predictions or draw conclusions from previously seen data. To be clear, inference can’t happen without training; a model must learn patterns in a set of data before it can effectively extrapolate from this training data.
Many types of hardware can perform inference, ranging from smartphone processors to beefy GPUs to custom-designed AI accelerators. But not all of them can run models equally well. Very large models would take ages to make predictions on, say, a laptop versus a cloud server with high-end AI chips.
Large language model (LLM)
A Large Language Model (LLM) is the AI model used by popular AI assistants like ChatGPT, Claude, Google’s Gemini, Meta’s Llama, Microsoft Copilot, or Mistral’s Le Chat. When you chat with an AI assistant, you interact with a large language model that processes your request directly or with the help of different available tools, such as web browsing or code interpreters.
LLMs are deep neural networks made up of billions of numerical parameters (or weights) that learn the relationships between words and phrases and create a representation of language, a sort of multidimensional map of words. These models are created from encoding patterns they find in billions of books, articles, and transcripts. When you prompt an LLM, it generates the most likely pattern that fits your prompt.
Memory cache
Memory cache refers to an important process that boosts inference (the process by which AI works to generate a response to a user’s query). In essence, caching is an optimization technique designed to make inference more efficient. AI is obviously driven by high-octane mathematical calculations and need to optimize performance for real-world applications.
Model
A model in the context of AI refers to any structure or framework that can learn from data, making predictions based on it. Models are trained using large datasets, and their parameters (or weights) are adjusted during training to minimize errors (loss). The more complex a model is, the more parameters it has.
Neural network
A neural network is an AI structure inspired by how neurons in the human brain work. It consists of layers of interconnected nodes that process information and make decisions based on input data. The most common types include feedforward, recurrent, convolutional, and transformer networks.
Optimization
In machine learning, optimization is about finding the best parameters for a model to minimize error (loss). It involves adjusting these parameters during training to make predictions more accurate. Optimization techniques like gradient descent are used in this process.
Prompt
A prompt is an input given to an AI system, which then uses its learned patterns and knowledge to generate a response. Prompts can be text or structured data (e.g., JSON). They play a crucial role in guiding the AI’s output.
Training
The process of training an AI model involves feeding it large datasets so that it learns patterns and relationships within the data, which it then uses to make predictions. During this process, parameters are adjusted to minimize error (loss). Training is a critical step in building and improving AI models.
Unsupervised learning
Unsupervised learning involves training an AI model on unlabeled data without any explicit guidance or labels. The model learns patterns and relationships within the data by itself, making it useful for tasks like clustering and anomaly detection.
Variational autoencoder (VAE)
A V
Stay ahead of AI. Get the most important stories delivered to your inbox — no spam, no noise.




