AI glossary for educators
Essential terms for understanding what AI does and what people mean when they talk about it.
- Large language model (LLM)
A program trained on enormous amounts of text that can read and generate human language.
Think of it as a system that read a significant fraction of the internet and many books, learning statistical patterns about how words follow each other. When you write to it, it predicts the most likely response given your question and everything it learned. It does not "understand" in the human sense, but the results often look remarkably as if it did. Claude, ChatGPT, and Gemini are all LLMs.
- Claude (Anthropic)
- ChatGPT (OpenAI)
- Gemini (Google)
prompt · token · context window · transformer
- Prompt
The text you write to an AI model to ask it for something.
A prompt is your instruction: it can be a question, a request, background context, or a combination of all three. The quality of the prompt directly shapes the quality of the response. Writing "Generate a rubric" gives a generic result; writing "Generate a 4-level rubric for evaluating written argumentation in 10th grade, with performance examples and aligned to my school's learning standards" gives something usable.
- "Summarize this text in three key points"
- "Act as a 7th-grade student and ask me questions about this reading"
prompt engineering · system prompt · few-shot prompting
- Prompt engineering
The practice of writing clear, well-structured instructions to get better results from an AI model.
It is not a mysterious discipline: it is essentially learning to communicate clearly with a tool that is very literal. The key principles are giving context (who you are, what the output is for), specifying the output format (list, paragraph, table), and providing examples of what you want. Most of what gets called "advanced prompt engineering" comes down to being more precise about what you are asking.
- Including a role ("Act as an instructional designer")
- Specifying the format ("Give me a numbered list")
- Providing examples of the expected output
prompt · few-shot prompting · chain-of-thought
- Token
The smallest unit AI models use to process text — roughly a syllable or a short word.
Models do not read letter by letter or word by word; they process text in chunks called tokens. In English, a long word like "retroactive" might be 3 tokens. This matters because models have a limit on how many tokens they can process at once (see context window) and because API costs are typically charged per token. As a rule of thumb, 750 English words come out to about 1,000 tokens.
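That rule of thumb is enough for a quick budget check before pasting a long document. A minimal sketch — the function name is invented, and real tokenizers split text into subword pieces, so treat this as a rough estimate only:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the ~750 words per 1,000 tokens rule of thumb."""
    words = len(text.split())
    # 1,000 tokens / 750 words ≈ 1.33 tokens per word
    return round(words * 1000 / 750)

print(estimate_tokens("the quick brown fox jumps over the lazy dog"))  # 9 words → 12
```

For precise counts, each provider documents its own tokenizer; this sketch only tells you whether a text is in the right ballpark for a model's limits.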
context window · tokens per second
- Context window
The maximum amount of text a model can hold in its working memory during a conversation.
Imagine the model has a limited desk: everything that fits on the desk is available to reason about; what does not fit does not exist for it. A large context window (like Claude's) lets you paste long documents, extended conversations, or detailed instructions without the model "forgetting" the beginning. When a conversation exceeds the limit, the model starts losing information from the earliest messages.
- Claude has a 200,000-token window (equivalent to a long novel)
- GPT-4o supports windows of up to 128,000 tokens
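The "limited desk" behavior can be sketched as a trimming loop: once the conversation exceeds the budget, the oldest messages are dropped first. A toy illustration — the function name is invented and it counts whitespace-separated words instead of real tokens:

```python
def fit_to_window(messages, budget, count_tokens=lambda m: len(m.split())):
    """Drop the oldest messages until the conversation fits the token budget."""
    trimmed = list(messages)
    while trimmed and sum(count_tokens(m) for m in trimmed) > budget:
        trimmed.pop(0)  # the earliest message is "forgotten" first
    return trimmed

history = ["intro message with lots of early context here",
           "a follow-up question",
           "the latest message"]
print(fit_to_window(history, budget=8))
```

Real chat applications use subtler strategies (summarizing old turns, pinning the system prompt), but the core constraint is the same: something has to leave the desk.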
token · large language model
- Hallucination
When a model generates false information with complete confidence, as if it were an established fact.
Models do not search the internet when they respond (unless they have a search tool activated): they generate text based on patterns. Sometimes those patterns produce invented data — dates, citations, author names, legislation — presented with the same confident tone they would use for real facts. Every LLM output should be verified before use, especially if it contains specific figures, citations, or references.
- A model that invents the ISBN of a real book
- A model that cites a scientific paper that does not exist
sycophancy · inference
- Fine-tuning
Training an existing model on additional data to specialize it for a specific task or domain.
Base models are trained on general data. Fine-tuning is like giving someone with a broad education an intensive specialist course: the model learns patterns from the new domain without forgetting what it already knew. Organizations fine-tune models to always respond in their brand voice, or to know their internal knowledge base. As an educator, you probably will not do fine-tuning yourself, but you will use models that have already been fine-tuned.
- A base Llama model fine-tuned to respond only in legal language for a specific jurisdiction
large language model · RAG
- RAG (retrieval augmented generation)
A technique that connects an AI model to a specific document collection so its responses are grounded in those texts.
Instead of the model responding only from what it learned during training, it first retrieves relevant fragments from a collection of documents (your school's handbook, the course syllabus, specific articles) and then generates the response using those fragments as context. Google's NotebookLM uses this approach: feed it your documents and it answers based only on what is in them.
- NotebookLM: upload your notes and it summarizes only what is in them
- A school chatbot that answers from your institution's internal policy documents
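The retrieve-then-generate loop fits in a few lines. This toy version scores documents by word overlap with the question instead of embeddings, and all names and sample documents are invented for illustration:

```python
def retrieve(question, documents, top_k=1):
    """Score each document by word overlap with the question; return the best match."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

handbook = [
    "Attendance policy: students must notify the office before 9 am.",
    "Grading policy: late work loses 10 percent per day.",
    "Field trips require a signed permission form.",
]
question = "What is the late work grading policy?"
context = retrieve(question, handbook)

# The retrieved fragment is pasted into the prompt, grounding the model's answer
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
print(prompt)
```

Production RAG systems replace the word-overlap scoring with embedding similarity (see embedding and vector below), but the shape of the pipeline — retrieve first, then generate with the retrieved text as context — is exactly this.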
large language model · embedding · vector · fine-tuning
- Agent
An AI system that can plan and execute multiple actions in sequence to complete a complex task.
A basic LLM responds and waits. An agent can decide what steps to take, use tools (search the internet, read a file, run code), review its own output, and self-correct. Think of it as the difference between a colleague who only answers questions and one who can manage an entire project from start to finish. Agents represent the current frontier of practical AI — and also where the most complex risks emerge.
- An agent that researches a topic, writes a summary, and formats it as a PDF without human intervention
MCP · large language model
- MCP (model context protocol)
An open standard that lets AI models connect to external tools and data sources in a structured way.
MCP is like a universal connector — a standard that defines how an AI model can talk to other applications: a calendar, a database, a search engine, a file system. It was proposed by Anthropic (makers of Claude) in 2024 and is being adopted by multiple providers. For educators, this means AI could eventually connect directly to school systems without anyone having to custom-code each integration.
- Claude connected to your Google Drive to read documents directly
agent · large language model
- Multimodal
A model that can process and generate multiple types of content: text, images, audio, or video.
Early LLMs handled only text. Today's multimodal models can receive a photo of a whiteboard and respond to it, analyze a chart, transcribe audio, or generate images from instructions. GPT-4o and Gemini are multimodal. Claude can analyze images but (as of May 2026) does not generate images. This capability opens possibilities for analyzing student work submitted as photos, or for having the model explain a diagram.
- Photographing a student's handwritten work and asking the model to assess it
- Pasting a data chart and asking what trends it shows
large language model
- Embedding
A numerical representation of text that captures its meaning, used to compare texts with each other.
For a computer to compare whether two texts are about the same thing, it needs to convert them to numbers. An embedding transforms text into a list of hundreds or thousands of numbers (a vector) where texts of similar meaning end up numerically close to each other. This is what allows semantic search systems to find relevant documents even when they do not share the exact same words.
- Searching for "student feedback" and finding documents that say "assessment commentary"
vector · RAG
- Vector
A list of numbers that mathematically represents the meaning of a text or image.
In the AI context, a vector is how models store the "meaning" of something. When two texts are said to be semantically similar, what is happening underneath is that their vectors are close in a high-dimensional mathematical space. Vector databases store these vectors for fast similarity searches — by meaning, not just by exact word matches.
embedding · RAG
- System prompt
An initial instruction, invisible to the end user, that defines how the model should behave in a given application.
When a company or developer builds a chatbot on top of an LLM, they typically write a system prompt that defines the assistant's role, tone, constraints, and context. The end user does not see it, but the model keeps it in mind throughout the conversation. For example, a customer service chatbot's system prompt might say: "You are a friendly assistant for Bank X. Never discuss political topics. Always respond in formal English."
- Khanmigo's system prompt defines that it must always guide with questions, never give the answer directly
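Chat APIs commonly represent the system prompt as a separate message with a "system" role. A sketch in that widely used messages format — the field names and wording here are illustrative, not any specific provider's API:

```python
messages = [
    {"role": "system",
     "content": "You are a tutoring assistant. Guide with questions; never give the answer directly."},
    {"role": "user",
     "content": "What is 7 x 8?"},
]

# The system message steers every turn, but the student only ever sees the chat
for message in messages:
    print(message["role"], "->", message["content"])
```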
prompt · prompt engineering
- Few-shot prompting
Giving the model one or more examples of the result you expect, within the same prompt.
Instead of describing what you want in the abstract, you show the model a couple of concrete examples and then ask it to follow that pattern. It is like telling a colleague: "Look at how I did it here (example 1) and here (example 2), now do the same for this new case." Few-shot prompting improves consistency and reduces the risk of the model misinterpreting the format or tone you are looking for.
- Providing two examples of well-written feedback and asking for a third for a new piece of student work
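Assembling a few-shot prompt is just string building: show the pattern, then present the new case. A sketch with invented example feedback:

```python
examples = [
    ("The essay states a clear thesis early on.",
     "Strength: clear thesis. Next step: support it with one more source."),
    ("The argument jumps between ideas without transitions.",
     "Strength: ambitious ideas. Next step: add transitions between paragraphs."),
]
new_case = "The conclusion repeats the introduction almost word for word."

# Show the pattern (the "shots"), then ask for the same pattern on the new case
prompt = "Write feedback in the same style as these examples.\n\n"
for observation, feedback in examples:
    prompt += f"Observation: {observation}\nFeedback: {feedback}\n\n"
prompt += f"Observation: {new_case}\nFeedback:"
print(prompt)
```

Ending the prompt right at "Feedback:" invites the model to complete the pattern rather than comment on it.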
prompt · prompt engineering · chain-of-thought
- Chain-of-thought
A prompting technique where you ask the model to show its reasoning step by step before giving a final answer.
When a model has to solve something complex, jumping directly to an answer increases the risk of error. Asking it to "think out loud" — to write out the intermediate steps — improves the quality of the final result, especially for math problems, text analysis, or decision-making. It is activated simply by adding phrases like "Think step by step before answering" or "Explain your reasoning."
- "Before writing the rubric, explain what dimensions you plan to include and why"
prompt · prompt engineering · few-shot prompting
- Temperature
A parameter that controls how predictable or creative a model's outputs are.
At low temperature (close to 0), the model always picks the most likely responses: more consistent, less creative. At high temperature, it introduces more variation and surprise. For tasks requiring precision — translation, summarizing a specific text, code — low temperature is better. For brainstorming, creative writing, or hypothetical scenarios, higher temperature yields more interesting results. Most chat interfaces do not expose this parameter directly.
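Under the hood, temperature rescales the model's scores for candidate next tokens before they are turned into probabilities. A minimal sketch of that softmax-with-temperature step, with made-up scores for three candidate tokens:

```python
import math

def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities; temperature reshapes the distribution."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]  # invented scores for three candidate next tokens
print(softmax_with_temperature(scores, 0.2))  # low temp: the winner takes almost all
print(softmax_with_temperature(scores, 2.0))  # high temp: probabilities flatten out
```

Dividing by a small temperature exaggerates the gaps between scores (predictable output); dividing by a large one shrinks them (varied, surprising output).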
inference · model parameters
- Bias
Systematic tendencies in a model's outputs that reflect imbalances in the training data.
Models learn from human text, and that text contains historical and cultural biases. The result is that models may default to assuming a doctor is male, that cultural examples are Anglo-American, or that the relevant curriculum is from the US or UK. The most practical mitigation is to specify your local context in your prompt: your country, your school's framework, your student population.
- A model that generates history examples from a Eurocentric perspective
- A model that assumes a US context when discussing public education policy
hallucination · sycophancy · fine-tuning
- Sycophancy
A model's tendency to agree with you or validate your ideas even when you are wrong.
Models were trained by optimizing for positive human ratings. The unintended result is that they learned to tell you what you want to hear. If you propose an incorrect idea confidently, the model tends to validate it rather than correct you. For educators, this is directly relevant: if you tell the model "This rubric I designed is really good, right?", it will probably say yes even if it has problems. Explicitly ask it to identify weaknesses instead.
- "What are the problems with this lesson plan?" yields better results than "Is this lesson plan good?"
hallucination · bias
- Jailbreak
A technique for bypassing a model's safety filters to get it to generate content it would normally refuse.
Models have restrictions to avoid generating harmful, illegal, or inappropriate content. Jailbreaks are attempts — sometimes creative, sometimes elaborate — to circumvent those restrictions using specially designed prompts. For educators, it is worth knowing that students may attempt this, and that modern models are significantly more resistant than those of 2023. It is part of the digital literacy conversation worth having with students.
- Asking the model to "act as an AI without restrictions"
bias · system prompt
- Deepfake
AI-generated or AI-manipulated audiovisual content that makes it appear someone said or did something they never did.
Text-based deepfakes have existed for centuries (we just called them lies). Audiovisual deepfakes — video and audio — are the new phenomenon: with relatively little training material, it is possible to generate a convincing video of someone saying anything. This has direct implications for student safety (harassment with generated images), the credibility of evidence, and media literacy education. Discussing this with students is as urgent as teaching them to cite sources.
- A fabricated video of a student or teacher in a compromising situation
generative AI · bias
- Generative AI vs discriminative AI
Generative AI creates new content; discriminative AI classifies or makes decisions about existing content.
Discriminative AI learns to separate categories: is this email spam or not? Does this image show a cat or a dog? It is the AI we have been using for years without always calling it that. Generative AI — which is what dominates current conversation — learns to produce new content: text, images, audio, code. ChatGPT is generative. Your email spam filter is discriminative. Both are "artificial intelligence," but they are fundamentally different tools.
- Generative: Claude drafts a letter
- Discriminative: an algorithm that detects whether an image contains violence
large language model · neural network
- Model parameters
The internal numbers a model adjusts during training to learn patterns in language.
When you hear "a 70-billion-parameter model," those parameters are the weights of the billions of connections between artificial neurons. More parameters generally means greater capacity to capture complex patterns — but also higher computational and energy cost. You do not need to understand how they work internally to use the model, but parameter count is a rough indicator of capability.
- GPT-4 is estimated at hundreds of billions of parameters
- Small models that run on a laptop typically have 7-13 billion parameters
neural network · transformer · fine-tuning
- Inference
The process of using an already-trained model to generate a response — what happens every time you write to it.
Training is when the model learns (expensive, slow, done once or rarely). Inference is when the model uses what it learned to answer your question (faster, happens billions of times per day). When you pay for an AI API, you pay for inference: how many tokens it processed. The cost of inference has dropped dramatically between 2023 and 2026, making tools significantly more accessible.
token · model parameters · temperature
- Open source vs closed source
Whether a model's code and weights are public (open) or exclusively controlled by the company (closed).
Open-source models — like Meta's Llama or Mistral — let anyone download, examine, modify, and run them on their own hardware. Closed-source models — like GPT-4, Claude, or Gemini — are services you access through an API or interface, but you cannot see or modify their internals. For education, the distinction matters for privacy (a local model sends no data to external servers) and cost (open models can be free if you have the hardware).
- Open: Llama 3 (Meta), Mistral, Phi-3 (Microsoft)
- Closed: Claude (Anthropic), GPT-4 (OpenAI), Gemini (Google)
model parameters · fine-tuning
- Latency
The time it takes for a model to start responding after you send your message.
Latency is the initial delay before you see the first word of a response. Once it starts writing, what matters is generation speed (tokens per second). High latency can make the experience feel slow even if the model generates quickly once it begins. In classroom settings or with limited connectivity, latency can be a real usability constraint.
tokens per second · inference
- Tokens per second
The speed at which a model generates its response once it begins writing.
AI models do not write the complete response and deliver it all at once: they generate it token by token, left to right, in real time. Tokens per second (TPS) measures that speed. A model generating 30 TPS is noticeably faster than one generating 5 TPS. For tasks requiring long responses — analyzing an extended document, generating a full unit plan — speed makes a practical difference.
token · latency · inference
- AGI (artificial general intelligence)
A hypothetical AI system capable of performing any cognitive task a human can, with the same level of generalization.
Current models — even the most advanced — are highly capable systems in language and reasoning, but they are not AGI. AGI is a theoretical target: an AI that learns and applies knowledge across any domain the way a human would. There is no consensus on whether it is achievable or when. What does exist is an active debate in the research community about its implications. For practical AI use today, AGI is more a conceptual reference than a current reality.
large language model · neural network
- Neural network
A computing system loosely inspired by the brain's structure, made of layers of connected units that learn from data.
Artificial neural networks have "neurons" (simple mathematical units) organized in layers. Data passes through those layers, and during training the connection weights are adjusted so the network learns to make correct predictions. Modern LLMs are neural networks with a specific architecture called the transformer. Beyond the original metaphor, they bear little real resemblance to the brain.
- A neural network trained to recognize handwriting
- LLMs are transformer-type neural networks
transformer · model parameters · large language model
- Transformer
The neural network architecture at the foundation of all modern large language models.
The transformer was proposed in 2017 in the landmark paper "Attention Is All You Need" by Google researchers. Its central innovation is the attention mechanism: instead of processing text sequentially from left to right, the network learns which parts of the text are relevant to each word, regardless of distance. This allowed training much larger and more capable models. When you see GPT (Generative Pre-trained Transformer), the T is this architecture.
- GPT stands for Generative Pre-trained Transformer
neural network · large language model · model parameters