AI Navigate

Introduction to the Mechanics of Generative AI: Roughly Understanding LLMs, Transformers, and Tokens to Use Them Effectively

AI Navigate Original / 3/17/2026

💬 Opinion / Ideas & Deep Analysis

Key Points

  • LLMs are best understood conceptually as repeatedly predicting the next token to generate text.
  • Tokens are finer units than words and directly affect pricing, context length, and prompt design.
  • The Transformer excels at large-scale training because its attention mechanism selectively references the important parts of the context and lends itself to parallel computation.
  • Hallucinations arise from the model's tendency to generate plausible continuations; they can be mitigated by requesting citations and by using RAG (retrieval-augmented generation).
  • By focusing on three aspects - roles, constraints, and examples - the quality of practical outputs becomes more stable.

What exactly is Generative AI doing?

Generative AI is a broad term for technologies that generate plausible new content across text, images, audio, code, and more. In text generation, the central role is played by the LLM (Large Language Model). Services like ChatGPT produce text by using an LLM to repeatedly predict the most likely next word (more precisely, the next token).

The key point is not that the model understands meaning, but that it statistically predicts the most natural continuation with high accuracy. As scale increases, however, more abstract patterns are learned, so the model increasingly behaves as if it understands.
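This repeated next-token prediction can be sketched in a few lines of Python. The probability table below is invented purely for illustration; a real LLM learns a distribution over tens of thousands of tokens with a neural network, but the generation loop has the same shape: look at the context, sample the next token, append, repeat.

```python
import random

# Toy next-token probability table (invented for illustration only).
NEXT_TOKEN_PROBS = {
    "The": {"weather": 0.6, "cat": 0.4},
    "weather": {"tomorrow": 0.7, "today": 0.3},
    "tomorrow": {"is": 1.0},
    "is": {"sunny": 0.5, "rainy": 0.3, "cloudy": 0.2},
}

def generate(prompt, max_tokens=10, seed=0):
    """Repeatedly sample the next token until no continuation is known."""
    rng = random.Random(seed)
    tokens = prompt.split()
    for _ in range(max_tokens):
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:  # no learned continuation: stop generating
            break
        words = list(dist)
        weights = [dist[w] for w in words]
        tokens.append(rng.choices(words, weights=weights, k=1)[0])
    return " ".join(tokens)

print(generate("The weather tomorrow is"))
```

Each step only ever asks "given what has been written so far, what token comes next?", which is exactly the behavior described above.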

LLM basics: Two phases of learning and inference

1) Training: Learning patterns from large amounts of text

LLMs are trained on vast amounts of data, including web documents, books, papers, and code, to predict how a sentence is likely to continue. Typically, they repeatedly perform next-token prediction and adjust the model weights to reduce prediction errors.

Rough example: given "The weather tomorrow is ...", the next token is likely to be sunny, rainy, or cloudy. The model learns these probabilities from context, frequency, and the relationships between surrounding words.
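The idea that probabilities come from observed frequency can be illustrated with a count-based sketch. Real training uses gradient descent on a neural network over billions of tokens, not simple counting, and the tiny corpus below is invented, but counting bigrams conveys the core intuition.

```python
from collections import Counter, defaultdict

# Tiny invented corpus standing in for real training data.
corpus = [
    "the weather tomorrow is sunny",
    "the weather tomorrow is rainy",
    "the weather today is sunny",
]

# Count how often each token follows each other token.
counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

# Normalize counts into next-token probability distributions.
probs = {
    prev: {w: c / sum(ctr.values()) for w, c in ctr.items()}
    for prev, ctr in counts.items()
}

print(probs["is"])  # "sunny" appears twice as often as "rainy" after "is"
```

In an actual LLM these distributions are not stored in a table; they are computed by the network from the full context, which is what lets the model generalize beyond exact phrases it has seen.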

2) Inference: Generating the next token for a given input

When a user provides a prompt, the model internally constructs a probability distribution and selects the next token. This process repeats to extend the text. Common generation parameters include temperature and top-p.

  • temperature: lower values yield more deterministic, conservative outputs; higher values yield more diverse, creative outputs
  • top-p (nucleus sampling): choose from the smallest set of tokens whose cumulative probability reaches p
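The two parameters above can be demonstrated concretely. This is a minimal sketch, not any provider's actual implementation: the distribution is invented, temperature rescales log-probabilities before renormalizing, and top-p keeps only the smallest set of tokens whose cumulative probability reaches p.

```python
import math
import random

def sample(probs, temperature=1.0, top_p=1.0, seed=None):
    """Sample a token after applying temperature and nucleus (top-p) filtering."""
    # Temperature: divide log-probabilities by T, then renormalize.
    logits = {t: math.log(p) / temperature for t, p in probs.items()}
    z = sum(math.exp(v) for v in logits.values())
    scaled = {t: math.exp(v) / z for t, v in logits.items()}

    # Top-p: keep the smallest set of tokens whose cumulative probability >= p.
    kept, cum = [], 0.0
    for tok, p in sorted(scaled.items(), key=lambda kv: -kv[1]):
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    tokens, weights = zip(*kept)
    return random.Random(seed).choices(tokens, weights=weights, k=1)[0]

dist = {"sunny": 0.5, "rainy": 0.3, "cloudy": 0.15, "snowy": 0.05}
# Low temperature sharpens the distribution; a tight top-p then trims the tail,
# so this call almost deterministically picks the most likely token.
print(sample(dist, temperature=0.2, top_p=0.5, seed=0))
```

Raising temperature toward 1.0 and top_p toward 1.0 restores the original distribution and makes rarer tokens like "snowy" possible again.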

What is a token? A unit of text finer than a word

The smallest unit typically handled by LLMs is the token. Tokens are not necessarily whole words; they can be parts of words or symbols, and in Japanese, sequences of characters or subwords. LLMs first break sentences into a token sequence and base their predictions on that sequence.
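A toy tokenizer makes the "parts of words" idea concrete. Real tokenizers (such as BPE, used by most LLMs) learn their vocabulary from data; the hand-made vocabulary below is invented purely to show greedy longest-match splitting into subword pieces.

```python
# Invented subword vocabulary for illustration; real vocabularies are learned.
VOCAB = {"token", "ization", "un", "predict", "able", " "}

def tokenize(text):
    """Greedy longest-match tokenization; unknown spans fall back to characters."""
    out, i = [], 0
    while i < len(text):
        match = None
        # Try the longest possible substring first.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # unknown character becomes its own token
        out.append(match)
        i += len(match)
    return out

print(tokenize("tokenization"))   # ['token', 'ization']
print(tokenize("unpredictable"))  # ['un', 'predict', 'able']
```

This is why token counts rarely match word counts, and why languages like Japanese, whose text segments differently, can consume noticeably more tokens for the same content.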

Why tokens matter

  • Cost: API pricing is usually token-based (input + output)
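Because input and output tokens are usually billed at different rates, a request's cost is easy to estimate once you know its token counts. The per-million-token prices below are hypothetical placeholders, not any provider's real rates; always check the current pricing page.

```python
# Hypothetical placeholder rates (USD per 1M tokens), NOT real prices.
PRICE_PER_1M_INPUT = 2.50
PRICE_PER_1M_OUTPUT = 10.00

def estimate_cost(input_tokens, output_tokens):
    """Estimate one request's cost: input and output tokens are billed separately."""
    return (input_tokens * PRICE_PER_1M_INPUT
            + output_tokens * PRICE_PER_1M_OUTPUT) / 1_000_000

# A 1,500-token prompt that produces a 500-token reply:
print(f"${estimate_cost(1500, 500):.5f}")
```

Note that in a multi-turn chat the whole conversation history is typically re-sent as input each turn, so input-token costs grow as the conversation gets longer.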
