
LLM Development Basics: Tokens, Context, Pricing

Ways to count tokens:
OpenAI: the tiktoken library or the web Tokenizer tool
Anthropic: the count_tokens API (roughly like English)
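For OpenAI models, tiktoken can count tokens locally. This is a minimal sketch. It assumes the `tiktoken` package is installed and that `o200k_base` (the encoding used by GPT-4o-family models) matches your model. `count_tokens` is a hypothetical helper. If tiktoken or its encoding file is unavailable, it falls back to a rough estimate of 4 characters per token.

```python
import math

try:
    import tiktoken  # third-party: pip install tiktoken
    # Assumption: o200k_base is the right encoding for the target model.
    # The encoding file may be downloaded on first use.
    _ENC = tiktoken.get_encoding("o200k_base")
except Exception:  # package missing, unknown encoding, or download failed
    _ENC = None


def count_tokens(text: str) -> int:
    """Exact count via tiktoken when available, else a rough ~4 chars/token estimate."""
    if _ENC is not None:
        return len(_ENC.encode(text))
    return math.ceil(len(text) / 4)


print(count_tokens("Tokens are not the same as words."))
```

Anthropic's count_tokens API serves the same purpose for Claude models. It requires an API key, so it is not shown here.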

AI Navigate Original / 5/16/2026


Key Points

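Understand tokens, context, and pricing before starting LLM development. Japanese text takes roughly 1.5-2x as many tokens as English.
The context window covers input plus output. 1M-token-class windows exist as of 2026, but limits change often, so check the official documentation. Content in the middle of long inputs tends to be overlooked, so use RAG or split the text.
Prices change often, so this article gives no price table, only tendencies: output tokens cost more than input tokens, lightweight models are far cheaper, and caching and Batch processing offer discounts.
In production, also handle rate limits, latency, streaming, and tool use / structured output.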
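The context and pricing points reduce to simple arithmetic. This is a minimal sketch. `ModelSpec`, `fits_in_context`, and `estimate_cost` are hypothetical helpers, and every limit and price is a made-up placeholder rather than a real model's value. The sample prices only mirror the stated tendencies: output costs more than input, and the light model is far cheaper. `discount` is a flat fraction on the whole request, which roughly matches how a batch discount works. Prompt-caching discounts apply only to the cached part of the input and are not modeled.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    context_window: int        # max tokens for input + output combined
    input_price_per_m: float   # USD per 1M input tokens (hypothetical)
    output_price_per_m: float  # USD per 1M output tokens (hypothetical)


def fits_in_context(spec: ModelSpec, input_tokens: int, max_output_tokens: int) -> bool:
    """The window must hold the prompt AND the output tokens you reserve."""
    return input_tokens + max_output_tokens <= spec.context_window


def estimate_cost(spec: ModelSpec, input_tokens: int, output_tokens: int,
                  discount: float = 0.0) -> float:
    """Estimated USD cost; `discount` is a flat fraction (e.g. 0.5) on the whole request."""
    base = (input_tokens * spec.input_price_per_m
            + output_tokens * spec.output_price_per_m) / 1_000_000
    return base * (1.0 - discount)


# HYPOTHETICAL specs: output priced 5x input; the light model is far cheaper.
large = ModelSpec(context_window=1_000_000, input_price_per_m=3.0, output_price_per_m=15.0)
light = ModelSpec(context_window=200_000, input_price_per_m=0.25, output_price_per_m=1.25)

print(fits_in_context(light, 199_000, 2_000))  # False: 201,000 > 200,000
print(estimate_cost(large, 10_000, 2_000))
print(estimate_cost(light, 10_000, 2_000))
print(estimate_cost(large, 10_000, 2_000, discount=0.5))
```

Reserving output tokens in the context check matters because a prompt that barely fits leaves no room for the response.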

Understanding tokens, context, and pricing before using LLM APIs makes both writing code and estimating costs smoother.

A token is the smallest unit an LLM processes. Tokens are not whole words: they also include word fragments and symbols. Usage is billed by token count.

English: 1 word ≈ 1.3 tokens (1 token ≈ 4 characters)
Japanese: 1 character ≈ 1-2 tokens (varies between kana and kanji); about 1.5-2x English for the same meaning
Code: symbols and whitespace also count as tokens
Images, audio, and video are also converted to tokens and counted separately (varies by API)
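The rules of thumb above can serve as a quick estimate before sending a request. This is a minimal sketch based on the listed ratios: about 4 characters per token for English and code, and 1.5 tokens per Japanese character, an assumed midpoint of the 1-2 range. `estimate_tokens` and `is_japanese_char` are hypothetical helpers. The Unicode ranges cover only kana and common kanji, and images, audio, and video are not modeled.

```python
import math

# Hypothetical ranges: hiragana, katakana, and common kanji count as "Japanese".
_JA_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji)
)


def is_japanese_char(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _JA_RANGES)


def estimate_tokens(text: str, ja_tokens_per_char: float = 1.5) -> int:
    """Rough estimate: Japanese characters at ja_tokens_per_char tokens each
    (1.5 is an assumed midpoint of 1-2); all other characters at ~4 per token."""
    ja = sum(1 for ch in text if is_japanese_char(ch))
    other = len(text) - ja
    return math.ceil(ja * ja_tokens_per_char + other / 4)


print(estimate_tokens("Hello, world!"))  # 13 chars / 4 -> 4
print(estimate_tokens("こんにちは"))      # 5 kana * 1.5 -> 8
```

For billing-grade numbers, use the provider's own tokenizer or token-counting API instead of this estimate.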
