01 / 05 · Tokens
Learn by clicking · Day 9 of 30 · ~3 min

Tokens, context & cost: why your AI bill works the way it does

An AI model doesn't read words the way you do. It reads tokens: small chunks of text. Think of a token as a syllable-sized Lego brick. The model snaps text apart into bricks, then works with the bricks. Here's the same sentence, as the model sees it:

"Tokens are how AI reads." → split into tokens
Tokens·are·how·AI·reads.
🧱 A token = a chunk of text (often part of a word)
📏 English rule of thumb: ~4 characters ≈ 1 token
📄 And ~750 words ≈ ~1,000 tokens
💵 You're billed per token, in and out

That last point surprises people. Models are priced per token: both the tokens you send in (your message, plus any documents) and the tokens the model generates back (its answer). Not per message. Not per chat. Per chunk of text, both directions.
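
To make the billing rule concrete, here is a minimal sketch. It assumes the characters ÷ 4 rule of thumb from above; the function names and the per-million-token prices are invented for illustration, and real tokenizers and real prices will differ.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb for English; real tokenizers differ


def estimate_tokens(text: str) -> int:
    """Rough token count: characters divided by 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Cost in dollars: input and output tokens are billed separately."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


prompt = "Tokens are how AI reads."
tokens_in = estimate_tokens(prompt)  # 24 characters -> 6 estimated tokens

# Hypothetical prices: $1 per million input tokens, $2 per million output tokens.
cost = estimate_cost(tokens_in, 100, 1.0, 2.0)
print(tokens_in, f"${cost:.6f}")
```

The estimate (6) is close to the five-chunk split shown above but not identical, which is why it stays a rule of thumb rather than an exact count.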

02 / 05 · The myth

"The AI remembers everything we've ever said."

In a long chat it feels true, so it's easy to believe. It's also the single most expensive misunderstanding about how these tools work. Let's kill it.

⚠ Myth

"It remembers everything, and it charges me per message."

The truth: a model has no built-in long-term memory of your conversation, and it isn't billed per message. It has a context window: the maximum number of tokens it can hold in mind at once (your input and its output, combined). Think of it as the size of its desk, or its short-term memory.

The desk fills up, and old notes fall off

A conversation is a stack of notes on that desk. When the chat grows past the window, something has to go, and usually the oldest notes fall off the edge to make room. That's why a long chat seems to "forget" how it started: it was never permanent memory.
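
One simple way an app might do this is sketched below. It is an assumption for illustration, not how any particular product works: `fit_to_window` is a hypothetical helper, token counts use the characters ÷ 4 estimate, and real apps may summarize or refuse instead of dropping old messages.

```python
from collections import deque


def estimate_tokens(text: str) -> int:
    """Rough token count: characters divided by 4, rounded up."""
    return -(-len(text) // 4)  # ceiling division


def fit_to_window(messages: list[str], window_tokens: int) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit the window."""
    kept: deque[str] = deque()
    used = 0
    # Walk from newest to oldest so the oldest notes are the ones that fall off.
    for message in reversed(messages):
        size = estimate_tokens(message)
        if used + size > window_tokens:
            break
        kept.appendleft(message)
        used += size
    return list(kept)


# Three messages of about 10 tokens each, and a desk that holds 25 tokens.
chat = ["a" * 40, "b" * 40, "c" * 40]
visible = fit_to_window(chat, window_tokens=25)
print(len(visible))  # only the two newest messages still fit
```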

The key: the window is a limit, not a memory. As of writing, context windows range from a few thousand tokens to a million-plus, depending on the model, but every model has an edge. To give an AI durable knowledge that doesn't fall off the desk, you store it outside the chat and fetch only what's needed. That's what retrieval (RAG) does.
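
Here is a toy version of "store it outside the chat and fetch only what's needed". Real retrieval systems typically rank passages with embeddings or a search index; this stand-in just counts shared words, and the notes, the query, and the function names are all made up for illustration.

```python
def score(query: str, passage: str) -> int:
    """Count distinct query words that also appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words)


def retrieve(query: str, passages: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k passages that share the most words with the query."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return ranked[:top_k]


# Knowledge kept outside the chat (hypothetical notes).
notes = [
    "refund policy: refunds are issued within 14 days",
    "shipping: orders ship in 2 business days",
    "support hours: weekdays 9 to 5",
]

# Only the best-matching note goes into the prompt, not all three.
context = retrieve("how many days do refunds take", notes)
print(context)
```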
03 / 05 · Watch the cost climb

Why a long chat gets expensive.

Here's the part almost everyone misses: each turn of a chat usually re-sends the whole conversation so far as input. The model has no memory, remember, so to keep context the app pastes the entire history back in with every single message. A long thread re-bills its own history, over and over.

[Interactive widget: Token & cost estimator. It shows the character count of your text, the estimated tokens (characters ÷ 4), and an illustrative input cost, then plots the total billed as the conversation grows and the history keeps getting re-sent.]

See the curve bend upward? Each turn pays for everything before it again. That's why a 30-message thread can cost far more than 30 fresh, short questions.
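
The arithmetic behind that curve fits in a few lines. This sketch assumes every turn adds the same number of tokens and that the full history is re-sent each time with no trimming or discounts; the 100-tokens-per-turn figure is made up.

```python
def total_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when every turn re-sends the full history."""
    # Turn k sends k turns' worth of tokens, so the total is 1 + 2 + ... + turns.
    return tokens_per_turn * turns * (turns + 1) // 2


threaded = total_input_tokens(30, 100)  # one 30-message thread
fresh = 30 * 100                        # 30 separate short questions
print(threaded, fresh, threaded / fresh)
```

Under these assumptions the thread bills 46,500 input tokens against 3,000 for the fresh questions, about 15 times as much, because the total grows with the square of the number of turns.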

Illustrative only. Token math here is approximate (a simple characters ÷ 4 estimate; real tokenizers differ), and the example prices are made up for teaching. Output tokens often cost more than input, and prices change constantly. Always check current pricing for your actual model.
04 / 05 · Spend less, get more

Four levers that cut your AI bill.

Once you see tokens, the savings are obvious. None of these make the AI dumber for the task; they just stop you paying for tokens you never needed.

  • 1. Right-size the model. Vendors offer tiers: small & fast vs large & smart (the names change often, so we won't list them here). Use the cheap, fast one for simple tasks; reach for the powerful one only when the task truly needs it.
  • 2. Trim the context. Don't paste a whole 50-page PDF when 2 pages matter. Sending less input is the most direct saving there is, and fetching only the relevant slice is exactly what RAG does for you automatically.
  • 3. Summarize long histories. Instead of re-sending a raw 40-message thread every turn, compress it to a short summary. The model keeps the gist; you stop re-billing the full transcript.
  • 4. Cap the output. Output tokens often cost more than input. If you don't need an essay, set a max length: short answers are cheaper answers. Tighter prompts help here too.
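
To get a feel for lever 3, here is a rough comparison of re-sending a raw thread versus a short summary plus the last few turns. Every number is invented, the function names are hypothetical, and the sketch ignores the tokens spent producing the summary itself.

```python
def input_tokens_full_history(turns: int, tokens_per_turn: int) -> int:
    """Every turn re-sends all earlier turns in full."""
    return tokens_per_turn * turns * (turns + 1) // 2


def input_tokens_with_summary(turns: int, tokens_per_turn: int,
                              summary_tokens: int, keep_recent: int) -> int:
    """Each turn sends a fixed-size summary plus at most keep_recent raw turns."""
    total = 0
    for turn in range(1, turns + 1):
        recent = min(turn, keep_recent)
        # A summary is only needed once older turns have been dropped.
        summary = summary_tokens if turn > keep_recent else 0
        total += recent * tokens_per_turn + summary
    return total


full = input_tokens_full_history(40, 100)
trimmed = input_tokens_with_summary(40, 100, summary_tokens=200, keep_recent=4)
print(full, trimmed)
```

With these made-up numbers the 40-message thread drops from 82,000 input tokens to 22,600. The exact saving depends on how long the summary is and how many recent turns you keep.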
05 / 05 · Done

You now understand AI cost better than most people who use it daily.

You know what a token is, why models charge per token in and out, what a context window is, why long chats forget and get pricey, and the four levers that bring the bill down.

The difference between an AI feature that's a delight and one that quietly drains your budget is almost always token discipline: the right model, trimmed context, and capped output, wired in from day one.

See these ideas doing real work: RAG trims context cheaply, and tighter prompts cost less per call. Or browse the full 30-day track.

Built by rabbithole.consulting. Custom-built infrastructure that runs your business. This lesson runs entirely in your browser · Free under MIT.