Pre-Computed Context: How Ditto Keeps Long Conversations Fast and Accurate
You’re 45 messages into a coding session. You’ve walked the AI through your architecture, debugged two issues, and settled on a migration strategy. Then you ask a follow-up question and the response takes 8 seconds. It contradicts something you agreed on 20 messages ago. And your token bill just doubled.
This is the context window problem, and every AI user hits it eventually. The longer your conversation, the more context the model has to process — and the slower, dumber, and more expensive it gets.
Ditto v0.26.0 fixes this with pre-computed memory summaries.
Why Long Conversations Break
Every AI model has a context window — a limit on how much text it can consider at once. When you chat with ChatGPT or Claude, the system typically sends your entire conversation history with every message. A 3-message conversation? Cheap and fast. A 50-message conversation? The model is re-processing tens of thousands of tokens every single turn.
This creates three compounding problems:
- Speed degrades. More input tokens = longer time to first response.
- Accuracy drops. Models struggle to prioritize relevant context when drowning in conversation history. Important decisions from message #5 get lost in the noise of message #40.
- Cost scales linearly. You’re paying for the model to re-read your entire history on every turn. A 100-message conversation costs roughly 50x more per response than a 2-message conversation.
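The linear growth above can be sketched with back-of-envelope arithmetic. This is an illustrative model only: the per-message token count is an assumption, not Ditto's measured numbers or any provider's pricing.

```typescript
// Rough model of per-turn input cost in a replay-the-history chat loop.
const TOKENS_PER_MESSAGE = 300; // assumed average message length

// With full-history replay, the Nth turn re-sends all N prior messages.
function inputTokensForTurn(historyLength: number): number {
  return historyLength * TOKENS_PER_MESSAGE;
}

const shortChat = inputTokensForTurn(2);  // 600 tokens re-processed
const longChat = inputTokensForTurn(100); // 30,000 tokens re-processed

// Per-response input cost grows linearly with history length:
console.log(longChat / shortChat); // 50
```

Whatever the exact per-message average, the ratio is what matters: fifty times the history means roughly fifty times the input tokens on every single turn.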
Most AI assistants don’t tell you this is happening. You just notice things getting sluggish and start a new conversation — losing all that context.
Ditto’s Approach: Summarize Once, Retrieve Efficiently
Instead of replaying your entire history on every message, Ditto pre-computes compact summaries of your conversation pairs (each prompt and its response). When you send a message, the system grabs the relevant summaries — not raw transcripts — and injects them into the prompt.
Here’s the key insight: a 2,000-word coding discussion can usually be captured in a 200-word summary without losing the decisions, conclusions, or technical details that matter. That’s a 10x reduction in tokens, with the critical information preserved.
The summaries are generated asynchronously after conversations complete, so they’re always ready when you need them. No delay, no extra cost at query time.
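The summarize-once, retrieve-cheaply pattern can be sketched as follows. Everything here is hypothetical: the store, the function names, and the truncation stand-in for what, in a real pipeline, would be an asynchronous LLM summarization call.

```typescript
type Exchange = { id: string; prompt: string; response: string };

// Hypothetical summary store, keyed by exchange id.
const summaryStore = new Map<string, string>();

// Offline step: runs after the conversation completes, off the hot path.
function precomputeSummary(ex: Exchange): void {
  // Stand-in for an LLM call that compresses the exchange ~10x
  // while preserving decisions and technical details.
  const summary = `${ex.prompt.slice(0, 80)} -> ${ex.response.slice(0, 80)}`;
  summaryStore.set(ex.id, summary);
}

// Query-time step: a cheap lookup. No model call, no added latency.
function contextFor(ids: string[]): string {
  return ids
    .map((id) => summaryStore.get(id))
    .filter((s): s is string => s !== undefined)
    .join("\n");
}
```

The design point is the split: the expensive summarization work happens once, asynchronously, while the per-message path is reduced to a lookup and a string join.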
What Gets Summarized
Not everything deserves the same treatment. Ditto’s summary engine focuses on:
- Decisions and conclusions — “We agreed to use JWT for auth, not session cookies”
- Technical specifics — code patterns, architecture choices, config details
- Action items and next steps — “TODO: add error handling for the webhook endpoint”
- User preferences — “User prefers TypeScript and avoids ORMs”
The raw conversations are still stored in full — summaries are an optimization layer, not a replacement. If the system needs to dive deeper into a specific memory, it can still retrieve the full content.
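A summary record along these lines might look like the sketch below. The field names are illustrative assumptions (the post only describes the categories the engine focuses on); the pointer back to the full conversation reflects the fallback behavior described above.

```typescript
// Hypothetical shape of a pre-computed summary record.
interface MemorySummary {
  conversationId: string;     // pointer back to the full stored transcript
  decisions: string[];        // "We agreed to use JWT for auth, not session cookies"
  technicalDetails: string[]; // code patterns, architecture choices, config
  actionItems: string[];      // "TODO: add error handling for the webhook endpoint"
  preferences: string[];      // "User prefers TypeScript and avoids ORMs"
}

// Flatten a summary into the compact text injected into the prompt.
function toPromptContext(s: MemorySummary): string {
  return [
    ...s.decisions.map((d) => `Decision: ${d}`),
    ...s.technicalDetails.map((t) => `Detail: ${t}`),
    ...s.actionItems.map((a) => `Next: ${a}`),
    ...s.preferences.map((p) => `Preference: ${p}`),
  ].join("\n");
}
```

Because `conversationId` survives in the record, the summary never has to be lossless: anything it drops is still one lookup away in the raw transcript.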
Memory Fetch: See Exactly What Ditto Remembers
Pre-computed summaries solve the efficiency problem. But there’s a related problem that most AI assistants ignore entirely: you have no idea what context the AI is using.
When ChatGPT references “something you mentioned before,” you’re trusting a black box. Did it actually retrieve the right conversation? Is it hallucinating a memory? You can’t tell.
Ditto v0.26.0 introduces Memory Fetch cards — expandable inline cards that show exactly which memories were retrieved for each response. You can see:
- Which memories were pulled from your knowledge graph
- The summary content that was injected into the prompt
- The relevance score for each retrieved memory
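The data behind a Memory Fetch card might look something like this sketch. The field and function names are assumptions for illustration, not Ditto's actual API; the three fields mirror the three things the cards expose.

```typescript
// Hypothetical record for one retrieved memory shown in a Memory Fetch card.
interface FetchedMemory {
  memoryId: string;  // which memory was pulled from the knowledge graph
  summary: string;   // the summary text that was injected into the prompt
  relevance: number; // relevance score, higher = more relevant
}

// Render the expandable card body, most relevant memory first.
function renderFetchCard(memories: FetchedMemory[]): string {
  return memories
    .slice() // don't mutate the caller's array
    .sort((a, b) => b.relevance - a.relevance)
    .map((m) => `[${m.relevance.toFixed(2)}] ${m.memoryId}: ${m.summary}`)
    .join("\n");
}
```

Exposing exactly this record, rather than a paraphrase of it, is what makes the retrieval verifiable: the summary shown is the summary the model actually saw.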
This isn’t just a nice-to-have. Transparent context retrieval is how you build trust in an AI that claims to remember you. If you can see the context, you can verify the reasoning. If a memory is wrong or outdated, you can fix it.
The Numbers
v0.26.0 was validated with 71+ new tests across the message pipeline, memory utilities, time handling, and event streaming. In practice, users see:
- Faster responses — less token processing per message, especially in long conversations
- Lower costs — compact summaries instead of full conversation replays mean fewer tokens consumed
- More accurate recall — summaries strip away noise, so the model focuses on what matters
The improvement is most dramatic for power users — the ones with hundreds of memories and multi-week conversation threads. These are exactly the users who were hitting the context wall hardest.
Why This Matters Beyond Performance
Pre-computed summaries aren’t just an optimization. They change the economics of persistent memory.
Most AI assistants can’t offer real long-term memory because the cost of injecting thousands of past conversations into every prompt is prohibitive. You’d burn through your token budget in days. The summaries break this trade-off: Ditto can reference your entire conversation history efficiently because it never sends the raw history — only the distilled, relevant context.
This is what makes memory-first AI viable at scale. The more you use Ditto, the more context it accumulates — and thanks to pre-computed summaries, that context stays fast and affordable to access.
Combined with Ditto’s knowledge graph and transparent memory retrieval, v0.26.0 represents a step change in how persistent memory works in practice — not just storing everything, but making stored context useful without burning through tokens.
Try It
Ditto v0.26.0 is live now. Every conversation you have automatically gets summarized and indexed for efficient retrieval.
Start a conversation at assistant.heyditto.ai, send a few messages, and then check the Memory Fetch cards to see your context in action. If you’ve been using Ditto for a while, your existing conversations have already been summarized — your experience just got faster without you doing anything.
Already using Ditto via MCP in Claude or Cursor? You’ll benefit too — the same pre-computed summaries power the search_memories and fetch_memories tools.
Ditto is the AI assistant that remembers everything. Try it free — no credit card required.