Discussion about this post

Scenarica

The four-scope breakdown is the most useful framework in here, but honestly the part that landed hardest was something almost buried in the middle: stale memory producing "confidently wrong" outputs, because a memory that is highly relevant but no longer correct carries no signal of uncertainty. That's the failure mode that will cause the first serious production incident in an enterprise agent deployment, and it'll happen precisely because the system looks like it's working perfectly right up until it isn't.

The memory vs context distinction is one that most teams are still getting wrong in practice. They see a million-token context window and assume the memory problem is solved. It isn't. It's masked. The window holds everything but weighs nothing. Memory is supposed to be the system's judgment about what matters, and judgment requires governance that a context window doesn't provide.

The Karpathy wiki framing is the one I keep coming back to. Ingest, query, lint. Three verbs that describe what most teams think they're doing with RAG but actually aren't, because RAG retrieves without evaluating whether what it retrieved is still true. The lint step is where the real work lives and it's the step almost nobody has built yet.
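
For what it's worth, here is a rough Python sketch of what I mean by a lint step sitting next to retrieval. Everything in it is hypothetical and only for illustration: the `MemoryEntry` fields, the `lint` and `query` names, and especially the idea of using "time since last verified" as a stand-in for "is this still true". A real lint pass would need to check entries against a source of truth, not just their age.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MemoryEntry:
    text: str
    relevance: float         # hypothetical retrieval score in [0, 1]
    last_verified: datetime  # when the entry was last confirmed as true


def lint(entries, now, max_age):
    """Return the entries whose last verification is older than max_age.

    Assumption: age is used as a crude proxy for staleness.
    """
    return [e for e in entries if now - e.last_verified > max_age]


def query(entries, now, max_age, top_k=3):
    """Rank by relevance as plain retrieval would, then attach a stale flag.

    Returns a list of (entry, is_stale) pairs so the caller can hedge,
    re-verify, or drop stale entries instead of trusting relevance alone.
    """
    ranked = sorted(entries, key=lambda e: e.relevance, reverse=True)[:top_k]
    stale_ids = {id(e) for e in lint(ranked, now, max_age)}
    return [(e, id(e) in stale_ids) for e in ranked]
```

The point of returning the flag rather than silently filtering is that the most relevant entry can also be the stalest one, and relevance alone never tells you that.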
