
Deterministic Context Assembly


Every AI coding agent faces the same bottleneck: the context window. A model can only reason about what it can see. Give it too little context, and it will hallucinate or produce code that ignores project conventions. Give it too much, and it wastes tokens on irrelevant information, potentially exceeding the budget entirely. Give it the wrong context, and it will confidently produce something that contradicts the project’s architecture.

The quality of an agent’s output is bounded by the quality of its context. This is not a model capability problem — it is an information retrieval problem. Deterministic context assembly is Contextia’s answer.

Retrieval-Augmented Generation (RAG) is the standard approach to feeding external knowledge into language models. It works by embedding documents into vectors, storing them in a vector database, and retrieving the most similar documents for a given query. RAG has proven effective for general knowledge retrieval, but it has fundamental limitations for software engineering context.

When an agent searches for “authentication” in a RAG pipeline, it gets documents that are semantically similar to the word “authentication.” But the error handling norms that apply to authentication code may not contain the word “authentication” at all. The architectural decision explaining why OAuth was chosen over SAML may rank below a README section that casually mentions auth. RAG retrieves what is similar, not what is necessary.

The same RAG query can return different results depending on the embedding model, chunk size, similarity threshold, and index state. Change the query from “authentication” to “auth flow” and the results may shift. Update an unrelated document and the rankings may change. This unpredictability makes it impossible to reason about what an agent was given, and therefore impossible to diagnose why it produced incorrect output.

When RAG returns partially relevant documents, the agent may extract incorrect information from them. A document about a deprecated authentication system, retrieved because it is semantically similar, can lead the agent to implement patterns that the project has explicitly abandoned. RAG has no concept of document status, lifecycle, or authority.

RAG requires an embedding pipeline, a vector database, an indexing process, and careful tuning of chunk sizes and similarity thresholds. This infrastructure must be maintained alongside the project. For a local development tool, this is disproportionate complexity.

Contextia takes a fundamentally different approach. Instead of searching for relevant context, it follows explicit links between artifacts. The process is mechanical, auditable, and reproducible.

Given a task ID, context assembly proceeds in well-defined steps:

Input: TASK-042
Step 1: Load task
→ Read work/tasks/TASK-042.md
→ Extract links: specs, decisions, norms
Step 2: Load referenced specs
→ Read system/specs/SPEC-BILLING-001.md
→ Read system/specs/SPEC-CURRENCY-001.md
→ Extract each spec's links: decisions, norms
Step 3: Load referenced decisions
→ Read system/rationale/DEC-BILLING-001.md
→ Read system/rationale/DEC-BILLING-003.md
→ Read system/rationale/DEC-CURRENCY-001.md
→ (deduplicate if already loaded via task links)
Step 4: Load referenced norms
→ Read system/norms/NORM-ERROR-001.md
→ Read system/norms/NORM-ASYNC-001.md
→ (deduplicate if already loaded via task or spec links)
Step 5: Collect code paths
→ From SPEC-BILLING-001: src/billing/invoice/**
→ From SPEC-CURRENCY-001: src/billing/currency/**
Step 6: Assemble context bundle
→ Order: identity → norms → decisions → specs → task → paths
→ Apply token budget constraints
→ Return structured result

Every step follows explicit references declared in YAML frontmatter. There is no ranking, no scoring, no probability. The same task with the same .contextia/ directory always produces the same context.
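Steps 1 and 2 both begin by reading links from an artifact's frontmatter. The Python sketch below shows that extraction in isolation. The name extract_links is hypothetical. It does not use a full YAML parser. It handles only the subset used on this page: frontmatter between two --- lines, with a links key that holds indented inline lists.

```python
import re


# Hypothetical helper, not part of Contextia: returns the "links" mapping from
# a Markdown file's frontmatter. It understands only the subset shown on this
# page: a "links:" key followed by indented "key: [ID, ID, ...]" lines.
def extract_links(text: str) -> dict[str, list[str]]:
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter, so no links
    links: dict[str, list[str]] = {}
    in_links = False
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter: end of frontmatter
        if not line.startswith((" ", "\t")):
            # Unindented line = top-level key; track whether it is "links:".
            in_links = line.strip() == "links:"
            continue
        match = re.match(r"\s+(\w+):\s*\[(.*)\]\s*$", line)
        if in_links and match:
            key, items = match.groups()
            links[key] = [item.strip() for item in items.split(",") if item.strip()]
    return links


task_042 = """---
id: TASK-042
title: Add multi-currency support to invoice generation
links:
  specs: [SPEC-BILLING-001, SPEC-CURRENCY-001]
  decisions: [DEC-BILLING-001]
---
# Task
...
"""

print(extract_links(task_042))
# {'specs': ['SPEC-BILLING-001', 'SPEC-CURRENCY-001'], 'decisions': ['DEC-BILLING-001']}
```

The function depends only on its input text. The same file therefore always yields the same links, and that is what makes every later step reproducible.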

Links form a directed graph. A task references specs. Specs reference decisions and norms. Decisions may reference other decisions. Contextia follows these links to a configurable depth, collecting all reachable artifacts.

# TASK-042 frontmatter
links:
  specs: [SPEC-BILLING-001, SPEC-CURRENCY-001]
  decisions: [DEC-BILLING-001]

# SPEC-BILLING-001 frontmatter
links:
  decisions: [DEC-BILLING-001, DEC-BILLING-003]
  norms: [NORM-ERROR-001, NORM-ASYNC-001]

# SPEC-CURRENCY-001 frontmatter
links:
  decisions: [DEC-CURRENCY-001]
  norms: [NORM-ERROR-001]

The resulting context includes all transitively referenced artifacts, deduplicated:

  • SPEC-BILLING-001, SPEC-CURRENCY-001
  • DEC-BILLING-001, DEC-BILLING-003, DEC-CURRENCY-001
  • NORM-ERROR-001, NORM-ASYNC-001

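NORM-ERROR-001 is referenced by both specs, yet it appears only once in the assembled context. Likewise, DEC-BILLING-001 is linked from both the task and SPEC-BILLING-001 but is loaded once. The algorithm is straightforward graph traversal with deduplication.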
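The traversal can be sketched as a breadth-first search with a visited set. In this Python sketch, the LINKS mapping, the collect function and its max_depth parameter are hypothetical stand-ins; Contextia's own code and depth setting may differ. Each artifact's link categories are also flattened into a single list.

```python
from collections import deque

# The example link graph from the frontmatter above, held in memory.
# (A real implementation would build this by reading the artifact files.)
LINKS: dict[str, list[str]] = {
    "TASK-042": ["SPEC-BILLING-001", "SPEC-CURRENCY-001", "DEC-BILLING-001"],
    "SPEC-BILLING-001": ["DEC-BILLING-001", "DEC-BILLING-003",
                         "NORM-ERROR-001", "NORM-ASYNC-001"],
    "SPEC-CURRENCY-001": ["DEC-CURRENCY-001", "NORM-ERROR-001"],
}


def collect(root: str, links: dict[str, list[str]], max_depth: int = 3) -> list[str]:
    """Breadth-first traversal from root, following links up to max_depth hops.
    The visited set deduplicates; the order depends only on the link lists."""
    seen = {root}
    order: list[str] = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not expand this node
        for target in links.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


print(collect("TASK-042", LINKS))
# ['SPEC-BILLING-001', 'SPEC-CURRENCY-001', 'DEC-BILLING-001', 'DEC-BILLING-003',
#  'NORM-ERROR-001', 'NORM-ASYNC-001', 'DEC-CURRENCY-001']
```

Breadth-first order lists direct links before transitive ones. Python dicts and lists preserve insertion order, so the same graph always yields the same list.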

Context windows have finite capacity. Even with models supporting 200K+ tokens, loading every linked artifact in full may exceed the budget or crowd out space needed for the agent’s reasoning. Contextia manages this through two mechanisms: progressive disclosure and budget-aware assembly.

Every artifact can be loaded at three levels of detail:

| Depth | What is included | Typical size |
|---|---|---|
| meta | YAML frontmatter only (type, ID, title, status, links) | 5-15 lines |
| summary | Frontmatter + first Markdown section | 20-60 lines |
| full | Entire document | 50-500 lines |
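The three depth levels can be illustrated with a small Python function. This is a sketch under stated assumptions, not Contextia's code:

- render is a hypothetical name.
- The document must open with frontmatter between two --- lines.
- "First Markdown section" is taken to mean everything before the second # heading.
- The sample spec body is made up, apart from its first sentence.

```python
# Hypothetical sketch of the three depth levels. Assumes each artifact is
# Markdown with "---"-delimited frontmatter and "#" headings.
def render(text: str, depth: str) -> str:
    if depth == "full":
        return text  # the entire document
    lines = text.splitlines()
    end = lines.index("---", 1)  # closing frontmatter delimiter
    frontmatter, body = lines[: end + 1], lines[end + 1 :]
    if depth == "meta":
        return "\n".join(frontmatter)
    if depth == "summary":
        # Keep body lines up to (not including) the second heading,
        # i.e. the frontmatter plus the first Markdown section.
        section: list[str] = []
        headings = 0
        for line in body:
            if line.startswith("#"):
                headings += 1
                if headings == 2:
                    break
            section.append(line)
        return "\n".join(frontmatter + section)
    raise ValueError(f"unknown depth: {depth!r}")


spec = """---
id: SPEC-BILLING-001
title: Invoice Generation
status: current
---
# Overview
The invoice generation subsystem creates PDF invoices.
# Data model
Invoices are built from subscription and usage data.
"""

print(render(spec, "summary"))  # frontmatter + the "# Overview" section
```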

The agent (or human) chooses the depth based on what it needs:

# Just the map — what exists and how it connects
contextia context TASK-042 --depth meta
# Enough to understand each artifact's purpose
contextia context TASK-042 --depth summary
# Everything, when deep understanding is needed
contextia context TASK-042 --depth full

In practice, the agent starts with meta or summary to understand the landscape, then uses read_spec to load specific artifacts in full when it needs detail.

When a token budget is specified, Contextia assembles context within that limit using a priority system:

  1. Identity is always included (small, essential for orientation)
  2. Task is always included (the agent needs to know what it is doing)
  3. Specs are included next, ordered by direct relevance (referenced by task first, then transitively referenced)
  4. Norms follow, as they constrain implementation
  5. Decisions are included last, as they are context for understanding choices

If the budget is tight, the system automatically reduces depth. It loads specs at summary instead of full, or drops transitively referenced decisions that are not directly linked from the task.
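One way to realize this priority scheme is a greedy pass. In priority order, it gives each artifact the richest depth that still fits. The Python sketch below makes these assumptions:

- Per-depth token estimates are already known. The numbers here are made up.
- The names Artifact and assemble are hypothetical.
- It models depth reduction but does not distinguish directly linked decisions from transitively referenced ones.

```python
from dataclasses import dataclass

# Depths from richest to cheapest; assembly falls back in this order.
DEPTHS = ("full", "summary", "meta")


@dataclass
class Artifact:
    id: str
    cost: dict[str, int]    # estimated tokens at each depth
    required: bool = False  # identity and task are always included


def assemble(artifacts: list[Artifact], budget: int) -> list[tuple[str, str]]:
    """Greedy, priority-ordered assembly. `artifacts` must already be sorted
    by priority (identity, task, specs, norms, decisions)."""
    # Required artifacts are always included in full.
    chosen = [(a.id, "full") for a in artifacts if a.required]
    used = sum(a.cost["full"] for a in artifacts if a.required)
    for art in artifacts:
        if art.required:
            continue
        # Give each artifact the richest depth that still fits the budget.
        for depth in DEPTHS:
            if used + art.cost[depth] <= budget:
                chosen.append((art.id, depth))
                used += art.cost[depth]
                break
        # If not even "meta" fits, the artifact is dropped.
    return chosen


# Illustrative token estimates (made up for this example).
candidates = [
    Artifact("identity", {"full": 120, "summary": 120, "meta": 40}, required=True),
    Artifact("TASK-042", {"full": 200, "summary": 200, "meta": 60}, required=True),
    Artifact("SPEC-BILLING-001", {"full": 1500, "summary": 400, "meta": 60}),
    Artifact("NORM-ERROR-001", {"full": 600, "summary": 180, "meta": 40}),
    Artifact("DEC-BILLING-001", {"full": 700, "summary": 250, "meta": 80}),
]
print(assemble(candidates, 1000))
# [('identity', 'full'), ('TASK-042', 'full'), ('SPEC-BILLING-001', 'summary'),
#  ('NORM-ERROR-001', 'summary'), ('DEC-BILLING-001', 'meta')]
```

The pass never revisits earlier choices, so a higher-priority artifact is never shrunk to make room for a lower-priority one. The trade-off is that the result is not guaranteed to be the best possible mix of depths.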

# Assemble context within a 4000-token budget
contextia context TASK-042 --budget 4000
# The system may reduce depth to fit:
# identity: full (120 tokens)
# task: full (200 tokens)
# SPEC-BILLING-001: summary (400 tokens)
# SPEC-CURRENCY-001: summary (350 tokens)
# NORM-ERROR-001: summary (180 tokens)
# NORM-ASYNC-001: summary (150 tokens)
# DEC-BILLING-001: meta (80 tokens)
# Code paths: list (60 tokens)
# Total: ~1540 tokens (within budget, with room for agent reasoning)

One of the most important properties of deterministic assembly is that it is auditable. When something goes wrong — the agent produces incorrect code, misses a constraint, or contradicts a decision — you can trace exactly what happened.

The context bundle is a deterministic function of the task ID and the .contextia/ directory. You can reproduce it:

contextia context TASK-042 --format json

This outputs the exact list of artifacts, their depth levels, and the link chain that included each one. There is no hidden state, no cached embeddings, no similarity scores to inspect.
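The "link chain" idea can be illustrated by recording, during traversal, how each artifact was reached. In this Python sketch, trace and LINKS are hypothetical. The printed JSON is the chain for a single artifact, not the schema of Contextia's --format json output.

```python
import json
from collections import deque

# The TASK-042 example graph, held in memory (illustrative only).
LINKS = {
    "TASK-042": ["SPEC-BILLING-001", "SPEC-CURRENCY-001", "DEC-BILLING-001"],
    "SPEC-BILLING-001": ["DEC-BILLING-001", "DEC-BILLING-003",
                         "NORM-ERROR-001", "NORM-ASYNC-001"],
    "SPEC-CURRENCY-001": ["DEC-CURRENCY-001", "NORM-ERROR-001"],
}


def trace(root: str, links: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map every reachable artifact to the first (shortest) chain that reached it."""
    chains = {root: [root]}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for target in links.get(node, []):
            if target not in chains:
                chains[target] = chains[node] + [target]
                queue.append(target)
    return chains


print(json.dumps(trace("TASK-042", LINKS)["NORM-ERROR-001"]))
# ["TASK-042", "SPEC-BILLING-001", "NORM-ERROR-001"]
```

With breadth-first order, each recorded chain is a shortest one. DEC-BILLING-001 is therefore attributed to the task's direct link rather than to SPEC-BILLING-001.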

If a relevant norm was not included in context, the reason is always one of these:

  1. The task did not link to a spec that references the norm
  2. The spec exists but does not list the norm in its links.norms
  3. The token budget caused it to be dropped

All three are visible and fixable. Add the missing link, increase the budget, or restructure the links. With RAG, the answer to “why was this document missing?” is usually “the embedding similarity score was below the threshold,” which is not actionable.
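In principle, this triage can be automated. The Python sketch below uses a hypothetical why_norm_missing function, and NORM-LOGGING-001 is an invented ID. The sketch separates two cases:

- A norm that is unreachable through links, which calls for adding a link.
- A norm that was reachable but is absent from the final bundle, which points at the budget.

It ignores any traversal depth limit.

```python
# Hypothetical diagnostic, not Contextia's actual output. It follows every
# link (no depth limit) and reports why a norm is absent from a bundle.
LINKS = {
    "TASK-042": ["SPEC-BILLING-001", "SPEC-CURRENCY-001", "DEC-BILLING-001"],
    "SPEC-BILLING-001": ["DEC-BILLING-001", "DEC-BILLING-003",
                         "NORM-ERROR-001", "NORM-ASYNC-001"],
    "SPEC-CURRENCY-001": ["DEC-CURRENCY-001", "NORM-ERROR-001"],
}


def why_norm_missing(norm: str, task: str,
                     links: dict[str, list[str]], bundle: set[str]) -> str:
    if norm in bundle:
        return f"{norm} is included"
    # Collect everything reachable from the task by following links.
    reachable: set[str] = set()
    stack = [task]
    while stack:
        for target in links.get(stack.pop(), []):
            if target not in reachable:
                reachable.add(target)
                stack.append(target)
    if norm in reachable:
        return f"{norm} is linked but was dropped by the token budget"
    specs = [t for t in links.get(task, []) if t.startswith("SPEC-")]
    return (f"{norm} is not reachable: neither {task} nor its specs "
            f"({', '.join(specs)}) list it in links.norms")


bundle = {"TASK-042", "SPEC-BILLING-001", "SPEC-CURRENCY-001", "NORM-ERROR-001"}
for norm in ["NORM-ERROR-001", "NORM-ASYNC-001", "NORM-LOGGING-001"]:
    print(why_norm_missing(norm, "TASK-042", LINKS, bundle))
```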

Contextia provides contextia check to validate link integrity:

$ contextia check
WARN SPEC-BILLING-001 links to DEC-BILLING-004, which does not exist
WARN src/billing/invoice/generator.py has @spec SPEC-BILLING-002,
but SPEC-BILLING-002 does not list this path
OK 14 specs, 8 decisions, 5 norms 2 warnings, 0 errors

Broken links are caught before they affect context assembly. This is preventive maintenance that RAG systems cannot offer — you cannot validate that an embedding will retrieve the right document.
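Both kinds of warning shown above are simple to express as code. This Python sketch is hypothetical: the check function, its input dictionaries and the src/billing/payments/** glob are invented for illustration. It uses the standard library's fnmatchcase for path globs, which may not match how Contextia interprets **.

```python
import re
from fnmatch import fnmatchcase


# Hypothetical sketch of two checks in the spirit of contextia check:
#  1. every link target must be an existing artifact;
#  2. a file annotated "@spec SPEC-X" must match one of SPEC-X's path globs.
# fnmatchcase's "*" also matches "/", so "src/billing/invoice/**" covers
# nested files too. This is not Contextia's implementation.
def check(artifacts: dict[str, list[str]],
          spec_paths: dict[str, list[str]],
          sources: dict[str, str]) -> list[str]:
    warnings = []
    for art_id, targets in artifacts.items():
        for target in targets:
            if target not in artifacts:
                warnings.append(f"WARN {art_id} links to {target}, which does not exist")
    for path, text in sources.items():
        for spec_id in re.findall(r"@spec\s+(SPEC-[\w-]+)", text):
            if not any(fnmatchcase(path, p) for p in spec_paths.get(spec_id, [])):
                warnings.append(f"WARN {path} has @spec {spec_id}, "
                                f"but {spec_id} does not list this path")
    return warnings


artifacts = {  # every existing artifact ID -> the IDs it links to
    "SPEC-BILLING-001": ["DEC-BILLING-001", "DEC-BILLING-004"],
    "SPEC-BILLING-002": [],
    "DEC-BILLING-001": [],
}
spec_paths = {
    "SPEC-BILLING-001": ["src/billing/invoice/**"],
    "SPEC-BILLING-002": ["src/billing/payments/**"],  # hypothetical path
}
sources = {"src/billing/invoice/generator.py": "# @spec SPEC-BILLING-002\n"}

for warning in check(artifacts, spec_paths, sources):
    print(warning)
```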

When context is assembled, it is formatted for its audience. The MCP server returns context optimized for LLM consumption:

=== IDENTITY ===
project: acme-billing | python, typescript | fastapi, react
=== TASK: TASK-042 ===
title: Add multi-currency support to invoice generation
status: in_progress
specs: SPEC-BILLING-001, SPEC-CURRENCY-001
=== SPEC: SPEC-BILLING-001 ===
title: Invoice Generation
status: current
paths: src/billing/invoice/**, src/billing/templates/invoice*
The invoice generation subsystem creates PDF invoices from
subscription and usage data at the end of each billing cycle.
[...]
=== NORM: NORM-ERROR-001 ===
title: Error Handling Conventions
[...]

This format is designed for token efficiency: minimal markup, clear section boundaries, essential information front-loaded. It is not Markdown (which wastes tokens on formatting characters) and not JSON (which wastes tokens on structural syntax). It is a compact, readable format that gives the agent maximum information per token.
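For illustration, a formatter for this layout needs only a few lines of Python. The format_bundle function and its (header, fields, body) tuple input are hypothetical. The sample reproduces the first two sections of the output shown above.

```python
# Hypothetical formatter for the compact layout: one "=== HEADER ===" line
# per section, then "key: value" lines, then any body text.
def format_bundle(sections: list[tuple[str, dict[str, str], str]]) -> str:
    out: list[str] = []
    for header, fields, body in sections:
        out.append(f"=== {header} ===")
        out.extend(f"{key}: {value}" for key, value in fields.items())
        if body:
            out.append(body)
    return "\n".join(out)


sections = [
    ("IDENTITY", {"project": "acme-billing | python, typescript | fastapi, react"}, ""),
    ("TASK: TASK-042", {
        "title": "Add multi-currency support to invoice generation",
        "status": "in_progress",
        "specs": "SPEC-BILLING-001, SPEC-CURRENCY-001",
    }, ""),
]
print(format_bundle(sections))
```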

Deterministic context assembly works best when:

  • The project has well-defined specifications and decisions
  • The relationships between artifacts are known and stable
  • Completeness and reproducibility matter more than discovery
  • The agent needs to follow project conventions precisely

It is less suitable when:

  • The project has no structured documentation (Contextia needs artifacts to link)
  • The task is exploratory and the relevant documents are not known in advance
  • The codebase is so small that the entire project fits in context

For most production codebases with established architecture and conventions, deterministic assembly provides more reliable results than search-based approaches. The upfront cost of writing structured artifacts pays for itself in the consistency and auditability of agent output.