Adopting Contextia in an Existing Project
Adding Contextia to a project mid-development is the common case, not the exception. Most teams have months or years of accumulated code, implicit conventions, and undocumented decisions. This guide walks through the full adoption process — from the first contextia init to a fully integrated workflow with CI enforcement and AI agent sessions that load the right context automatically.
The central principle: you do not need to document everything before Contextia becomes useful. A single spec with two annotations is already better than nothing. Build the knowledge layer incrementally, starting with the areas where AI agents are actually working.
Before you start: survey the codebase
The most valuable thing you can do before writing a single Contextia file is to understand the project as it actually is — not as you imagine it, or as the README says it is.
Spend 30 minutes with the following questions:
Structure
- What are the major modules or layers?
- Where does the application start? (entry points, main, index)
- Where is business logic concentrated vs. where is it thin glue?
- What is the test layout? Do tests mirror source?
Implicit norms
- What naming conventions are being followed? Are they consistent?
- How are errors handled? Are there multiple patterns in use?
- Is there a consistent API response format?
- What does the import/dependency structure look like?
Buried decisions
- Are there surprising technology choices? (non-obvious libraries, unusual patterns)
- Is there code that looks wrong but clearly works? There is usually a reason.
- Are there comments that say “don’t change this” or “intentionally” or “workaround”? These are decisions.
- What does `git log --oneline` suggest about major milestones?
Institutional knowledge gaps
- What would a new contributor get wrong on their first PR?
- What is the one thing everyone on the team knows but is not written down?
Write down your answers. They will become your identity document, your first norms, and your first decision records.
Phase 1: Initialize
Install Contextia
```bash
pip install contextia
# or, without installing globally:
uvx contextia --help
```
Run init
Navigate to your project root and run:
```bash
contextia init
```
Contextia scans for manifest files (`pyproject.toml`, `package.json`, `Cargo.toml`, `go.mod`) and creates `.contextia/` with a starter `config.yaml` and `identity.md`.
If you prefer manual setup:
```bash
mkdir -p .contextia/system/{specs,norms,rationale,intents}
mkdir -p .contextia/work/{tasks,active,archive}
touch .contextia/config.yaml
touch .contextia/system/identity.md
```
What gets created
```
.contextia/
├── config.yaml      ← project-level configuration
└── system/
    ├── identity.md  ← always-loaded project identity
    ├── specs/       ← behavioral contracts
    ├── norms/       ← coding conventions
    ├── rationale/   ← architectural decisions
    └── intents/     ← high-level goals (optional)
```
The work/ directory is empty. It fills in as you create tasks and session logs.
Phase 2: Write config.yaml
The config file controls how Contextia scans, validates, and annotates your project. Start permissive and tighten over time.
Minimal config for an existing project
```yaml
project:
  name: my-project

# Start with everything off or warning.
# Enable error-level checks only after your baseline is established.
check:
  orphan_specs: off
  broken_refs: warning
  broken_links: warning
  stale_specs: off
  missing_frontmatter: error   # this one is safe to enable immediately
  unverified_behaviors: off
  undocumented_modules: off

context:
  default_budget: null
  max_identity_lines: 200

annotations:
  prefix: "@"
  comment_syntax:
    python: "#"
    typescript: "//"
    javascript: "//"
    rust: "//"
    go: "//"
    ruby: "#"
    java: "//"

# List only the paths you want scanned.
# If omitted, Contextia scans from the project root.
source_roots:
  - src/
  - tests/
```
Adjusting for your stack
Python project:
```yaml
annotations:
  comment_syntax:
    python: "#"
source_roots:
  - src/
  - tests/
```
Node/TypeScript project:
```yaml
annotations:
  comment_syntax:
    typescript: "//"
    javascript: "//"
source_roots:
  - src/
  - test/
  - __tests__/
```
Monorepo with multiple languages:
```yaml
annotations:
  comment_syntax:
    python: "#"
    typescript: "//"
    go: "//"
source_roots:
  - backend/src/
  - frontend/src/
  - services/*/src/
```
Excluding noise
```yaml
check:
  ignore_patterns:
    - "node_modules/**"
    - "**/*.generated.*"
    - "**/migrations/**"
    - "vendor/**"
    - "dist/**"
    - "build/**"
```
Phase 3: Write identity.md
The identity document is the most important file in your entire .contextia/ directory. It is loaded at the start of every AI agent session. Every agent, every session, every task — identity.md is always there.
Write it as if you are onboarding a new team member who is about to write code. Not a new manager, not a potential investor — a developer who needs to understand the codebase well enough to make a correct change on day one.
What belongs in identity.md
What the project does and who uses it. One paragraph. Specific. Not marketing copy.
```markdown
## What is this

OrderFlow is an internal fulfillment automation tool for the warehouse team at Acme Corp.
It receives order events from Shopify webhooks, validates them against inventory levels,
assigns them to warehouse zones, and emits pick-list jobs to the WMS.
It does not handle payment, returns, or customer communication.
```
The tech stack and its non-obvious aspects. Not just “Python + Postgres” but why, and what that means in practice.
```markdown
## Tech stack

- Python 3.12 (FastAPI for the HTTP layer, Celery for async jobs)
- PostgreSQL 15 (primary store for orders and audit log)
- Redis 7 (Celery broker, idempotency keys, rate limiter state)
- Docker Compose for local development, Kubernetes on prod

FastAPI was chosen over Django for lower overhead on pure API endpoints.
The app is stateless per-request; all state is in Postgres or Redis.
```
The architecture at the module level. Not a full diagram, but enough to orient a developer.
```markdown
## Architecture

The app has four layers:

- `api/` — FastAPI routers, input validation, auth middleware. Thin. No business logic.
- `services/` — Business logic. One service class per domain (OrderService, InventoryService, etc.). Services call repositories, never call each other directly.
- `repositories/` — Database access via SQLAlchemy. No business logic.
- `workers/` — Celery tasks. Each task calls a service method.

The `api/` → `services/` → `repositories/` flow is enforced. If you see a router calling
a repository directly, that is a bug.
```
Build and test commands. The real ones. An agent that cannot run tests cannot validate its work.
````markdown
## Commands

```bash
make dev        # Start local stack (Docker Compose)
make test       # Run pytest (all tests)
make test-unit  # Unit tests only (no Docker required)
make lint       # ruff check + mypy
make migrate    # Run Alembic migrations
```
````
What is in scope and what is not. This is the most-overlooked section. It tells agents what NOT to do.
```markdown
## Boundaries

**In scope**: Order ingestion, inventory reservation, zone assignment, pick-list generation,
audit logging, webhook retry logic.

**Out of scope**: Payment processing, returns/refunds, customer emails, the WMS itself
(we emit to it via API, we do not own it), Shopify admin UI changes.

**Do not touch**: `src/legacy/` — this module is frozen. Any change requires approval from @alice.
```
Non-obvious conventions that apply everywhere.
```markdown
## Key conventions

- All service methods are synchronous. Async is handled at the Celery layer, not inside services.
- Database IDs are UUIDs everywhere. Never use integer IDs in API responses.
- All timestamps are UTC. The UI layer converts to local time; the backend never does.
- Error responses always include a `code` field (machine-readable) and a `message` field (human-readable). See NORM-ERROR-001 for the full contract.
```
What does NOT belong in identity.md
- Instructions for the AI agent. Those go in CLAUDE.md, not here.
- Spec details. identity.md is the overview. Specs are the detail.
- Changelog. Use git for that.
- Lists of all files. That is what directory structure tools are for.
- Aspirational content. Write what is true now, not what you plan to build.
The “new team member” test
Read your identity.md as if you have never seen this codebase. Ask:
- After reading this, do I know where to put new business logic?
- Do I know how to run the tests locally?
- Do I know what I must not break?
- Do I know what is mine to change vs. what I need permission to touch?
- Do I know the three or four things that would cause my first PR to be rejected?
If any answer is “no”, add that information.
Phase 4: Create a baseline
Before adding specs, run the integrity check to capture the current state:
```bash
contextia check --create-baseline
```
With no specs yet, the baseline will be nearly empty. That is fine. The point is to establish the starting line. Once the baseline exists, any new violation you introduce is immediately visible.
Commit the baseline:
```bash
git add .contextia/.baseline.json
git commit -m "chore: add contextia baseline"
```
Phase 5: Spec archaeology — recovering behavioral contracts from existing code
This is the hardest and most valuable phase. Your code already implements behaviors. The goal is to make them explicit.
Where to start: not at the top
Do not try to spec the entire system. Start with the area where an AI agent will work first. If you are about to implement a new feature in the payment module, spec the payment module now.
If you have no specific work planned, use this priority order:
- API surface — what the system promises to external callers
- Core domain logic — the most business-critical behaviors
- Cross-cutting concerns — auth, error handling, validation patterns
- Integration points — how the system talks to external systems
Technique: reverse-engineer from tests
Tests are the most accurate documentation of actual behavior. For each test, ask: what behavior does this test verify?
```python
def test_reserve_fails_when_insufficient_stock():
    # Given an order for 10 units of SKU-001
    # And only 3 units available
    # When reserve_inventory is called
    # Then InventoryError is raised with code "INSUFFICIENT_STOCK"
    ...
```
This translates directly to a WHEN/THEN behavior:
```
WHEN `reserve_inventory` is called for a quantity exceeding available stock
THEN the system SHALL raise `InventoryError` with code `INSUFFICIENT_STOCK`
```
Go through your test suite. Every test is a behavior. Group related tests into specs.
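For example, a few related cancellation tests could map onto a single spec, with each test annotated against the behavior it verifies (the spec ID, behavior names, and test names below are illustrative; the @test annotation syntax is covered in Phase 8):

```python
# tests/orders/test_cancellation.py (hypothetical grouping: each test below
# becomes one WHEN/THEN behavior in a single spec, here SPEC-015)

# @test SPEC-015.Cancel-Success
def test_cancel_releases_reserved_inventory():
    # WHEN an order in a non-final state is cancelled
    # THEN reserved inventory SHALL be released back to stock
    ...

# @test SPEC-015.Cancel-Final-State
def test_cancel_rejected_for_shipped_order():
    # WHEN cancellation is requested for an order in a final state
    # THEN the system SHALL reject the cancellation
    ...
```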
Technique: read the API contract outward-in
Start at the outermost boundary — your API routes, CLI commands, or public module interface. Each route is typically 1-2 behaviors.
@router.post("/orders")async def create_order(payload: OrderCreate, ...) -> OrderResponse: ...Ask: what does this endpoint do? What can go wrong? What are the edge cases?
```markdown
## Behaviors

### Create-Order-Success
WHEN a POST /orders request is received with a valid payload
AND the user has the `orders:write` scope
THEN the system SHALL create the order, reserve inventory, and return HTTP 201

### Create-Order-Invalid-Payload
WHEN a POST /orders request is received with missing required fields
THEN the system SHALL return HTTP 422 with field-level validation errors

### Create-Order-Insufficient-Stock
WHEN a POST /orders request is received
AND insufficient stock exists for one or more line items
THEN the system SHALL return HTTP 409 with code `INSUFFICIENT_STOCK`
AND no inventory SHALL be reserved
```
Technique: use git blame and PRs
`git blame` shows you who changed what and when. `git log -p --follow <file>` shows the history of a file. PR titles and descriptions often contain the “why” for a change.
```bash
git log --oneline --follow src/services/order_service.py | head -20
git show <commit-hash>
```
PR descriptions like “Reworked retry logic to handle transient DB errors” tell you:
- There is retry logic in this service.
- It was specifically for transient DB errors.
- It was “reworked” — meaning the first version was wrong.
That is a behavior worth capturing in a spec.
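For example, that retry PR might be distilled into a behavior along these lines (the wording and the bounded-retry detail are guesses meant to show the shape, not the real contract):

```
WHEN a database call made during order processing fails with a transient error
THEN the system SHALL retry the call a bounded number of times before surfacing the failure
```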
Stub specs vs. full specs
You do not need to write perfect specs on day one. A stub spec is better than no spec.
```markdown
---
id: SPEC-015
title: Order cancellation
description: Cancels an existing order and releases reserved inventory.
status: draft
paths:
  - src/services/order_service.py
---

## Objective

Allow orders in non-final states to be cancelled, releasing reserved inventory back to stock.

## Behaviors

TODO: fill in from tests/orders/test_cancellation.py

## Notes

Stub spec. Review and complete before next AI session on this area.
```
A stub spec with a `paths` field is enough for agents to know where behavior lives. The `status: draft` signals that it should not be treated as authoritative.
Phase 6: Norm extraction
Norms are coding conventions that apply across your codebase. Unlike specs, they describe how code should be written, not what it does.
Finding existing norms
Look for patterns that appear consistently across the codebase:
```bash
# Find consistent error handling patterns
grep -r "raise.*Error" src/ | head -20

# Find response format patterns
grep -r "return.*{" src/api/ | head -20

# Find logging patterns
grep -r "logger\." src/ | head -20
```
Also look for:
- README sections titled “conventions” or “guidelines”
- PR review comments that say “we always do X”
- Linter rules (`.ruff.toml`, `.eslintrc`) — each rule is a norm
- Comments in code that say “per our convention…”
What makes a good norm
A norm is worth capturing if:
- It is not obvious from the language or framework.
- Violating it would cause a PR rejection.
- An AI agent would not know it without being told.
- It applies to more than one file.
Good norm: “All API endpoints return the standard envelope { data, error, meta }. Never return a bare object.”
Not a norm (it’s just Python): “Use list comprehensions instead of map() for simple transformations.”
Not a norm (it’s framework convention): “FastAPI route handlers must be async.”
Writing a norm
````markdown
---
id: NORM-API-001
title: API response envelope
status: active
paths:
  - src/api/**
---

## Rule

All API endpoints must return responses in the standard envelope:

```json
{
  "data": {},
  "error": null,
  "meta": {
    "request_id": "uuid",
    "timestamp": "ISO-8601"
  }
}
```

No endpoint may return a bare object, array, or primitive as the top-level response.

## Rationale

Consistent envelope format allows the frontend and API clients to handle errors uniformly
without checking the response shape. The meta.request_id field is required for
distributed tracing.

## Exceptions

Health check endpoints (GET /health, GET /ready) may return `{ "status": "ok" }`.
````
Phase 7: Decision archaeology
Architectural decisions are the hardest knowledge to recover. They were made by people, in meetings, sometimes years ago, and the rationale is often in someone's head or buried in a Slack thread.
Identifying decisions worth capturing
A decision is worth capturing if a new developer would ask "why?" when they see it.
Signs of buried decisions:
- Non-obvious library choices (`httpx` instead of `requests`, `cattrs` instead of `pydantic`)
- Architectural patterns that are not standard for the tech stack
- Code that looks wrong but clearly works
- Comments that say "do not change this" without explanation
- TODOs that reference something that has been fixed but the pattern remains
The git log technique
```bash
# Find commits that changed core files significantly
git log --oneline src/services/ | head -30

# Find commits with "refactor" or "redesign"
git log --oneline --grep="refactor\|redesign\|replace\|migrate" | head -20
```
Look for commits like:
- “Replace synchronous HTTP calls with async queue” → there was a decision about async
- “Remove Django, replace with FastAPI” → there was a framework migration decision
- “Add UUID primary keys across all models” → there was an ID strategy decision
Writing a decision record
```markdown
---
id: DEC-008
title: UUID primary keys across all models
status: accepted
date: 2023-08-14
---

## Context

The original schema used auto-increment integer PKs. As the service grew to multiple
environments (prod, staging, dev) and needed to export/import data between them,
integer ID collisions and ordering assumptions became a problem.

## Decision

Use UUID v4 as primary keys for all database models. Expose UUIDs in all API responses.
Never expose integer IDs externally.

## Rationale

- No collision risk when merging data across environments
- No ordering information leakage via sequential IDs
- Natural fit for distributed systems if we ever shard

## Consequences

- All `id` fields are strings in API responses (not integers)
- Lookup by integer ID is no longer possible (must use UUID)
- Slightly larger storage footprint (~16 bytes vs. ~4 bytes per PK)
- Alembic migration required to backfill existing tables

## Related

- SPEC-001 (User API), SPEC-008 (Order API) — all IDs are UUIDs
```
Even a short record is valuable:
```markdown
---
id: DEC-012
title: Celery over asyncio for background jobs
status: accepted
date: 2024-01-10
---

## Context

The app needed to process orders asynchronously without blocking HTTP responses.

## Decision

Use Celery with Redis broker for all background processing.

## Rationale

- Celery provides retries, scheduling, and distributed workers out of the box
- asyncio would require restructuring the entire service layer
- Team had existing Celery expertise

## Consequences

- Requires Redis as a runtime dependency
- Workers run as separate processes, not threads
- Task arguments must be JSON-serializable
```
Phase 8: Annotate code
Annotations are the bottom-up side of the bidirectional link between code and specs.
A spec’s paths field points down to code. An @spec annotation in code points up to a spec.
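Seen side by side, the two directions look like this (the spec ID, path, and class name reuse the running example from this guide, not required names):

```python
# In .contextia/system/specs/SPEC-015.md, the frontmatter points down to code:
#
#   paths:
#     - src/services/order_service.py
#
# In src/services/order_service.py, the annotation points back up to the spec:

# @spec SPEC-015
class OrderService:
    """Order lifecycle operations, including cancellation (SPEC-015)."""
```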
Where to put annotations
At the class definition for classes that implement a spec:
```python
# @spec SPEC-015
class OrderProcessor:
    """Handles order processing pipeline."""
    ...
```
At the function definition for functions that implement a specific behavior:
```python
# @spec SPEC-015.Reserve-Inventory
def reserve_inventory(self, order_id: UUID, items: list[LineItem]) -> Reservation:
    ...
```
At the test for tests that verify a behavior:
```python
# @test SPEC-015.Insufficient-Stock
def test_reserve_fails_when_insufficient_stock():
    ...
```
At the decision point for code that exists because of a specific architectural decision:
```python
# @decision DEC-008 (all PKs are UUIDs)
class BaseModel(Base):
    id: Mapped[UUID] = mapped_column(primary_key=True, default=uuid4)
```
Annotation granularity
You do not need to annotate every function. The goal is to make the link between spec and code navigable, not exhaustive.
Good coverage: annotate at the class/module level for most specs, and at the function level only when a specific behavior in the spec has a distinct implementation.
Over-annotated (avoid):
```python
# @spec SPEC-015
def _validate_line_items(self, items: list[LineItem]) -> None: ...

# @spec SPEC-015
def _check_inventory(self, items: list[LineItem]) -> None: ...

# @spec SPEC-015
def _reserve_items(self, items: list[LineItem]) -> Reservation: ...

# @spec SPEC-015
def _notify_warehouse(self, reservation: Reservation) -> None: ...
```
Right level:
```python
# @spec SPEC-015
class OrderProcessor:
    def _validate_line_items(self, items): ...
    def _check_inventory(self, items): ...

    # @spec SPEC-015.Insufficient-Stock
    def _reserve_items(self, items): ...  # this specific behavior has its own WHEN/THEN
```
Incremental annotation strategy
Do not annotate the entire codebase at once. Use the “annotate on touch” rule:
whenever you or an AI agent modifies a file, add the relevant @spec annotation at that point.
After one sprint of active development, the most-changed files — which are also the most important — will be annotated.
Phase 9: Create your first tasks
Tasks connect the knowledge layer (specs) to the operational layer (what you are doing right now).
Create a task for each active work stream
If you are working on a new feature:
```bash
contextia new task --title "Implement order cancellation API"
```
The task file gets created in .contextia/work/tasks/:
```markdown
---
id: TASK-001
title: Implement order cancellation API
kind: feature
status: created
priority: high
specs:
  - SPEC-015
depends_on: []
---

## Objective

Add POST /orders/{id}/cancel endpoint that transitions an order to CANCELLED
and releases reserved inventory.

## Done criteria

- [ ] Endpoint returns 200 on success
- [ ] Endpoint returns 409 if order is already in a final state
- [ ] Inventory reservation is released transactionally
- [ ] Tests cover all WHEN/THEN behaviors in SPEC-015
- [ ] Session log written
```
The daily workflow
Once tasks exist, the workflow becomes:
```
# Start of session
/contextia-start TASK-001
# → loads identity, task definition, linked specs, last session log

# Work...

# End of session
/contextia-end TASK-001
# → writes session log with what was done, decisions made, next steps
```
The session log ensures continuity between AI sessions. When you resume tomorrow, the agent knows exactly where you left off.
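A session log does not need to be long. Something like the following is enough to pick the work back up the next day (the headings and details are illustrative; check the template your Contextia version generates):

```markdown
# Session log: TASK-001 (2024-05-02)

## Done
- Added POST /orders/{id}/cancel route and the service-layer cancel method
- Covered the success and final-state behaviors from SPEC-015 with tests

## Decisions
- Cancellation of already-picked orders is out of scope for this task

## Next steps
- Make the inventory release transactional
- Move SPEC-015 from draft to active once all behaviors are verified
```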
Phase 10: Set up the MCP server
The MCP server gives AI agents (Claude Code, Cursor, etc.) direct tool access to your Contextia knowledge base, without copy-pasting.
Claude Code
```bash
claude mcp add contextia -- uvx contextia-mcp
```
Cursor
Add to .cursor/mcp.json:
{ "mcpServers": { "contextia": { "command": "uvx", "args": ["contextia-mcp"] } }}VS Code
Add to .vscode/mcp.json:
{ "servers": { "contextia": { "type": "stdio", "command": "uvx", "args": ["contextia-mcp"] } }}Test the connection
In your AI assistant, ask it to call the find_spec tool:
```
Use the contextia find_spec tool to find specs related to "order processing"
```
If it returns results, the MCP server is working. If it returns an error or says it does not have the tool, check that the MCP server is configured and that your .contextia/config.yaml is valid.
Phase 11: Enable CI enforcement
Add contextia check to your CI pipeline once you have at least 5 specs and some annotations:
```yaml
# GitHub Actions
- name: Contextia integrity check
  run: contextia check --ci
```
The --ci flag exits with code 1 on any error-level violation, so the pipeline fails. Warning-level violations are reported but do not fail the build.
Since you have a baseline, pre-existing gaps do not block PRs. Only new violations introduced by the PR will fail the check.
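If your pipeline does not yet have a job to attach that step to, a minimal standalone workflow looks roughly like this (the job name, action versions, and Python version are assumptions; adapt them to your setup):

```yaml
# .github/workflows/contextia.yml (sketch; assumes the project uses Python tooling)
name: contextia
on: [pull_request]

jobs:
  integrity:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install contextia
      - name: Contextia integrity check
        run: contextia check --ci
```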
Reducing the baseline over time
Track baseline reduction as a team metric:
```bash
# How many violations are baselined?
contextia check --format json | jq '.summary.baselined'

# After fixing some violations, update the baseline
contextia check --update-baseline
git add .contextia/.baseline.json
git commit -m "chore: reduce contextia baseline (N violations resolved)"
```
Set a cadence: review the baseline once per sprint. When it reaches zero, you have full integrity enforcement.
Incremental roadmap
Week 1: Foundation
- `contextia init`
- Write `identity.md` (the “new team member” test)
- Configure `config.yaml` for your stack
- `contextia check --create-baseline`
- Commit `.contextia/`
Time: 2-4 hours.
Week 2: First specs
- Identify the 3 most important behavioral areas
- Write one spec per area (stubs are fine)
- Add `@spec` annotations to the corresponding classes
- Update `paths` in each spec frontmatter
- Create your first task
Time: 2-3 hours.
Week 3: Norms and decisions
- Extract 3-5 norms from the codebase
- Record 3-5 decisions from git history and team knowledge
- Link decisions to specs
- Add MCP server to your AI tool
Time: 2-3 hours.
Week 4: Integrate into daily workflow
- Use `contextia-start` / `contextia-end` for every AI session
- “Spec on touch”: add specs when you modify code
- Add CI check
- Present to the team
Time: ongoing, 10-15 minutes per session.
Common patterns by project type
Python backend (FastAPI / Django / Flask)
Organize spec IDs by layer: SPEC-API-NNN for routes, SPEC-SVC-NNN for services, SPEC-DB-NNN for data models.
Start annotations at the service class level. Routes are thin wrappers; annotate services.
Decision record priorities: framework choice, ORM strategy, authentication mechanism, async vs. sync.
TypeScript / Node backend
Organize by feature domain: SPEC-AUTH-NNN, SPEC-ORDERS-NNN.
Annotate at the class or exported function level. In functional codebases, annotate the module.
Frontend (React / Vue)
Spec user-visible behaviors: SPEC-UI-NNN. Each spec corresponds to a user-facing feature, not a component.
Norms are particularly valuable for frontend: state management patterns, component conventions, CSS architecture.
Monorepo
Each package gets its own spec prefix: SPEC-BACKEND-NNN, SPEC-MOBILE-NNN, SPEC-SHARED-NNN.
Identity.md should describe the monorepo as a whole, with a subsection per major package.
Consider per-package config.yaml overrides for language-specific annotation syntax.
Library / SDK
Spec the public API. Every exported function or class is a candidate for a spec.
@test annotations are especially important: every spec should have corresponding test annotations.
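For example, a public SDK function and the test that verifies it might be linked like this (the spec ID, behavior name, and function are hypothetical, chosen only to show the annotation pairing):

```python
from pathlib import Path

import pytest


# @spec SPEC-API-003 (public SDK entry point; hypothetical ID)
def parse_document(path: str) -> dict:
    """Parse a supported document and return basic metadata."""
    if Path(path).suffix not in {".md", ".txt"}:
        raise ValueError("unsupported extension")
    return {"path": path, "format": Path(path).suffix.lstrip(".")}


# @test SPEC-API-003.Unsupported-Extension
def test_parse_document_rejects_unknown_extension():
    with pytest.raises(ValueError):
        parse_document("notes.docx")
```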
Anti-patterns
Section titled “Anti-patterns”Trying to spec everything before starting
You will get overwhelmed, the specs will be inaccurate (you are speccing from memory, not from the code), and Contextia will never get used. Start with three specs and iterate.
Copying implementation into specs
```markdown
## Bad — describes HOW, not WHAT
WHEN `reserve_inventory` is called
THEN `Redis.INCR` is called on key `inv:{sku}` and compared against `config.limits[sku]`

## Good — describes WHAT happens
WHEN `reserve_inventory` is called for a quantity within available stock
THEN the reservation is confirmed and the available count is decremented atomically
```
The implementation can change from Redis to Postgres without the spec changing. The spec describes the contract, not the internals.
Specs without WHEN/THEN behaviors
A spec that is only narrative text with no testable behaviors is harder for agents to work with. Always include at least two WHEN/THEN pairs.
An identity.md that is just the README
The README is written for humans browsing GitHub. identity.md is written for an AI agent about to write code. They have different audiences and different purposes. The README talks about features; identity.md talks about constraints, conventions, and where things live.
Norms that duplicate language idioms
“Use list comprehensions” is a Python style preference, not a project norm. Norms document conventions that are specific to your project and that an agent would not know from language knowledge alone.
Stale session logs
A session log that says “TODO: fill in later” and was never completed is worse than no log. Write session logs at the end of each session, not the next day.
Troubleshooting
contextia check reports too many errors
Enable the baseline immediately: contextia check --create-baseline. Existing violations are recorded and will not block your work. Fix them incrementally.
Agents are not loading the right context
Check that: (1) the task lists the relevant specs under `specs:` in its frontmatter, (2) those specs have `paths:` entries or `@spec` annotations linking them to code, and (3) `contextia context {task-id}` returns meaningful output.
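A quick way to check the chain end to end from the command line (the task and spec IDs are the running example from this guide; substitute your own):

```bash
# Does the task resolve to identity, the task file, and its linked specs?
contextia context TASK-001

# Is there at least one annotation in code pointing back at the linked spec?
grep -rn "@spec SPEC-015" src/
```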
identity.md is too long
Aim for under 200 lines. Use the max_identity_lines config to enforce this:
```yaml
context:
  max_identity_lines: 200
```
Move module-specific information to individual specs. Identity is for the whole project; specs are for the parts.
The team is not adopting it
Show, do not tell. Run one AI agent session live, with contextia-start loading context. Show the before (agent asking about basic architecture) and after (agent knowing the constraints and producing correct code immediately). That demonstration is worth any amount of documentation.
Next steps
- Read Writing Effective Specs for the full behavioral spec methodology.
- See Agent Workflow for the complete AI-assisted development workflow.
- See CI/CD Integration for pipeline setup details.
- Read the MCP Tools reference to understand what tools the MCP server exposes to agents.