ClearCode Part 3: Memory, Agent Reasoning, Skills, and MCP

Fullstack Software Engineer specialising in cloud architecture, system design, and AI. I write deep dives on building production systems — the decisions, the tradeoffs, and the reasoning behind them. No fluff, no tutorials you've seen a hundred times.

Where Part 2 left us

Part 2 built the context layer: AST-aware indexing with tree-sitter across 15 languages, three retrieval backends switchable via a single config field, and honestly documented limitations around module-level constants. ClearCode could understand a codebase.

Part 3 makes it remember, reason, and act.

Four capabilities were built across approximately 12 commits. This post covers each one with the design decisions, the implementation details, and the things that still do not work correctly. One of the flaws documented below was fixed in the final commit before this post was written. That is not accidental — it is part of how this series works.


1. Short-term memory

What was built

Two modules: memory/session.py handles session identity, memory/short_term.py handles persistence and compression.

Sessions are UUID strings, generated on first run and persisted to .memory/current_session. On startup, ClearCode reads this file and resumes the existing session. Three REPL commands surface session management:

/new_session        start a fresh conversation
/switch         resume a past session by ID
/session            show the current ID
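
A minimal sketch of the session-identity logic described above, using only the standard library. The function names are hypothetical and the real memory/session.py may be organised differently; only the UUID format and the .memory/current_session path come from the post.

```python
import uuid
from pathlib import Path

# Path taken from the post; resolved relative to the current working directory.
SESSION_FILE = Path(".memory") / "current_session"


def new_session(session_file: Path = SESSION_FILE) -> str:
    """Generate a fresh UUID session ID and persist it (what /new_session might do)."""
    session_id = str(uuid.uuid4())
    session_file.parent.mkdir(parents=True, exist_ok=True)
    session_file.write_text(session_id, encoding="utf-8")
    return session_id


def load_or_create_session(session_file: Path = SESSION_FILE) -> str:
    """Return the persisted session ID, creating one on first run."""
    if session_file.exists():
        existing = session_file.read_text(encoding="utf-8").strip()
        if existing:
            return existing
    return new_session(session_file)


def switch_session(session_id: str, session_file: Path = SESSION_FILE) -> str:
    """Point the current-session file at an existing ID (what /switch might do)."""
    uuid.UUID(session_id)  # raises ValueError if the ID is not a valid UUID
    session_file.parent.mkdir(parents=True, exist_ok=True)
    session_file.write_text(session_id, encoding="utf-8")
    return session_id
```

Because the file holds nothing but the ID, resuming a session on startup is a single read, and /session only has to print what load_or_create_session() returns.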

Persistence is handled by LangGraph's SqliteSaver checkpointer, which stores the full conversation state in a SQLite database at .memory/memory.db. Every turn is checkpointed. Closing and reopening ClearCode in the same directory resumes exactly where the conversation ended.

Context compression is handled by SummarizationMiddleware, configured via config.yaml:

memory:
  db_path: .memory/memory.db
  summarize_at_tokens: 4000
  keep_last_messages: 20

Once the conversation crosses 4000 tokens, the middleware compresses everything before the last 20 messages into a summary. The summary replaces the older messages; the last 20 messages stay verbatim. This bounds token cost across long sessions while preserving the fine-grained detail of recent context.
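
The compression rule can be sketched in a few lines. This illustrates the behaviour described above and is not the actual SummarizationMiddleware: the Message type, the four-characters-per-token estimate, and the summarize callback are all assumptions made so the example is self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    content: str


def estimate_tokens(messages: List[Message]) -> int:
    """Crude stand-in for a tokenizer: roughly 4 characters per token."""
    return sum(len(m.content) for m in messages) // 4


def compress_history(
    messages: List[Message],
    summarize: Callable[[List[Message]], str],
    summarize_at_tokens: int = 4000,
    keep_last_messages: int = 20,
) -> List[Message]:
    """Fold everything before the last N messages into one summary message."""
    if estimate_tokens(messages) <= summarize_at_tokens:
        return messages  # under the threshold: nothing to do
    if len(messages) <= keep_last_messages:
        return messages  # nothing older than the verbatim window
    cut = len(messages) - keep_last_messages
    older, recent = messages[:cut], messages[cut:]
    summary = Message(role="system", content=summarize(older))
    return [summary] + recent
```

In a real setup the summarize callback would be an LLM call. The point of the sketch is the shape of the result: one summary message followed by the untouched recent window.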

What it looks like in practice

> /ask What is the name of the localStorage key this app uses, and what does
  each task object look like — list its exact fields?

The localStorage key is "orbitTasks".

Each task object has three fields:
  - id        unique identifier — crypto.randomUUID() with Date.now() fallback
  - title     the text of the task
  - completed boolean

> /ask Perfect. Now I want to support multiple named lists (Work, Personal).
  Using the storage key you just identified, propose the minimal change.

Instead of the single key "orbitTasks", use a dynamic key based on the list
name — "orbitTasks-Work", "orbitTasks-Personal", and so on.

Two functions need to change:

    function saveTasks(listName) {
        localStorage.setItem(`orbitTasks-${listName}`, JSON.stringify(tasks));
    }

    function addTask(title, listName) {
        tasks.unshift({
            id: crypto.randomUUID ? crypto.randomUUID() : String(Date.now()),
            title,
            completed: false
        });
        saveTasks(listName);
        renderTasks();
    }

Everything else — toggleTask, deleteTask, renderTasks — remains unchanged.

The second answer picks up "orbitTasks" from the first turn without re-searching. The agent read it from the conversation history that the checkpointer had stored.

On documenting flaws

The original implementation of get_checkpointer() opened a new SQLite connection on every call and never closed it. Connections accumulated across the session. It worked — SQLite tolerates concurrent connections to the same file — but the connection count grew without bound.

I documented this in CLAUDE.md. Writing it down made it impossible to publish with it in that state. The fix — a module-level singleton connection — went in as the final commit before this post was written.
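
A module-level singleton connection might look like the sketch below. It uses only the standard library sqlite3 module; get_connection and close_connection are hypothetical names, and the lock is an added precaution the post does not mention. A get_checkpointer() built on this could pass the shared connection to SqliteSaver rather than opening a new one on each call.

```python
import sqlite3
import threading
from pathlib import Path
from typing import Optional

_conn: Optional[sqlite3.Connection] = None
_lock = threading.Lock()


def get_connection(db_path: str = ".memory/memory.db") -> sqlite3.Connection:
    """Return one shared SQLite connection, opening it on first use.

    Later calls return the existing connection and ignore db_path.
    """
    global _conn
    with _lock:
        if _conn is None:
            Path(db_path).parent.mkdir(parents=True, exist_ok=True)
            # check_same_thread=False lets the single connection be shared across threads.
            _conn = sqlite3.connect(db_path, check_same_thread=False)
        return _conn


def close_connection() -> None:
    """Close the shared connection, for example at REPL exit."""
    global _conn
    with _lock:
        if _conn is not None:
            _conn.close()
            _conn = None
```

With this shape the connection count stays at one regardless of how many times the checkpointer is requested during a session.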

This is the part of building in public that people do not always talk about: the act of documenting a flaw clearly enough that it can be explained to a reader is also the act of understanding it clearly enough to fix it. CLAUDE.md has become a forcing function.

The remaining known flaws

get_session_history() is defined in short_term.py but never called. Dead code. Left in because it documents the interface — the function that would be called if session history needed to be inspected programmatically. It should either be wired up or deleted.

memory.db_path is CWD-relative. Running ClearCode from different directories creates separate .memory/ folders. This is the right default behaviour for a per-project tool, but it means /switch only works if you launch from the same directory where the session was created.

show_index in the Qdrant backends hardcodes limit=1000. For large codebases this silently truncates the displayed index. The correct approach is pagination or a configurable limit.
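
One way to replace the hardcoded limit is a small pagination helper. The sketch below is backend-agnostic: fetch_page is a hypothetical callable that takes (offset, limit) and returns (items, next_offset), with next_offset set to None on the last page. As far as I know, qdrant-client's scroll() returns a (points, next_page_offset) pair of that shape, so a thin wrapper around it should fit, but treat that pairing as an assumption to verify.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple

# A page fetcher takes (offset, limit) and returns (items, next_offset);
# next_offset is None when there are no more pages.
PageFetcher = Callable[[Optional[Any], int], Tuple[List[Any], Optional[Any]]]


def iter_all(fetch_page: PageFetcher, page_size: int = 256) -> Iterator[Any]:
    """Yield every item by following next_offset until the backend reports the end."""
    offset: Optional[Any] = None
    while True:
        items, offset = fetch_page(offset, page_size)
        yield from items
        if offset is None:
            break
```

Because the helper is a generator, show_index could stream entries to the terminal page by page instead of holding the whole index in memory.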


2. Agent reasoning

Part 2 built a retrieval system that returned relevant chunks. Part 3 wires the agent to use those chunks as the starting point for multi-step reasoning across tool calls.

The agent chains tools: retrieve context, read a specific file, follow a function reference, read that file, return a complete picture. Retrieval is no longer the end of the pipeline — it is the beginning.

A real example, run against a live codebase:

> /ask Trace what happens from the moment a user submits the add-task form to
  when the new task appears on screen. Give me the exact function call chain.

Here's the detailed trace:

1. Event Listener and Form Submission
   The process begins when an event listener captures the form submission event,
   prevented from its default action via event.preventDefault(). Defined in
   todo-app/js/scripts.js, lines 102-115.

2. Task Title Extraction and Verification
   The task title is extracted with taskInput.value.trim(). If the title is
   empty, focus returns to the input field and the function exits.

3. addTask(title) — lines 73-82
   Creates a new task object and unshifts it to the front of the tasks array.
   id is generated with crypto.randomUUID() or String(Date.now()) as a fallback.
   Calls saveTasks() then renderTasks().

4. saveTasks() — lines 14-16
   Serialises the tasks array to localStorage under the key "orbitTasks".

5. renderTasks() — lines 38-71
   Clears the current task list (taskList.innerHTML = ''), re-renders every
   visible task as an element with a check button, title span, and delete
   button, then calls updateStats() to refresh the counters.

6. Interface Update
   Input field is cleared and refocused. The empty-state element is toggled
   based on whether any visible tasks remain.

The agent read scripts.js via the filesystem MCP server, found the exact line ranges, and traced the full call chain — not from memory, but from the actual file.

This is the difference between a retrieval system and an agent. Retrieval returns relevant text. An agent uses that text to decide what to read next.
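
The loop behind "decide what to read next" can be sketched without any framework. In ClearCode the decision is made by the model through LangGraph; here a hypothetical decide callback stands in for it, and the tool names are placeholders rather than the real tool set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    tool: str
    argument: str
    observation: str


# decide(question, steps) returns ("call", tool_name, argument)
# or ("answer", final_text, "").
Decision = Tuple[str, str, str]


def run_agent(
    question: str,
    decide: Callable[[str, List[Step]], Decision],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> Tuple[str, List[Step]]:
    """Alternate between deciding and acting until the policy returns an answer."""
    steps: List[Step] = []
    for _ in range(max_steps):
        kind, name, argument = decide(question, steps)
        if kind == "answer":
            return name, steps
        observation = tools[name](argument)  # run the chosen tool
        steps.append(Step(name, argument, observation))
    return "Stopped: step budget exhausted.", steps
```

Each observation is appended to the history that the next decision sees, which is what lets a retrieval hit lead to a file read and then to a followed reference. The step budget keeps a confused policy from looping forever.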


3. Skills

The problem skills solve

A coding agent pointed at a machine learning repository and asked to add a feature should apply different standards than the same agent pointed at a React component library. The ML repo might have conventions around reproducibility, experiment tracking, and data pipeline structure that the React repo does not. The agent has no way to know this from the code alone.

Skills are the mechanism for injecting that domain knowledge without paying for it on every query.

Three-tier progressive disclosure

Skills use a three-tier loading model to keep token costs low while making full expertise available on demand.

Tier 1: always in the system prompt. A compact index of skill names, descriptions, and when_to_use trigger keywords. The agent sees this on every query, paying only a few tokens per skill regardless of how detailed the skill body is. Twenty skills cost twenty short index entries, and nothing more, until one is triggered.

Tier 2: loaded on match. When the user's request matches a skill's trigger keywords, the agent calls load_skill(name) to retrieve the full SKILL.md body: complete instructions, decision rules, code templates, and rationale.

Tier 3: fetched individually. Skills can ship support files (scripts, templates, reference documentation). After loading Tier 2, the agent sees a listing of available support files and reads only the ones the task actually requires.

Skills live in .clearcode/skills/<name>/SKILL.md inside whatever project ClearCode is pointed at. They are project-specific, versioned alongside the code, and visible in the repository.
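
The three tiers can be sketched as two small functions over the skills directory. This is an illustration under assumptions: the post does not show the SKILL.md header format, so the '---' delimited 'key: value' front matter below is a guess, and this load_skill takes an explicit directory argument that the real tool may not.

```python
from pathlib import Path
from typing import Dict, Tuple


def parse_skill(text: str) -> Tuple[Dict[str, str], str]:
    """Split a SKILL.md into (header fields, body).

    Assumes a '---' delimited header of 'key: value' lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # unterminated header: treat the whole file as body


def build_index(skills_dir: Path) -> str:
    """Tier 1: one compact line per skill for the system prompt."""
    entries = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta, _ = parse_skill(skill_md.read_text(encoding="utf-8"))
        name = meta.get("name", skill_md.parent.name)
        entries.append(
            f"- {name}: {meta.get('description', '')} "
            f"(use when: {meta.get('when_to_use', '')})"
        )
    return "\n".join(entries)


def load_skill(skills_dir: Path, name: str) -> str:
    """Tier 2: full body plus a listing of support files (Tier 3 candidates)."""
    skill_dir = skills_dir / name
    _, body = parse_skill((skill_dir / "SKILL.md").read_text(encoding="utf-8"))
    support = sorted(
        p.relative_to(skill_dir).as_posix()
        for p in skill_dir.rglob("*")
        if p.is_file() and p.name != "SKILL.md"
    )
    listing = "\n".join(f"- {s}" for s in support) or "(none)"
    return f"{body}\n\nSupport files:\n{listing}"
```

Only build_index() runs on every query. The body and the support-file listing are read from disk only when a skill is actually loaded, which is where the token savings come from.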

What it looks like in practice

> /ask I have a 40-page financial report PDF with embedded tables on every
  page. I need to extract all of them into a single pandas DataFrame.

To extract tables from a multi-page PDF into a single DataFrame, use pdfplumber:

    import pdfplumber
    import pandas as pd

    with pdfplumber.open("your_report.pdf") as pdf:
        all_tables = []
        for page in pdf.pages:
            tables = page.extract_tables()
            for table in tables:
                if table:
                    df = pd.DataFrame(table[1:], columns=table[0])
                    all_tables.append(df)

    combined_df = pd.concat(all_tables, ignore_index=True)

pdfplumber preserves table geometry better than pypdf for dense financial
layouts. If a page has no discernible table structure, extract_tables()
returns an empty list — the guard (if table) handles that safely.

The pdf skill was loaded from .clearcode/skills/pdf/SKILL.md. The agent did not guess at the library choice — it followed instructions that encode a decision already made: pdfplumber for dense financial layouts, with the reasoning documented in the skill rather than inferred from the query.

The skill trigger change worth naming

An early version of the skill trigger instruction was suggestive: "if the request matches, consider loading the skill." Under ambiguous queries, the agent sometimes retrieved a result without loading the relevant skill first.

The trigger was rewritten to be directive: "if the request matches these keywords, you must call load_skill before responding." Swapping "consider" for "must" is a small change to the system prompt with a clear effect on behaviour: the agent now reliably loads domain knowledge before acting on matched requests.


4. MCP

Two servers ship by default

Two MCP servers are configured in clearcode_mcp_servers.json at launch:

  • A GitHub server. Public repos work unauthenticated. Add GITHUB_TOKEN to .env for private repos and write access.

  • A filesystem server scoped to the current working directory. The agent can read and write files within the project.

Adding a new server — Jira, Linear, Slack, a database — is a single entry in the JSON config. The agent picks it up at next startup with no code changes required.
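
To make "a single entry in the JSON config" concrete, here is a sketch that adds one server entry programmatically. The post does not show the schema of clearcode_mcp_servers.json, so the "mcpServers" mapping with command and args fields is an assumption borrowed from the shape several MCP clients use, and the server name and command in the usage note are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def add_mcp_server(
    config_path: Path, name: str, command: str, args: List[str]
) -> Dict[str, Any]:
    """Add or overwrite one server entry and write the config back to disk."""
    config: Dict[str, Any] = {"mcpServers": {}}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumed schema: a top-level "mcpServers" mapping of name -> launch spec.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

A call such as add_mcp_server(Path("clearcode_mcp_servers.json"), "tracker", "example-mcp-server", ["--stdio"]) would leave the existing entries untouched and add the new one alongside them.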

> /ask Go to the GitHub repo f2015537/clearcode and look at the last 5 commits
  on main. Summarise what has been built layer by layer.

Here is a summary of the last 5 commits on the main branch:

- cf98612 — "Add agent, skills, and memory examples to README"
  Documents the system's live capabilities with real output — code navigation,
  skill-driven answers, and session memory continuity.

- 6dd6b3e — "Update CLAUDE.md and README to reflect current project state"
  Marks agent, memory, MCP, and skills layers complete. Removes the local
  filesystem tools now covered by the filesystem MCP server.

- d9db21f — "Add filesystem MCP server; remove local filesystem tools"
  Shifts filesystem operations to a centralised MCP server scoped to CWD —
  same capabilities, no bespoke tool code to maintain.
...

The agent called GitHub's list_commits tool, retrieved live data from the API, and composed the summary. No local git history involved.

The architectural change that MCP made possible

Early in Part 3, I wrote bespoke filesystem tools: read_file, write_file, append_file, delete_file, list_directory, file_exists. Each was a Python function registered with the agent.

Once the MCP filesystem server was added, those tools became redundant. The filesystem server provides the same capabilities through the same protocol the GitHub server uses. The bespoke code was removed.

This is the point of MCP: not just adding capabilities, but changing the cost structure of adding them. A new capability used to mean writing a Python function, handling errors, managing logging, and registering the tool with the agent. Now it means one entry in a JSON file. The agent does not care whether a capability comes from a local function or an MCP server — it calls tools the same way either way.


Where Part 3 leaves things

The active capabilities in ClearCode are now: retrieval (dense, sparse, hybrid), short-term memory (checkpointed sessions with summarisation), agent reasoning (multi-step tool use), skills (three-tier progressive disclosure), and MCP (GitHub and filesystem by default, extensible via config).

The known flaws are documented in CLAUDE.md. The connection leak in get_checkpointer() was fixed before this post was published. The dead code in get_session_history() and the hardcoded limit=1000 in show_index are still there — named, not hidden.

Part 4 is not decided yet. The remaining layers are safety, freshness, observability, and eval. If you have a preference on what comes next, drop it in the comments. I read every one.

Full source: https://github.com/f2015537/clearcode

Part 1 - Architecture before code: https://blog.divyampatro.dev/clearcode-part-1-reverse-engineering-a-coding-agent-before-writing-a-single-line-of-code

Part 2 - Context layer: https://blog.divyampatro.dev/clearcode-part-2-ast-aware-indexing-vector-stores-and-hybrid-retrieval

ClearCode

Part 3 of 3

A build-in-public series documenting the construction of ClearCode, a production-grade autonomous coding agent built from scratch. Every layer is covered - context indexing, agent reasoning, tool execution, memory, MCP integrations, safety, and evaluation - with the decisions, tradeoffs, and dead ends documented honestly along the way.
