
AI Agents Can Finally Understand Your Codebase: codebase-memory-mcp Hands-On


There's a project that's been blowing up on GitHub Trending — over 1,000 stars in a single day. It's called codebase-memory-mcp, and what it does is pretty straightforward: it makes AI coding agents actually understand your code structure instead of blindly grepping through files one by one.

I spent an afternoon tinkering with it, and honestly, this might be one of the most useful tools in the MCP ecosystem right now. Let me walk through what problem it solves, how to use it, and how it compares to other approaches.

The Problem: How AI Agents Read Code Is Painfully Dumb

If you've used Claude Code or Codex CLI, you know the feeling. You ask the agent to modify a function, and it starts grepping around, reading a bunch of files, grepping again, reading more files. A simple question takes 10+ rounds of tool calls and burns tens of thousands of tokens.

Why? Because agents read code nothing like humans do. When you look at code, you have structural intuition — you know the auth directory handles authentication, the api directory handles routing, and when you change a function you first check who calls it and what it calls. AI agents don't have this. They can only grep file by file, like searching for a book in a library while blindfolded.

This gets especially painful on large projects. On a 100K-line codebase, just figuring out "where is this function called" can take multiple rounds of interaction, each consuming tons of tokens. Understanding module dependencies? Forget about it.

I was working on a Node.js project with Claude Code once, trying to modify a middleware function. The agent searched for the function name, found the definition, read the file, then searched for who imports that module, read that file, looked at other functions in there... 8 rounds of tool calls, nearly 30K tokens burned, just to understand a simple call chain. I remember thinking: this is absurdly inefficient.

codebase-memory-mcp's approach is simple: since the agent doesn't understand code structure, build a knowledge graph that indexes the entire codebase — all functions, classes, and their relationships — so the agent can query the graph directly instead of grep-ing around blindly.

How It Works

Two core technologies: tree-sitter and knowledge graphs.

tree-sitter is an incremental parser that quickly converts source code into an AST (Abstract Syntax Tree). codebase-memory-mcp ships with tree-sitter grammars for 158 languages compiled directly into the binary — nothing extra to install.

After parsing the AST, it extracts all symbols (functions, classes, interfaces, enums, HTTP routes) and analyzes their relationships — who calls whom, who implements whom, who imports whom — and stores everything as a knowledge graph.

The graph lives in SQLite and supports Cypher query syntax (the same as Neo4j). Agents can query the graph directly to understand code structure instead of relying on grep-and-pray.
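To make the pipeline concrete, here is a minimal sketch under loud assumptions: Python's built-in ast module stands in for tree-sitter, and the two-table schema (nodes, edges) is hypothetical, not the tool's actual storage format. It only shows the shape of the idea: parse, extract call edges, store them, query them.

```python
import ast
import sqlite3

SOURCE = '''
def hash_password(pw):
    return pw[::-1]

def check_user(name, pw):
    return hash_password(pw) == "terces"

def handle_login(name, pw):
    return check_user(name, pw)
'''

def extract_call_edges(source):
    """Return (functions, edges); edges are (caller, callee) name pairs."""
    tree = ast.parse(source)
    functions, edges = [], set()
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            functions.append((node.name, node.lineno))
            for inner in ast.walk(node):
                # This sketch only captures direct calls to plain names.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.add((node.name, inner.func.id))
    return functions, sorted(edges)

def build_graph(source):
    """Store symbols and CALLS edges in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    # Hypothetical schema, chosen for illustration only.
    db.execute("CREATE TABLE nodes (name TEXT PRIMARY KEY, line INTEGER)")
    db.execute("CREATE TABLE edges (src TEXT, dst TEXT, kind TEXT)")
    functions, edges = extract_call_edges(source)
    db.executemany("INSERT INTO nodes VALUES (?, ?)", functions)
    db.executemany("INSERT INTO edges VALUES (?, ?, 'CALLS')", edges)
    return db

db = build_graph(SOURCE)
callers = db.execute(
    "SELECT src FROM edges WHERE dst = ? AND kind = 'CALLS'", ("check_user",)
).fetchall()
print(callers)  # [('handle_login',)]
```

Once the edges are in a table, "who calls check_user?" is a single lookup instead of a round of searches.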

For example, say you want to trace the call chain for a handleLogin function:

  • Grep approach: search for handleLogin, find the definition file, read it, search for callers, read those files, search for callers of callers... probably 5-8 rounds of interaction
  • Knowledge graph approach: one trace_path call returns the complete call chain. 1 round. Done.

The official numbers: 5 structural queries consume about 3,400 tokens, vs 412,000 tokens for file-by-file search. That's a 120x difference. Not marketing fluff — real numbers.
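For anyone who wants to check the arithmetic, a quick calculation using only the two token counts quoted above:

```python
graph_tokens = 3_400    # 5 structural queries, per the project's numbers
grep_tokens = 412_000   # file-by-file search, per the project's numbers

ratio = grep_tokens / graph_tokens
saved = 1 - graph_tokens / grep_tokens
print(f"{ratio:.0f}x fewer tokens, {saved:.1%} saved")
# 121x fewer tokens, 99.2% saved
```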

Plus, tree-sitter parsing is precise. It doesn't rely on regex matching function names (which misses anonymous functions, arrow functions, overloads, etc.). It actually understands syntax structure. It can distinguish function handleLogin() from const handleLogin = () =>, and even recognizes the difference between class methods and standalone functions.
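The same regex-versus-syntax-tree gap can be shown in a few lines. This is a rough Python analogue, assuming the built-in ast module as a stand-in for tree-sitter, with a lambda assignment playing the role of the arrow function; the sample source and the symbol categories are made up for illustration.

```python
import ast
import re

SOURCE = '''
def handle_login(user):
    return user

handle_logout = lambda user: None

class Session:
    def refresh(self):
        return self
'''

# Regex approach: only sees names that follow the "def" keyword.
regex_names = re.findall(r"^[ \t]*def\s+(\w+)", SOURCE, flags=re.MULTILINE)

def ast_symbols(source):
    """Classify callables as function, lambda, or method via the syntax tree."""
    symbols = {}
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            symbols[node.name] = "function"
        elif isinstance(node, ast.Assign) and isinstance(node.value, ast.Lambda):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    symbols[target.id] = "lambda"
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    symbols[f"{node.name}.{item.name}"] = "method"
    return symbols

print(regex_names)
# ['handle_login', 'refresh']
print(ast_symbols(SOURCE))
# {'handle_login': 'function', 'handle_logout': 'lambda', 'Session.refresh': 'method'}
```

The regex misses handle_logout entirely and cannot tell that refresh is a method, while the tree-based pass gets both.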

What Is Hybrid LSP?

tree-sitter handles structural analysis, but some information you can't get from the AST alone — like the actual type of a variable, or the return type of a function. That's where LSP (Language Server Protocol) comes in.

codebase-memory-mcp does something called Hybrid LSP: on top of tree-sitter parsing, it performs semantic type resolution for 9 languages — Python, TypeScript/JavaScript, PHP, C#, Go, C/C++, Java, Kotlin, and Rust. For example, if a function parameter has type User, tree-sitter only knows it's called User; Hybrid LSP can resolve where User is actually defined and what fields it has.

This is super helpful for understanding code. If you want to change an API handler's return value, Hybrid LSP tells you exactly where the return type is defined without having to dig through files manually.
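Here is a toy version of that kind of type resolution, again assuming Python's ast module rather than the tool's real resolver. The function resolve_param_type and the sample classes are hypothetical; the point is just the step from "the parameter is annotated User" to "User is defined here with these fields".

```python
import ast

SOURCE = '''
class User:
    id: int
    email: str

def get_profile(user: User) -> dict:
    return {"email": user.email}
'''

def resolve_param_type(source, func_name, param_name):
    """Resolve a parameter's annotation to the class that defines it."""
    tree = ast.parse(source)
    classes = {n.name: n for n in tree.body if isinstance(n, ast.ClassDef)}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            for arg in node.args.args:
                if arg.arg == param_name and isinstance(arg.annotation, ast.Name):
                    cls = classes.get(arg.annotation.id)
                    if cls is None:
                        return None
                    # Collect annotated attributes declared in the class body.
                    fields = [
                        stmt.target.id
                        for stmt in cls.body
                        if isinstance(stmt, ast.AnnAssign)
                        and isinstance(stmt.target, ast.Name)
                    ]
                    return {"type": cls.name, "line": cls.lineno, "fields": fields}
    return None

print(resolve_param_type(SOURCE, "get_profile", "user"))
# {'type': 'User', 'line': 2, 'fields': ['id', 'email']}
```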

Installation

Installation is surprisingly simple. One command:

bash
curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | bash

It auto-detects which coding agents you have installed (Claude Code, Codex CLI, Gemini CLI, Aider, OpenCode, OpenClaw, Kiro, and 4 others — 11 total), then automatically configures MCP server entries, instruction files, and pre-tool hooks for each one. Restart your agent and you're good to go.

Want the graph visualization UI?

bash
curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | bash -s -- --ui

After installation, open localhost:9749 in your browser for a 3D interactive graph visualization. It's pretty stunning the first time you see it — the entire codebase structure laid out with nodes and edges, all explorable.

The whole tool is a pure C static binary with zero dependencies. No Docker, no runtime, no API keys. Download, install, done. Gotta give props for this — so many MCP servers have dependency hell during setup.

Windows Users

PowerShell:

powershell
Invoke-WebRequest -Uri https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.ps1 -OutFile install.ps1
.\install.ps1

macOS and Linux support both arm64 and amd64. Windows supports amd64 only. The binary is about 30-50MB, way lighter than Docker images that run hundreds of megabytes.

Manual Configuration

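If your agent isn't in the auto-detection list, you can configure it manually. For Claude Code, project-scoped MCP servers are read from a .mcp.json file in the project root (not .claude/settings.json), so add the entry there: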

json
{
  "mcpServers": {
    "codebase-memory": {
      "command": "codebase-memory-mcp",
      "args": ["serve"]
    }
  }
}

Similar for other agents — check the docs for specifics.
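If you would rather script the edit than hand-edit JSON, a small helper like the one below merges the entry while leaving other keys alone. It is a sketch: add_mcp_server is a hypothetical helper, the demo writes to a throwaway temporary file, and the real config path depends on your agent.

```python
import json
import tempfile
from pathlib import Path

def add_mcp_server(config_path, name, command, args):
    """Add or update one MCP server entry without touching other settings."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config

# Demo against a temporary file so nothing real is modified.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "config.json"   # hypothetical file name
    target.write_text('{"theme": "dark"}')
    result = add_mcp_server(
        target, "codebase-memory", "codebase-memory-mcp", ["serve"]
    )
    written = json.loads(target.read_text())

print(result["mcpServers"]["codebase-memory"])
# {'command': 'codebase-memory-mcp', 'args': ['serve']}
```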

The 14 MCP Tools

Once installed, your agent gets 14 new tools. Let me cover the important ones.

search_graph — Graph Search

Search graph nodes by label, name pattern, or file pattern. For example, find all functions whose names contain auth:

code
search_graph(label="Function", name="*auth*")

Or find all classes under src/api:

code
search_graph(label="Class", file="src/api/**")

Supports pagination via limit/offset, so large result sets don't dump everything at once.
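To show what label, glob, and limit/offset filtering amount to, here is a rough in-memory imitation. It assumes shell-style globbing via Python's fnmatch; the node list and search_nodes are hypothetical and say nothing about how the real tool matches patterns internally.

```python
from fnmatch import fnmatchcase

NODES = [
    {"label": "Function", "name": "authenticate", "file": "src/auth/login.py"},
    {"label": "Function", "name": "check_auth_token", "file": "src/api/deps.py"},
    {"label": "Class", "name": "UserRouter", "file": "src/api/users.py"},
    {"label": "Function", "name": "render_page", "file": "src/web/views.py"},
]

def search_nodes(nodes, label=None, name="*", file="*", limit=10, offset=0):
    """Filter nodes by label and glob patterns, then return one page."""
    matches = [
        n for n in nodes
        if (label is None or n["label"] == label)
        and fnmatchcase(n["name"], name)
        and fnmatchcase(n["file"], file)
    ]
    return matches[offset:offset + limit]

auth_functions = search_nodes(NODES, label="Function", name="*auth*")
second_page = search_nodes(NODES, label="Function", name="*auth*", limit=1, offset=1)
api_classes = search_nodes(NODES, label="Class", file="src/api/**")

print([n["name"] for n in auth_functions])  # ['authenticate', 'check_auth_token']
print([n["name"] for n in second_page])     # ['check_auth_token']
print([n["name"] for n in api_classes])     # ['UserRouter']
```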

trace_path — Call Chain Tracing

BFS traversal of call chains. Give it a function name, get back who calls it and what it calls. Depth is configurable from 1-5. This is the tool I use most.

code
trace_path(name="handleLogin", depth=3)

Returns a tree structure where each node has file path and line number. The agent can immediately understand a function's context without grepping.
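The traversal itself is plain breadth-first search with a depth cap. A minimal sketch over a hypothetical call map (trace_callees and the function names are invented for illustration, and this version only follows outgoing calls):

```python
from collections import deque

# Hypothetical call graph: caller -> list of callees.
CALLS = {
    "handle_login": ["check_user", "create_session"],
    "check_user": ["hash_password"],
    "create_session": ["sign_token"],
    "hash_password": [],
    "sign_token": [],
}

def trace_callees(calls, start, depth=3):
    """Breadth-first walk of outgoing CALLS edges, up to `depth` hops."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        name, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for callee in calls.get(name, []):
            if callee not in seen:
                seen.add(callee)
                order.append((callee, dist + 1))
                queue.append((callee, dist + 1))
    return order

print(trace_callees(CALLS, "handle_login", depth=3))
# [('check_user', 1), ('create_session', 1), ('hash_password', 2), ('sign_token', 2)]
print(trace_callees(CALLS, "handle_login", depth=1))
# [('check_user', 1), ('create_session', 1)]
```

Running the same walk over reversed edges would give the callers instead of the callees.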

detect_changes — Change Impact Analysis

Maps a git diff to affected symbols and computes the blast radius. Changed a function? Which upstream callers are affected? Which tests need to run?

This is incredibly useful for code review. You changed UserService.validate, and it tells you that AuthController.login, AuthController.register, and Middleware.checkSession all depend on this method and need careful review.

I once changed a utility function's return type, and detect_changes told me it would affect 12 downstream functions, 3 of which were API handlers. Without this tool, I probably would have missed some and only found out after deployment that an endpoint's response format changed.
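Conceptually, blast radius is two steps: map changed lines to the symbols that contain them, then walk the call graph backwards. The sketch below assumes a hypothetical symbol table with line ranges and a hand-written list of changed lines in place of a parsed git diff; it is not the tool's actual algorithm.

```python
from collections import deque

# Hypothetical symbol table: name -> (file, first_line, last_line).
SYMBOLS = {
    "UserService.validate": ("user_service.py", 10, 30),
    "AuthController.login": ("auth.py", 5, 20),
    "AuthController.register": ("auth.py", 22, 40),
    "Middleware.check_session": ("middleware.py", 1, 15),
    "format_date": ("util.py", 1, 5),
}
# Hypothetical call graph: caller -> callees.
CALLS = {
    "AuthController.login": ["UserService.validate"],
    "AuthController.register": ["UserService.validate"],
    "Middleware.check_session": ["AuthController.login"],
}

def changed_symbols(symbols, changes):
    """Map (file, line) pairs from a diff to the symbols containing them."""
    hit = set()
    for name, (file, first, last) in symbols.items():
        if any(f == file and first <= line <= last for f, line in changes):
            hit.add(name)
    return hit

def blast_radius(calls, changed):
    """Return every direct or transitive caller of the changed symbols."""
    callers = {}
    for src, dsts in calls.items():
        for dst in dsts:
            callers.setdefault(dst, set()).add(src)
    affected, queue = set(), deque(changed)
    while queue:
        for caller in callers.get(queue.popleft(), ()):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected

changed = changed_symbols(SYMBOLS, [("user_service.py", 17)])
print(sorted(blast_radius(CALLS, changed)))
# ['AuthController.login', 'AuthController.register', 'Middleware.check_session']
```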

get_architecture — Codebase Overview

Returns language distribution, package structure, HTTP routes, hotspots (most frequently modified files), and code clusters. Gives the agent a quick overview of the whole project.

I usually call this first when using an agent on a new project — it's like having the agent quickly "read" the project structure.

query_graph — Cypher Queries

Write Cypher query statements directly. This is the most flexible tool and can do way more than the preset tools.

Dead code detection — find all functions with no callers:

cypher
MATCH (f:Function)
WHERE NOT EXISTS { (f)<-[:CALLS]-() }
AND NOT EXISTS { (f)-[:HANDLES]->(:Route) }
RETURN f.name, f.file, f.line

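Route handlers are excluded because the framework invokes them without an explicit caller in the graph. Treat what remains as deletion candidates rather than guaranteed dead code: entry points, exported library APIs, and functions invoked through reflection or dynamic dispatch also show up with no CALLS edge.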

Find circular dependencies:

cypher
MATCH (a:Module)-[:IMPORTS*2..6]->(a)
RETURN a.name
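For readers less familiar with variable-length Cypher patterns, this is roughly what "reach yourself in 2 to 6 IMPORTS hops" means, written as a plain Python sketch. The import map and modules_on_cycles are hypothetical, and this version follows walks that may revisit edges, which is slightly looser than Cypher's path semantics.

```python
# Hypothetical import graph: module -> modules it imports.
IMPORTS = {
    "app.auth": ["app.models"],
    "app.models": ["app.db"],
    "app.db": ["app.auth"],   # closes a 3-module cycle
    "app.utils": [],
    "app.cli": ["app.utils"],
}

def modules_on_cycles(imports, min_len=2, max_len=6):
    """Modules that can reach themselves in min_len..max_len IMPORTS hops."""
    found = set()
    for start in imports:
        frontier = {start}
        for hops in range(1, max_len + 1):
            # Advance one IMPORTS hop from every module in the frontier.
            frontier = {dst for src in frontier for dst in imports.get(src, [])}
            if hops >= min_len and start in frontier:
                found.add(start)
                break
    return sorted(found)

print(modules_on_cycles(IMPORTS))
# ['app.auth', 'app.db', 'app.models']
```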

Find the most heavily depended-on functions (those called by the most other functions):

cypher
MATCH (f:Function)<-[:CALLS]-(caller)
RETURN f.name, count(caller) as callers
ORDER BY callers DESC
LIMIT 10

These queries are especially valuable on large codebases. Manual analysis might take days; a single Cypher query runs in milliseconds.

get_code_snippet — Read Source Code

Read a function's source code directly by qualified name. No need to search for the file first and then locate the line number — one step.

Qualified name format: <project>.<path_parts>.<name>, like myapp.src.auth.handlers.handleLogin. If you don't know the qualified name, use search_graph first.
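As a rough illustration of how such a name could be assembled from a file path, here is a sketch. Note the assumption: dropping the file extension is inferred from the example name above, and qualified_name is a hypothetical helper, not the tool's code.

```python
from pathlib import PurePosixPath

def qualified_name(project, rel_path, symbol):
    """Join project, path components (extension dropped), and symbol name."""
    parts = PurePosixPath(rel_path).with_suffix("").parts
    return ".".join([project, *parts, symbol])

print(qualified_name("myapp", "src/auth/handlers.ts", "handleLogin"))
# myapp.src.auth.handlers.handleLogin
```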

manage_adr — Architecture Decision Records

Manage ADRs. Niche, but useful for team collaboration — you can record "why we chose this approach" and have it linked to code symbols in the graph.

Comparison With Other Approaches

There are roughly three categories of solutions for helping AI agents understand codebases.

File-Level Search (grep/ripgrep)

The most primitive approach, and what Claude Code and Codex CLI use by default. Zero config, works on any project. But extremely inefficient — agents need repeated searches to piece together code structure. Fine for "where is this function," painful for "what's the complete call chain of this function."

RAG (Retrieval-Augmented Generation)

Chunk the code, embed it, store in a vector database, do semantic search when the agent queries. GitHub Copilot Workspace and several code assistant tools use this approach.

The problem is that code isn't like natural language. Semantic search is bad at structural queries ("who calls X?", "what are Y's dependencies?"). Code "semantics" are call relationships, inheritance relationships, import relationships — not text similarity. Two functions with completely different names that are functionally complementary and always called together? Vector search can't discover that.

Plus RAG requires chunking code, and getting the chunk granularity right is hard. Too fine loses context, too coarse makes retrieval imprecise.

Knowledge Graphs (codebase-memory-mcp)

Uses tree-sitter for precise AST analysis, extracts the complete code structure into a graph. Queries traverse the graph, producing precise and complete results.

My personal take: for structural questions (call chains, dependency analysis, impact radius), the knowledge graph approach crushes the other two. For semantic questions ("what does this code do?"), the difference isn't huge — that's what LLMs are naturally good at.

There's an arXiv paper (2603.27277) with benchmarks across 31 real-world codebases: 83% answer quality, 10x fewer tokens, 2.1x fewer tool calls. Data speaks.

That said, RAG isn't useless. For natural language queries ("help me find code that handles user registration"), semantic search is sometimes more convenient than structural queries. Ideally, you'd use both — RAG for initial positioning, knowledge graph for precise analysis. Two-layer filtering gives you speed and accuracy.

One more thing worth noting: codebase-memory-mcp supports Infrastructure-as-Code indexing. Dockerfiles, Kubernetes manifests, Kustomize overlays — all parsed into graph nodes. Resource nodes represent K8s resources, Module nodes represent Kustomize overlays, connected by IMPORTS edges. If your project involves containerized deployment, this is genuinely useful.

Real-World Use Cases

Here are some scenarios I've actually used this in.

Scenario 1: Quickly Understanding an Unfamiliar Codebase

Taking over a new project with no idea how it's organized. Used to be: read directory structure, read README, poke around randomly. Now: have the agent call get_architecture, and in seconds you get language distribution, package structure, main routes, and hotspots. Ten times faster than manual exploration.

Especially for projects with terrible documentation, get_architecture is a lifesaver. It tells you which files are most frequently modified (indicating core logic) and which modules have tight dependencies.

Scenario 2: Impact Analysis

Want to change a public function but not sure what it affects. Used to rely on grep to find callers, but grep only does string matching — it misses indirect calls through aliases and re-exports. The knowledge graph traces the complete call chain, including cross-module indirect dependencies.

detect_changes goes further: it reads the git diff directly, tells you which symbols are affected, and what the risk level is for each. Super useful for PR reviews.

Scenario 3: Dead Code Detection

Every codebase has functions nobody calls. One Cypher query does it:

cypher
MATCH (f:Function)
WHERE NOT EXISTS { (f)<-[:CALLS]-() }
AND NOT EXISTS { (f)-[:HANDLES]->(:Route) }
RETURN f.name, f.file, f.line

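Note that the query as written excludes only HTTP route handlers; test functions and other entry points still have to be filtered out, for example by file path or naming convention. After those exclusions, the rest are basically safe to delete. I ran this on a 50K-line project and found 47 uncalled functions, nearly 2,000 lines of code I could safely remove.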
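The filtering logic, including the test-function exclusion, looks like this when spelled out in Python. Everything here is hypothetical sample data, and the test_ prefix is an assumed naming convention rather than something the graph guarantees.

```python
FUNCTIONS = ["handle_login", "check_user", "old_helper", "test_check_user", "list_users"]
CALLS = [("handle_login", "check_user"), ("test_check_user", "check_user")]
ROUTE_HANDLERS = {"handle_login", "list_users"}

def dead_code_candidates(functions, calls, route_handlers):
    """Functions with no callers that are neither route handlers nor tests."""
    called = {callee for _, callee in calls}
    return [
        f for f in functions
        if f not in called
        and f not in route_handlers
        and not f.startswith("test_")   # assumed naming convention for tests
    ]

print(dead_code_candidates(FUNCTIONS, CALLS, ROUTE_HANDLERS))
# ['old_helper']
```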

Scenario 4: Cross-Service Call Tracing

If your project has multiple microservices, codebase-memory-mcp can identify HTTP call relationships and chain them across services. The HTTP_CALLS edge type handles this.

For example, if Service A has axios.get('http://service-b/api/users'), the graph automatically creates an edge from Service A's function to Service B's /api/users route. You can see the cross-service call topology clearly.
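One plausible way to picture that linking step: parse the URL in the client call, then look up the host and path in a table of known routes. The sketch below is a guess at the general idea with hypothetical data and a hypothetical link_http_calls helper; the tool's real matching rules are not documented here.

```python
from urllib.parse import urlparse

# Hypothetical route table: service host -> {path: handler function}.
ROUTES = {
    "service-b": {"/api/users": "list_users", "/api/orders": "list_orders"},
}
# Hypothetical client calls found in source: (calling function, URL literal).
CLIENT_CALLS = [
    ("service-a.fetch_users", "http://service-b/api/users"),
    ("service-a.ping", "http://unknown-host/health"),
]

def link_http_calls(client_calls, routes):
    """Create HTTP_CALLS edges where a URL matches a known service route."""
    edges = []
    for caller, url in client_calls:
        parsed = urlparse(url)
        handler = routes.get(parsed.hostname, {}).get(parsed.path)
        if handler:
            edges.append((caller, "HTTP_CALLS", f"{parsed.hostname}.{handler}"))
    return edges

print(link_http_calls(CLIENT_CALLS, ROUTES))
# [('service-a.fetch_users', 'HTTP_CALLS', 'service-b.list_users')]
```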

Scenario 5: Understanding Test Coverage

The graph has a TESTS edge type linking test functions to the functions they test. You can query for functions without corresponding tests:

cypher
MATCH (f:Function)
WHERE NOT EXISTS { (f)<-[:TESTS]-() }
AND f.name STARTS WITH "handle"
RETURN f.name, f.file

Very helpful for improving test coverage.

Things to Watch Out For

A few gotchas I ran into.

Indexing large projects takes time. The Linux kernel (28M LOC) takes 3 minutes. Normal projects range from a few seconds to a few tens of seconds. Don't rush on the first use — wait for indexing to complete. Use CBM_DIAGNOSTICS=true to monitor progress.

Memory usage spikes during indexing. Everything gets loaded into memory for processing, then released after. For a 100K-line project, peak memory was around 1.5GB, dropping to ~50MB afterward.

SQLite storage. Indexes live in ~/.cache/codebase-memory-mcp/ using SQLite WAL mode, and they persist across agent restarts. After code changes, the index has to be refreshed to reflect them. To force a full re-index from scratch, just delete this directory.
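If WAL mode is unfamiliar, this short sketch shows it with Python's sqlite3 module: the journal mode is set once with a PRAGMA and is remembered by the database file, which is part of why an index survives restarts. The file name and table are hypothetical, and the demo uses a temporary directory rather than the real cache path.

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "index.db"   # hypothetical file name
    db = sqlite3.connect(db_path)
    # Switch to write-ahead logging; SQLite returns the mode now in effect.
    mode = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    db.execute("CREATE TABLE nodes (name TEXT)")
    db.execute("INSERT INTO nodes VALUES ('handleLogin')")
    db.commit()
    db.close()

    # Reopen the file: the data is still there and WAL mode has stuck.
    db = sqlite3.connect(db_path)
    persisted_mode = db.execute("PRAGMA journal_mode").fetchone()[0]
    rows = db.execute("SELECT name FROM nodes").fetchall()
    db.close()

print(mode, persisted_mode, rows)
```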

Auto-indexing. Run codebase-memory-mcp config set auto_index true to enable automatic re-indexing when the agent starts and detects code changes. Set auto_index_limit to cap the file count.

Not all languages get semantic analysis. tree-sitter parsing works for all 158 languages, but Hybrid LSP semantic type resolution only covers 9 languages (Python, TypeScript/JavaScript, PHP, C#, Go, C/C++, Java, Kotlin, Rust). Other languages get structural analysis only. But for most projects, these 9 cover 90%+ of the codebase.

Security. This tool reads your codebase and writes to agent config files. All processing is local — your code never leaves your machine. Every release is signed, checksummed, and scanned by 70+ antivirus engines. But if you're in a corporate environment, give your security team a heads-up since it does modify agent config files.

Custom file extensions. Some frameworks use special extensions like .blade.php for Laravel or .mjs for ES modules. Create a .codebase-memory.json in your project root:

json
{
  "extra_extensions": {
    ".blade.php": "php",
    ".mjs": "javascript"
  }
}
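One detail worth thinking about with compound extensions like .blade.php is match order. The sketch below shows a longest-suffix-first lookup; this is my assumption about a sensible strategy, not a description of the tool's behavior, and the default table and detect_language are hypothetical.

```python
DEFAULT_EXTENSIONS = {".php": "php", ".js": "javascript", ".py": "python"}
EXTRA_EXTENSIONS = {".blade.php": "php", ".mjs": "javascript"}

def detect_language(filename, extra=EXTRA_EXTENSIONS, defaults=DEFAULT_EXTENSIONS):
    """Return (matched_extension, language), preferring the longest suffix."""
    table = {**defaults, **extra}
    # Try longer suffixes first so ".blade.php" wins over ".php".
    for ext in sorted(table, key=len, reverse=True):
        if filename.endswith(ext):
            return ext, table[ext]
    return None

print(detect_language("welcome.blade.php"))  # ('.blade.php', 'php')
print(detect_language("worker.mjs"))         # ('.mjs', 'javascript')
print(detect_language("README.md"))          # None
```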

Compatible With 11 Agents

The install command auto-configures: Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code (via extension), OpenClaw, and Kiro. Pretty much all the major AI coding tools are covered.

FAQ

Q: How is this different from GitHub Copilot's codebase indexing?

Copilot's indexing is vector-based, good at semantic queries ("find code handling user registration") but mediocre at structural queries ("who calls X?"). codebase-memory-mcp uses AST-based graph analysis — structural queries are its strength. They complement each other.

Q: Is indexing slow?

Normal projects (a few thousand to tens of thousands of lines) take seconds to tens of seconds. The Linux kernel at 28M lines takes 3 minutes. The key is that indexing is one-time — incremental updates are fast.

Q: Does it support monorepos?

Yes. It auto-detects project structure, and each package in a monorepo gets indexed correctly. Cross-package import relationships are also tracked.

Q: Will it slow down agent startup?

Barely. The MCP server starts on demand — if the agent doesn't call the tools, it doesn't load. Index data lives in SQLite with memory-mapped queries, so latency is in the milliseconds.

Q: Is it open source?

Yes, MIT license. GitHub repo: DeusData/codebase-memory-mcp.

Q: Can I use it on private codebases?

Yes. All processing happens locally — your code is never sent to any external service. This matters a lot for companies that don't allow code outside their network.

Final Thoughts

codebase-memory-mcp made me think about a bigger idea: AI agents need more than the ability to read files — they need "code awareness." Just like human developers have a mental map of the codebase, agents need some form of structured code understanding.

The beauty of MCP is that this capability can be added modularly. Every coding agent doesn't need to implement code analysis themselves — one MCP server handles it, and all agents benefit. This is why the MCP ecosystem keeps growing: each tool does one thing well, and the agent orchestrates.

I'm planning to set up this MCP server on Hermes Agent too. If you're using MCP for coding, I'd strongly recommend giving it a try — especially on larger projects (tens of thousands of lines plus), where the efficiency difference is dramatic. On small projects, it might not feel that different since grep works fine. Drop any questions in the comments.


AI Agents Can Finally Understand Your Codebase: codebase-memory-mcp Hands-On — AI Hub