Claude Agent SDK: Building AI Agents That Actually Do Stuff
I spent a day messing around with Anthropic's new Claude Agent SDK, and honestly, it's a different beast from LangChain, CrewAI, or anything else I've tried. In short, it packages up Claude Code's core abilities — reading files, running commands, editing code, searching the web — into a programmable library. You don't implement tool calling logic yourself. The SDK handles it.
This post covers my hands-on experience, the gotchas I hit, and how it stacks up against other agent frameworks. If you're thinking about building automation with Claude, this should save you some time.
What Is Claude Agent SDK
Let me be clear about what it's NOT. It's not another LangChain. It's not another CrewAI. It's not another AutoGen.
The positioning is straightforward: it exposes Claude Code's capabilities as a library. Claude Code itself is a terminal-based AI coding assistant that can read code, edit code, run tests, and search the web. The Agent SDK wraps all that in Python or TypeScript so you can call it from your own programs.
The key difference? Frameworks like LangChain require you to define tools yourself, then let the LLM decide which to call. Agent SDK ships with tools built in. Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch — all ready to go. You don't write def read_file(path): .... It just works.
What does that mean in practice? Building an agent that automatically fixes bugs might take 20 lines of code.
Installation and Quick Start
Installation is simple. Python needs 3.10+, TypeScript needs Node.js 18+.
For Python:
pip install claude-agent-sdk
Or with uv (recommended, faster):
uv add claude-agent-sdk
For TypeScript:
npm install @anthropic-ai/claude-agent-sdk
You'll need an Anthropic API key. Sign up at platform.claude.com.
First mistake I made: forgot to set the environment variable and got an auth error. Make sure to export it:
export ANTHROPIC_API_KEY=your-api-key
Minimal Working Example
Create a buggy Python file utils.py:
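The original listing isn't reproduced here, so this is a reconstruction that matches the two failures described in this section. The function bodies are my assumption; only the function names and the failure modes come from the post.

```python
# utils.py (reconstructed sketch: bodies are assumed from the described failures)

def calculate_average(numbers):
    # Bug: an empty list makes len(numbers) zero, so this raises ZeroDivisionError.
    return sum(numbers) / len(numbers)


def get_user_name(user):
    # Bug: subscripting None raises TypeError.
    return user["name"].upper()
```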
Both functions have issues: calculate_average([]) divides by zero, get_user_name(None) throws a TypeError.
Then write the agent code agent.py:
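A minimal sketch of what agent.py can look like, assuming the Python package exposes query, ClaudeAgentOptions, and the message classes imported below. The prompt wording and the choice of tools are mine, not a verbatim copy of the original listing.

```python
# agent.py (sketch: the prompt wording and option values are assumptions)
import asyncio
import os

PROMPT = "Review utils.py for bugs that would cause crashes, then fix them."

# Read and Glob let the agent find the file, Edit lets it patch it in place.
OPTIONS = {
    "allowed_tools": ["Read", "Edit", "Glob"],
    "permission_mode": "acceptEdits",  # auto-approve file edits
}


async def main() -> None:
    # Imported inside main() so a missing SDK only fails when the agent runs.
    from claude_agent_sdk import (
        AssistantMessage,
        ClaudeAgentOptions,
        ResultMessage,
        TextBlock,
        query,
    )

    async for message in query(prompt=PROMPT, options=ClaudeAgentOptions(**OPTIONS)):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)  # Claude's running commentary
        elif isinstance(message, ResultMessage):
            print(f"Finished: {message.subtype}")


if __name__ == "__main__":
    # Fail early with a readable message instead of an auth error mid-run.
    if os.environ.get("ANTHROPIC_API_KEY"):
        asyncio.run(main())
    else:
        print("Set ANTHROPIC_API_KEY before running the agent.")
```

The environment check at the bottom is there because of the auth mistake mentioned earlier: it is cheaper to catch a missing key before the agent starts.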
Run it:
python agent.py
You'll see the agent automatically read utils.py, analyze the logic, and edit the file to add error handling. No manual intervention needed.
After it finishes, utils.py looks like this:
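The agent's output varies from run to run and the original listing isn't shown here, so treat this as a plausible result, not a transcript. The fallback return values (0 and an empty string) are assumptions.

```python
# utils.py after the agent's edits (illustrative; actual output will differ)

def calculate_average(numbers):
    # Guard against empty input instead of dividing by zero.
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)


def get_user_name(user):
    # Guard against a missing user or a user without a name.
    if user is None or "name" not in user:
        return ""
    return user["name"].upper()
```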
I'll be honest — the first time I saw this, I was impressed. Not because Claude can fix bugs (any LLM can do that), but because it decides on its own which file to read, what to change, and how to fix it. The entire tool loop is orchestrated by the SDK behind the scenes. You just provide a prompt.
Built-in Tools Breakdown
The built-in tools are the biggest selling point. No custom implementation needed:
- Read: Read any file in the working directory
- Write: Create new files
- Edit: Precise edits to existing files (not full rewrites, targeted edits)
- Bash: Run terminal commands, scripts, git operations
- Monitor: Watch a background script and react to each output line as an event
- Glob: Find files by pattern (like **/*.ts, src/**/*.py)
- Grep: Search file contents with regex
- WebSearch: Search the web for current information
- WebFetch: Fetch and parse web page content
- AskUserQuestion: Ask the user clarifying questions with multiple choice options
These tools aren't just for show. Real example — I had the agent find all TODO comments in a project:
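Here is a rough sketch of the TODO search, restricted to read-only tools. The prompt text and the find_todos name are my own; ResultMessage.result is assumed to carry the agent's final answer.

```python
from typing import Optional

PROMPT = "Find every TODO comment in this project and summarize them, grouped by file."

# Read-only tool set: the agent can search and read, but not change anything.
OPTIONS = {"allowed_tools": ["Grep", "Read", "Glob"]}


async def find_todos() -> Optional[str]:
    # Imported here so the module loads even where the SDK is not installed.
    from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query

    summary = None
    async for message in query(prompt=PROMPT, options=ClaudeAgentOptions(**OPTIONS)):
        if isinstance(message, ResultMessage):
            summary = message.result  # final text answer from the agent
    return summary
```

Call it with asyncio.run(find_todos()) from a script in the project root.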
The agent uses Grep to search for TODO, then Read to open each file and confirm context. Fully automatic.
The Power of Tool Combinations
Individual tools are boring. Combinations are where it gets interesting. Example: adding type annotations to a Python project.
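A sketch of the type-annotation task, with the workflow spelled out in the prompt. The step wording is an assumption, and it presumes mypy is already installed in the environment the agent runs in.

```python
STEPS = [
    "Use Glob to find every Python file in the project.",
    "Read each file and add type hints to function signatures.",
    "Run mypy on the project with Bash.",
    "Fix any errors mypy reports and re-run it until it passes.",
]

# Number the steps so the agent follows them in order.
PROMPT = "Add type annotations to this project:\n" + "\n".join(
    f"{i}. {step}" for i, step in enumerate(STEPS, start=1)
)

OPTIONS = {
    "allowed_tools": ["Glob", "Read", "Edit", "Bash"],
    "permission_mode": "acceptEdits",
}


async def main() -> None:
    # Imported here so the module loads even where the SDK is not installed.
    from claude_agent_sdk import ClaudeAgentOptions, query

    async for message in query(prompt=PROMPT, options=ClaudeAgentOptions(**OPTIONS)):
        print(message)
```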
The agent follows this flow step by step: find files, read them, add type hints, run mypy, fix errors. If mypy reports new errors, it iterates automatically. This loop is managed by Agent SDK — you don't write a while loop.
Permission Control: Don't Let the Agent Nuke Your Server
This is the part I'd take seriously. The agent has a Bash tool, meaning it can run arbitrary commands. Without controls, it could theoretically execute rm -rf / (low probability, but not zero).
Agent SDK provides several permission modes:
- acceptEdits: Auto-approves file edits and common filesystem commands, asks for everything else. Good for local development.
- dontAsk: Denies anything not in allowed_tools. Good for headless agents.
- auto (TypeScript only): Uses a model classifier to approve or deny each tool call. Autonomous agents with safety guardrails.
- bypassPermissions: Approves everything. Only use in sandboxed or fully trusted environments.
- default: Requires a canUseTool callback for custom approval flows.
I started with bypassPermissions because it was easy. Then the agent executed some Bash commands I didn't expect (it decided to pip install a package on its own). Switched to acceptEdits immediately.
My recommendation: Use acceptEdits for local dev, bypassPermissions for CI/CD (only if the CI environment is isolated), and default + custom callbacks for production.
Fine-Grained Control with allowed_tools
Beyond permission modes, you can use allowed_tools to control exactly which tools the agent can access:
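One way to keep tool lists honest is to name a few profiles and pick one per task. The profile names and groupings below are my own convention, not something the SDK defines; the resulting dict is meant to be passed as ClaudeAgentOptions(**options_for("review")).

```python
# Hypothetical profiles, ordered from least to most privileged.
TOOL_PROFILES = {
    "review": ["Read", "Glob", "Grep"],                # read-only analysis
    "refactor": ["Read", "Glob", "Grep", "Edit"],      # may change existing files
    "test": ["Read", "Glob", "Grep", "Edit", "Bash"],  # may also run commands
}


def options_for(profile: str) -> dict:
    """Return keyword arguments for ClaudeAgentOptions for a named profile."""
    if profile not in TOOL_PROFILES:
        raise ValueError(f"unknown tool profile: {profile!r}")
    # Copy the list so callers cannot mutate the shared profile.
    return {"allowed_tools": list(TOOL_PROFILES[profile])}
```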
My personal rule: least privilege. Give Read only if that's all you need. Add Bash only when you need to run tests, and pair it with Hooks for auditing (more on that below).
Hooks: Agent Lifecycle Callbacks
Hooks are another killer feature. They let you insert your own code at key points during agent execution.
Available hook events:
- PreToolUse: Before a tool call. Can intercept, modify arguments, or deny.
- PostToolUse: After tool execution. Can log, modify return values.
- Stop: When the agent stops executing. Can save state.
- SubagentStart / SubagentStop: Subagent lifecycle events.
- Notification: Agent status messages. Can forward to Slack or other notification systems.
Practical Example: Blocking Dangerous Operations
I wrote a Hook to prevent the agent from editing .env files (to avoid leaking secrets):
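A sketch of such a hook, assuming the Python hook signature (input_data, tool_use_id, context) and the hookSpecificOutput response shape with a permissionDecision of "deny". The is_env_file helper and the rejection text are my own.

```python
from pathlib import PurePath


def is_env_file(file_path: str) -> bool:
    """True for .env and variants such as .env.local."""
    name = PurePath(file_path).name
    return name == ".env" or name.startswith(".env.")


async def block_env_edits(input_data, tool_use_id, context):
    # tool_input carries the arguments the agent passed to Write or Edit.
    file_path = input_data.get("tool_input", {}).get("file_path", "")
    if is_env_file(file_path):
        return {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "deny",
                "permissionDecisionReason": "Editing .env files is not allowed.",
            }
        }
    return {}  # empty result: let the tool call proceed


def build_options():
    # Imported here so the module loads even where the SDK is not installed.
    from claude_agent_sdk import ClaudeAgentOptions, HookMatcher

    return ClaudeAgentOptions(
        allowed_tools=["Read", "Edit", "Write"],
        permission_mode="acceptEdits",
        # Only run the hook for tools that can change files.
        hooks={"PreToolUse": [HookMatcher(matcher="Write|Edit", hooks=[block_env_edits])]},
    )
```

Keeping the path check in a plain function makes it easy to unit test without spinning up an agent.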
When the agent tries to edit .env, it gets blocked with a friendly rejection message.
Practical Example: Logging All File Changes
Audit requirements are common — you want to know exactly what the agent changed:
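A sketch of an audit hook, assuming the same three-argument hook signature and that input_data carries tool_name and tool_input. The make_audit_hook factory and the timestamp field are my additions; one JSON object is written per line.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def make_audit_hook(log_path: Path):
    """Build a PostToolUse hook that appends one JSON line per file change."""

    async def log_file_change(input_data, tool_use_id, context):
        record = {
            "time": datetime.now(timezone.utc).isoformat(),
            "tool": input_data.get("tool_name"),
            "file": input_data.get("tool_input", {}).get("file_path"),
            "tool_use_id": tool_use_id,
        }
        with log_path.open("a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
        return {}  # nothing to change, just observe

    return log_file_change


# Registration (assumed shape), inside ClaudeAgentOptions:
#   hooks={"PostToolUse": [HookMatcher(matcher="Write|Edit",
#                                      hooks=[make_audit_hook(Path("audit.log"))])]}
```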
This logs every file modification to audit.log with tool name, file path, and call ID. Useful for debugging.
Subagents: Let the Agent Delegate Work
Subagents are an advanced feature. The main agent can spawn subagents for subtasks, and subagents report back with results.
This is similar to Hermes Agent's delegate_task. The main agent handles task decomposition and result aggregation; subagents handle execution.
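A sketch of the two-subagent setup, assuming the Python SDK's AgentDefinition takes description, prompt, and tools, and that the main agent delegates through a tool named "Task". That tool name has changed between releases, so check it against the version you install. The descriptions and prompts are placeholders.

```python
# Subagent specs kept as plain dicts so they are easy to inspect and test.
AGENTS = {
    "code-reviewer": {
        "description": "Reviews code for bugs, security issues, and style problems.",
        "prompt": "You are a careful code reviewer. Report findings; do not edit files.",
        "tools": ["Read", "Glob", "Grep"],
    },
    "test-writer": {
        "description": "Writes and runs unit tests for existing code.",
        "prompt": "You write focused unit tests and run them to confirm they pass.",
        # Bash is listed explicitly so this subagent can actually run the tests.
        "tools": ["Read", "Write", "Edit", "Bash"],
    },
}

PROMPT = (
    "Use the code-reviewer agent to review this project, "
    "then use the test-writer agent to add tests for the gaps it finds."
)


async def main() -> None:
    # Imported here so the module loads even where the SDK is not installed.
    from claude_agent_sdk import AgentDefinition, ClaudeAgentOptions, query

    options = ClaudeAgentOptions(
        allowed_tools=["Read", "Glob", "Grep", "Task"],  # "Task": assumed delegation tool
        agents={name: AgentDefinition(**spec) for name, spec in AGENTS.items()},
    )
    async for message in query(prompt=PROMPT, options=options):
        print(message)
```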
The main agent dispatches code-reviewer to review code, gets the results, then dispatches test-writer to write tests. Automatic orchestration.
Subagent Gotchas
One gotcha I hit: subagent allowed_tools must be configured separately. I forgot to add Bash to the test-writer subagent, so it wrote tests but couldn't run them. Subagents don't inherit the main agent's tool list — you must specify explicitly.
Another note: subagent messages include a parent_tool_use_id field for tracking which subagent is doing what. Very useful for debugging.
Sessions: Maintaining Context Across Calls
By default, each query() call is a fresh conversation. Some scenarios need context across multiple calls — like a long-running task that might need to pause and resume.
Agent SDK's Sessions solve this:
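A sketch of resuming a session, assuming ResultMessage exposes session_id and ClaudeAgentOptions accepts a resume argument. The ask and session_options helpers and both prompts are my own.

```python
from typing import Optional


def session_options(resume_id: Optional[str] = None) -> dict:
    """Keyword arguments for ClaudeAgentOptions, resuming a session if given."""
    options = {"allowed_tools": ["Read", "Glob", "Grep"]}
    if resume_id:
        options["resume"] = resume_id
    return options


async def ask(prompt: str, resume_id: Optional[str] = None) -> Optional[str]:
    # Imported here so the module loads even where the SDK is not installed.
    from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query

    session_id = None
    options = ClaudeAgentOptions(**session_options(resume_id))
    async for message in query(prompt=prompt, options=options):
        if isinstance(message, ResultMessage):
            print(message.result)
            session_id = message.session_id  # keep this to resume later
    return session_id


async def main() -> None:
    session_id = await ask("Analyze the structure of this project.")
    # Second call continues the same conversation instead of starting fresh.
    await ask("Based on that analysis, what should be refactored first?", session_id)
```

Persist the returned session ID somewhere durable if the pause between calls can outlive the process.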
On the second call, the agent remembers what it analyzed before. No need to re-read files. Session data is stored as JSONL on the local filesystem.
This is particularly useful for multi-turn conversational agents. Think of a code assistant where the user asks things in sequence: "check this file" → "modify that function" → "run the tests". Each turn needs context from the previous one.
MCP Integration: Connecting External Systems
If you've used MCP (Model Context Protocol), you know it lets AI connect to various external tools and data sources. Agent SDK has native MCP support:
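A sketch of an MCP configuration, assuming the mcp_servers option accepts a mapping of server name to a command/args entry for stdio servers. The two npm packages are the public reference servers; the ./workspace directory is a placeholder.

```python
# Each entry launches a local MCP server as a subprocess over stdio.
MCP_SERVERS = {
    "memory": {
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-memory"],
    },
    "filesystem": {
        "command": "npx",
        # The last argument is the directory the server may access (placeholder).
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "./workspace"],
    },
}

PROMPT = "Remember that this project uses Python 3.12, then list the files in the workspace."


async def main() -> None:
    # Imported here so the module loads even where the SDK is not installed.
    from claude_agent_sdk import ClaudeAgentOptions, query

    options = ClaudeAgentOptions(mcp_servers=MCP_SERVERS)
    async for message in query(prompt=PROMPT, options=options):
        print(message)
```

MCP tools go through the same permission checks as built-in tools, so depending on the permission mode you may also need to list them in allowed_tools.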
The agent can now use additional tools from MCP servers. The memory server provides persistent memory, the filesystem server provides safe filesystem access.
I tried connecting a PostgreSQL MCP server to have the agent query databases and generate reports. Worked well, but one gotcha: MCP tool names follow the pattern mcp__<server>__<action>. When matching in Hooks, use regex ^mcp__ instead of exact matches.
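To make the naming pattern concrete, here is a small standard-library sketch that recognizes and splits MCP tool names. The parse_mcp_tool helper and the example names are illustrative; the same ^mcp__ prefix is what you would hand to a hook matcher.

```python
import re
from typing import Optional, Tuple

# mcp__<server>__<action>; the server part is matched as narrowly as possible.
MCP_TOOL = re.compile(r"^mcp__(?P<server>.+?)__(?P<action>.+)$")


def parse_mcp_tool(tool_name: str) -> Optional[Tuple[str, str]]:
    """Return (server, action) for an MCP tool name, or None for built-in tools."""
    match = MCP_TOOL.match(tool_name)
    if match is None:
        return None
    return match.group("server"), match.group("action")


print(parse_mcp_tool("mcp__postgres__query"))  # ('postgres', 'query')
print(parse_mcp_tool("Read"))                  # None
```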
Comparison with Other Agent Frameworks
After using it for a while, here's how Agent SDK stacks up:
Agent SDK vs LangChain
LangChain is a general-purpose agent framework. You define tools yourself, write prompt templates, manage memory. Its strength is flexibility — supports nearly every LLM. Its weakness is that it's TOO flexible — the number of configuration options is overwhelming.
Agent SDK is Claude-exclusive. No custom tool definitions needed (built-in), no prompt templates (Claude knows how to use tools), no memory management (Sessions handle it). The downside: locked to Claude, can't swap to GPT or Gemini.
When to choose: If Claude is your primary model, Agent SDK's developer experience is far better than LangChain. If you need multi-model switching, LangChain is more suitable.
Agent SDK vs CrewAI
CrewAI is a multi-agent collaboration framework centered on "roles" and "tasks." You define agent roles (like "researcher," "writer," "editor") and assign tasks for them to collaborate on.
Agent SDK also has Subagents, but the approach is different. CrewAI's agents collaborate as peers; Agent SDK uses a master-subordinate pattern — the main agent assigns tasks, subagents execute and report back.
When to choose: Complex multi-role collaboration flows, go with CrewAI. One main agent with helpers, go with Agent SDK.
Agent SDK vs OpenAI Agents SDK
OpenAI also has its own Agents SDK (evolved from Swarm). Both have similar design philosophies: built-in tools, streaming output, handoff support.
The differences: Agent SDK has Claude Code's full tool suite (file operations, code editing, Bash), while OpenAI's leans more toward API calls and function execution. Agent SDK has Hooks and Sessions; OpenAI's is more lightweight.
When to choose: Code-related agents (auto bug-fixing, code review, automated testing), Agent SDK is stronger. API orchestration and business process automation, OpenAI Agents SDK is cleaner.
Real-World Use Cases
After a day of tinkering, here's where I think Agent SDK shines:
1. Automated Code Review in CI/CD
Run an agent in GitHub Actions to automatically review PRs:
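A sketch of the review script a workflow step could call. It assumes the job has checked out the repository with enough history to diff against the base branch, and it reads GITHUB_BASE_REF, which GitHub sets on pull request events. The prompt wording is mine.

```python
import os

# Bash is included only so the agent can run git; no editing tools are allowed.
OPTIONS = {"allowed_tools": ["Read", "Glob", "Grep", "Bash"]}


def build_review_prompt(base_ref: str) -> str:
    return (
        f"Run `git diff origin/{base_ref}...HEAD` to see what this pull request changes. "
        "Review the changed code for bugs, security issues, and missing tests. "
        "Report findings as a list with file and line references. "
        "Do not modify any files."
    )


async def main() -> None:
    # Imported here so the module loads even where the SDK is not installed.
    from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query

    base_ref = os.environ.get("GITHUB_BASE_REF") or "main"
    prompt = build_review_prompt(base_ref)
    async for message in query(prompt=prompt, options=ClaudeAgentOptions(**OPTIONS)):
        if isinstance(message, ResultMessage):
            print(message.result)  # the workflow can post this as a PR comment
```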
2. Automated Test Generation
Given a source file, automatically generate corresponding tests:
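A sketch for test generation. The tests/test_<name>.py layout and the use of pytest are assumptions about the project, and target_test_file is a hypothetical helper.

```python
from pathlib import PurePosixPath

OPTIONS = {
    "allowed_tools": ["Read", "Write", "Bash"],
    "permission_mode": "acceptEdits",
}


def target_test_file(source_path: str) -> str:
    """Map src/utils.py to tests/test_utils.py (assumed project layout)."""
    name = PurePosixPath(source_path).name
    return str(PurePosixPath("tests") / f"test_{name}")


def build_prompt(source_path: str) -> str:
    return (
        f"Read {source_path} and write pytest tests for it in "
        f"{target_test_file(source_path)}. Cover edge cases, "
        "then run the tests and fix any that fail."
    )


async def main(source_path: str) -> None:
    # Imported here so the module loads even where the SDK is not installed.
    from claude_agent_sdk import ClaudeAgentOptions, query

    options = ClaudeAgentOptions(**OPTIONS)
    async for message in query(prompt=build_prompt(source_path), options=options):
        print(message)
```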
3. Codebase Documentation Generation
Have the agent traverse the entire project and auto-generate README and API docs:
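A sketch for documentation generation, assuming ClaudeAgentOptions accepts a cwd argument to point the agent at the project root. The output file names are placeholders.

```python
# Hypothetical output files the agent is asked to produce.
DOC_TARGETS = ["README.md", "docs/API.md"]

PROMPT = (
    "Explore this project with Glob and Read, then write documentation: "
    + ", ".join(DOC_TARGETS)
    + ". Describe the purpose, setup steps, and the public functions of each module."
)


def doc_options(project_root: str) -> dict:
    """Keyword arguments for ClaudeAgentOptions, scoped to one project."""
    return {
        "cwd": project_root,  # the agent's working directory
        "allowed_tools": ["Glob", "Read", "Write"],
        "permission_mode": "acceptEdits",
    }


async def main(project_root: str = ".") -> None:
    # Imported here so the module loads even where the SDK is not installed.
    from claude_agent_sdk import ClaudeAgentOptions, query

    options = ClaudeAgentOptions(**doc_options(project_root))
    async for message in query(prompt=PROMPT, options=options):
        print(message)
```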
4. Data Processing Pipelines
Agents don't just write code — they run it. You can have them handle data processing:
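A sketch for a data-cleaning task, assuming ClaudeAgentOptions accepts max_turns to cap how long the loop can run. The CSV paths and cleaning steps are made up for illustration.

```python
OPTIONS = {
    "allowed_tools": ["Read", "Write", "Bash"],
    "permission_mode": "acceptEdits",
    "max_turns": 20,  # stop runaway loops on messy data
}


def build_prompt(input_csv: str, output_csv: str) -> str:
    return (
        f"Write and run a Python script that reads {input_csv}, "
        "drops duplicate rows, normalizes dates to YYYY-MM-DD, "
        f"and saves the result to {output_csv}. "
        "Print the row counts before and after cleaning."
    )


async def main() -> None:
    # Imported here so the module loads even where the SDK is not installed.
    from claude_agent_sdk import ClaudeAgentOptions, query

    # Placeholder paths; point these at real files.
    prompt = build_prompt("data/raw.csv", "data/clean.csv")
    async for message in query(prompt=prompt, options=ClaudeAgentOptions(**OPTIONS)):
        print(message)
```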
Gotchas and Lessons Learned
Gotcha 1: API Version Compatibility
Claude Opus 4.7 changed thinking.type.enabled to thinking.type.adaptive. Older SDK versions fail with an API error when used with Opus 4.7.
Fix: upgrade to Agent SDK v0.2.111+.
Gotcha 2: Python and TypeScript SDK Feature Parity
Some Hook events only exist in the TypeScript version. SessionStart, SessionEnd, MessageDisplay, PostToolBatch — Python doesn't support these yet.
If you need these advanced Hook features, you might have to use the TypeScript version. Or wait for Python to catch up.
Gotcha 3: Default Permission Mode Behavior
When you don't specify permission_mode, the SDK uses default mode, which requires a canUseTool callback. If you don't provide one, the agent hangs after reading a file, waiting for permission. First time I ran into this, I waited for ages before realizing it was waiting for a permission callback.
Gotcha 4: Subagent Tool Inheritance
Mentioned above — subagents don't inherit the main agent's allowed_tools. Must configure separately.
Gotcha 5: Session File Cleanup
Session data accumulates as JSONL files on the local filesystem. Clean up regularly, or use the SessionEnd Hook for automatic cleanup.
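For the manual route, a small standard-library sketch that deletes session files older than a cutoff. The directory you pass in is up to you; I'm assuming the sessions live under something like ~/.claude/projects, so confirm the location on your machine before pointing this at it.

```python
import time
from pathlib import Path
from typing import List, Optional


def prune_sessions(
    session_dir: Path, max_age_days: float = 30, now: Optional[float] = None
) -> List[Path]:
    """Delete *.jsonl files under session_dir older than max_age_days."""
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400
    removed = []
    # Materialize the listing first so deleting does not disturb iteration.
    for path in sorted(session_dir.rglob("*.jsonl")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


# Example (assumed location):
#   prune_sessions(Path.home() / ".claude" / "projects", max_age_days=14)
```

The now parameter exists only to make the function testable without waiting for files to age.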
Deploying to Production
Local experimentation is fine, but the real value is production deployment. Agent SDK supports multiple deployment options:
Docker
Simplest approach: package the agent as a Docker image.
Note: agents in Docker containers can only access the container's filesystem. Mount volumes if you need host filesystem access.
CI/CD Integration
Agent SDK also runs inside GitHub Actions as an ordinary workflow step.
Relationship with Managed Agents
Anthropic also offers Managed Agents — a hosted REST API where Anthropic runs the agent and sandbox for you. Agent SDK runs in your own process; Managed Agents run on Anthropic's infrastructure.
The official recommended path: prototype locally with Agent SDK, deploy to Managed Agents for production. That way you don't manage infrastructure yourself.
Wrapping Up
Claude Agent SDK isn't everything to everyone, but within its sweet spot — building Claude-based automation agents — it's the most hassle-free option I've used. Built-in tools eliminate tons of boilerplate, Hooks provide sufficient control, and Sessions solve context persistence.
Compared to LangChain, it's less flexible but way more productive. Compared to CrewAI, it's not suited for complex multi-role collaboration, but the single-agent-plus-subagents pattern covers most scenarios.
If you already use Claude Code, there's almost zero learning curve — they share the same tools and concepts. If you haven't used Claude Code, Agent SDK is a great entry point.
Next I'm planning to build an automated code review bot with Agent SDK, hooked into GitHub PR webhooks. I'll write that up when it's done. Questions? Drop them in the comments.
Written June 2026, based on the latest Claude Agent SDK. The SDK updates frequently, so check the official docs for the latest.