Terminal AI Coding Agents Showdown: Claude Code vs Codex CLI vs Gemini CLI — Real-World Comparison

A friend asked me last week, "What AI tool do you use for coding?" I told him I run agents in the terminal. He goes, "Wait, you don't use Cursor?"

Yeah, apparently a lot of people still don't know about terminal AI coding agents. Fair enough — they're relatively new, and most of the hype has been around IDE-integrated tools like Cursor and GitHub Copilot.

I used Cursor for about three months before switching to terminal agents, and honestly, I haven't looked back. It's not that Cursor is bad — it's great for what it does. But terminal agents fit my workflow like a glove. I've been a terminal power user for years (tmux + neovim + git is my daily driver), so switching to a GUI IDE always felt a bit awkward. Terminal agents let me stay in my natural habitat while getting serious AI assistance.

The terminal AI coding agent space is heating up fast. We've got Claude Code from Anthropic, Codex CLI from OpenAI, Gemini CLI from Google, and open-source options like OpenCode. Each has its own philosophy, strengths, and quirks.

Today I'm going to break down my real-world experience with these tools — the good, the bad, and the "why did it do THAT?" moments.

Why Terminal Agents Over IDE Plugins?

Let me address the elephant in the room first. Why would anyone choose a terminal agent over something like Cursor, which gives you nice autocomplete in a proper IDE?

The answer comes down to workflow integration.

Cursor is fundamentally an IDE. You work inside its window. That's fine if your entire development process happens in one editor. But in the real world, I'm constantly context-switching between terminals — running tests, checking logs, deploying to servers, managing git branches, SSHing into remote machines. Cursor can't help me with any of that.

Terminal agents, on the other hand, live right in your shell. They can execute commands, read and write files, search through codebases, run tests, and even handle deployments. They're not just "autocomplete on steroids" — they're actual coding partners that understand your entire development environment.

Are there tradeoffs? Absolutely. You lose the visual code highlighting and inline suggestions that make Cursor so pleasant for writing code. But for me, the ability to say "deploy this to staging and check if the tests pass" is worth way more than prettier autocomplete.

Meet the Contenders

Claude Code

Claude Code is Anthropic's terminal AI coding agent, powered by the Claude Sonnet 4 model. It's probably the most feature-complete terminal agent available right now.

Installation is straightforward:

bash
npm install -g @anthropic-ai/claude-code

Then just cd into your project and run:

bash
claude

It automatically reads your project structure, understands the codebase context, and you're ready to start chatting. The interaction feels natural — you describe what you want, and it goes off and does it. Need to refactor a module? Tell it. Got a weird error? Paste the stack trace. Want tests? Ask.

What really sets Claude Code apart is its Skills system. You can define custom behaviors for different scenarios using markdown files:

bash
# List existing skills
claude skills list

# Add a custom skill
claude skills add my-skill --file ./my-skill.md

I've set up a dozen or so Skills for my projects, covering everything from code review to deployment workflows. The effect is dramatic — instead of being a generic assistant, the agent becomes someone who knows your project's conventions, your coding style, and your team's processes.

Pricing: API-based, roughly $3 per million input tokens and $15 per million output tokens for Sonnet 4. A typical coding conversation costs about $0.05-$0.15. A heavy day of usage runs $2-$5. Claude Pro subscription ($20/month) gives you some discounts but API calls still cost extra.

OpenAI Codex CLI

Codex CLI is OpenAI's entry into the terminal agent space, open-sourced in April 2025. It runs on the o4-mini model and positions itself as a lightweight command-line coding assistant.

bash
npm install -g @openai/codex

Usage:

bash
codex "Write a Python script that reads a CSV file and counts null values in each column"

Codex CLI takes a different approach than Claude Code. Where Claude Code is an interactive conversation partner, Codex CLI is more of a "task executor" — you give it a task, it does it, done. There is an interactive mode, but the philosophy is more task-oriented.

bash
codex

The interactive mode works similarly to Claude Code, but feature-wise there's a gap. Codex CLI doesn't have a Skills system or the fine-grained behavior control that Claude Code offers. Its strength is speed — the o4-mini model is noticeably faster than Claude Sonnet 4, making it great for quick tasks.

Pricing: Free for ChatGPT Plus subscribers ($20/month) with usage limits. API pricing for o4-mini is about $1.10 per million input tokens and $4.40 per million output tokens — significantly cheaper than Claude.

Google Gemini CLI

Gemini CLI launched in June 2025, powered by Gemini 2.5 Pro. The big selling point? It's free: no subscription and no per-token bill, only rate limits.

bash
npm install -g @google/gemini-cli

Usage:

bash
gemini

Gemini CLI's interface is clean and straightforward. Its coding capabilities aren't quite as strong as Claude Code or Codex CLI, but the free price tag is hard to argue with. Plus, it has Google's search capabilities baked in, so it can look up technical docs and Stack Overflow answers in real-time.

Pricing: Free. $0/day, $0/month. There are rate limits during peak hours, but for most users it's effectively unlimited.

Bonus: OpenCode

OpenCode is an open-source terminal agent that supports multiple LLM backends (OpenAI, Anthropic, Google, even local models). If you don't want to be locked into a single vendor, or you want to use your own deployed models, OpenCode is worth a look.

bash
go install github.com/opencode-ai/opencode@latest

OpenCode's TUI (terminal user interface) is built with Bubble Tea and looks quite polished. It supports multi-file editing, LSP integration, Git operations, and more.

The ecosystem is smaller than Claude Code or Codex CLI, but the flexibility is unmatched.

Installation Horror Stories

Let me share some of the pain I went through getting these tools set up. Maybe it'll save you some time.

Claude Code Gotchas

Claude Code needs Node.js 18+. Usually no problem. But on one of my Ubuntu 22.04 servers, I hit this:

code
Error: Cannot find module '@anthropic-ai/claude-code-linux-x64'

Turned out to be an npm cache issue. Fixed it with:

bash
npm cache clean --force
npm install -g @anthropic-ai/claude-code

Another gotcha: API key configuration. Claude Code supports multiple auth methods, but the simplest is:

bash
export ANTHROPIC_API_KEY="sk-ant-..."

The first time I used it, I spent 30 minutes debugging a connection timeout before realizing it was a network issue, not an API key problem. If you're behind a corporate proxy:

bash
export HTTPS_PROXY="http://proxy.company.com:8080"
claude

Codex CLI Gotchas

Codex CLI installed fine, but it requires Node.js 22+. My server had 20.x:

code
Error: codex requires Node.js >= 22

Quick fix with nvm:

bash
nvm install 22
nvm use 22
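
Since the two tools wanted different Node.js versions in my setup (18+ for Claude Code, 22+ for Codex CLI), a quick pre-flight check can save a failed install. Here's a minimal sketch; the helper names and the `MIN_NODE` table are hypothetical, and the minimums are simply the ones reported in this article, so check each tool's current requirements.

```python
import shutil
import subprocess

# Minimum Node.js major versions as reported in this article (assumption: may change)
MIN_NODE = {"claude-code": 18, "codex": 22}

def node_major(version_output: str) -> int:
    """Parse the major version from `node --version` output such as 'v20.11.1'."""
    return int(version_output.strip().lstrip("v").split(".")[0])

def tools_supported(version_output: str) -> dict[str, bool]:
    """Map each tool to whether the given Node.js version meets its minimum."""
    major = node_major(version_output)
    return {tool: major >= minimum for tool, minimum in MIN_NODE.items()}

if __name__ == "__main__":
    if shutil.which("node"):
        out = subprocess.run(
            ["node", "--version"], capture_output=True, text=True
        ).stdout
        print(tools_supported(out))
    else:
        print("node not found on PATH")
```

On a machine with Node 20.x this would flag Codex CLI as unsupported, which is the situation described above.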

The bigger pain was authentication. Codex CLI needs an OpenAI API key, but (at least in the version I tested) it doesn't just read OPENAI_API_KEY from the environment. You're supposed to use:

bash
codex auth login

Which opens a browser. If you're on a headless server, that's a problem.

The workaround:

bash
codex --api-key-env OPENAI_API_KEY

But this parameter isn't easy to find in the docs. I had to dig through GitHub Issues to discover it.

Gemini CLI Gotchas

Gemini CLI was the easiest to install, but the auth flow was the most annoying. It requires Google Cloud authentication:

bash
gcloud auth application-default login

This also opens a browser. On a headless server, you need service account credentials:

bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

There's also a regional access issue. Gemini 2.5 Pro has restrictions in some regions. If you're in a restricted area, you might need to use Gemini 2.0 Flash instead:

bash
gemini --model gemini-2.0-flash

Or just use a proxy. Honestly, an API key would've been so much simpler.

Context Management: The Soul of Terminal Agents

One of the most important concepts when using terminal agents is the context window. Each tool handles context differently, and this directly impacts your experience.

Claude Code's Context Strategy

Claude Code has a 200K token context window. That's plenty for most projects, but it doesn't mean you should dump everything in there: larger context means higher API costs and slower responses.
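
To get a feel for how quickly a project eats into that window, here's a minimal sketch that estimates token counts from text length. It assumes roughly 4 characters per token, which is a common rule of thumb and not any tool's real tokenizer; `estimate_tokens` and `fits_in_window` are hypothetical helper names, and the file contents are made up.

```python
CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb, not a real tokenizer

def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division

def fits_in_window(texts: list[str], window: int = 200_000) -> bool:
    """Check whether the combined texts would fit in a context window."""
    return sum(estimate_tokens(t) for t in texts) <= window

# Hypothetical file contents standing in for a real project
files = {"a.py": "x" * 400_000, "b.py": "y" * 500_000}
total = sum(estimate_tokens(text) for text in files.values())
print(total, fits_in_window(list(files.values())))
```

Even two large files can overflow the window under this estimate, which is one more reason to be selective about what the agent reads.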

Here's what I've found works well:

bash
# Put a CLAUDE.md file in your project root with project info
cat > CLAUDE.md << 'EOF'
# Project Overview
An e-commerce site built with Next.js 14, using PostgreSQL.

# Tech Stack
- Frontend: Next.js 14, Tailwind CSS, Zustand
- Backend: Next.js API Routes, Prisma ORM
- Database: PostgreSQL 16
- Deploy: Vercel + Supabase

# Code Conventions
- TypeScript, strict mode
- Functional components with hooks
- RESTful API style
- Vitest + Testing Library for tests

# Directory Structure
- src/app/ - Next.js App Router pages
- src/components/ - Reusable components
- src/lib/ - Utilities and config
- prisma/ - Database Schema
EOF

With this file, Claude Code reads it on every startup — no need to explain your project every time. Brilliant design.

Claude Code also supports .claudeignore (similar to .gitignore) to exclude files the agent shouldn't look at:

code
# .claudeignore
node_modules/
.next/
dist/
*.min.js
*.min.css

This keeps the agent from wasting time (and tokens) on dependencies, build output, and minified bundles.
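
If you want to preview which files patterns like these would exclude, here's a minimal sketch. It assumes a simplified, gitignore-like reading of the patterns (trailing slash means "any directory with this name", everything else is matched against the file name) and is not how Claude Code itself evaluates them; `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# The same patterns as in the ignore file above
PATTERNS = ["node_modules/", ".next/", "dist/", "*.min.js", "*.min.css"]

def is_ignored(path: str, patterns=PATTERNS) -> bool:
    """Simplified ignore check; real gitignore semantics are richer than this."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore anything under a directory with that name
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch(parts[-1], pattern):
            # File pattern: match against the file name only
            return True
    return False

for candidate in ["node_modules/react/index.js", "public/vendor.min.js", "src/app/page.tsx"]:
    print(candidate, is_ignored(candidate))
```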

Codex CLI's Context Strategy

Codex CLI's context window is smaller, but it has a neat feature: automatic git awareness. It reads your recent git changes and understands what you're currently working on.

bash
# Codex CLI auto-detects your git state

# If you're on a feature branch, it knows you're building something new
codex "help me finish this feature"

But Codex CLI doesn't read CLAUDE.md. It looks for an AGENTS.md file instead; without one, you have to provide project context manually in each conversation.

Gemini CLI's Context Strategy

Gemini CLI's context management is more basic. It has a large context window (1M tokens), but out of the box there's no automatic project awareness: unless you maintain a GEMINI.md context file, you need to re-explain your project each time you start a new conversation.

However, Gemini CLI has a unique advantage: real-time search. If it doesn't know about a library you're using, it automatically searches for documentation. This is incredibly useful for newly released libraries.

bash
# Gemini CLI auto-searches latest docs
gemini "help me build a DataTable component using the latest shadcn/ui"

It'll look up the latest shadcn/ui documentation and give you code based on the newest API. Claude Code and Codex CLI can't do this — their training data has a cutoff date.

Plugin and Extension Ecosystem

Another critical dimension is extensibility. No tool can build in everything, so how easily you can extend it determines how far it can go.

Claude Code's Ecosystem

Claude Code has the strongest extension capabilities, primarily through its Skills system. Skills are markdown files that define agent behavior for specific scenarios.

The community has already created many useful Skills:

  • code-review: Automated code review workflow
  • tdd: Test-driven development, forces tests before code
  • debug: Systematic debugging with a 4-step approach
  • refactor: Safe refactoring with tests at each step
  • security-scan: Security vulnerability scanning

I've written quite a few Skills myself, including a "Git Workflow" skill:

markdown
# git-workflow

## Branch Naming
- feature/xxx - New features
- fix/xxx - Bug fixes
- refactor/xxx - Refactoring
- docs/xxx - Documentation

## Commit Messages
Use Conventional Commits:
- feat: New feature
- fix: Bug fix
- docs: Documentation
- refactor: Refactoring
- test: Tests
- chore: Build/tooling

## PR Process
1. Create feature branch from main
2. Run tests after development
3. Ensure lint passes
4. Submit PR with description
5. Address review feedback
6. Delete branch after merge

With this skill, the agent automatically follows these conventions when doing Git operations.
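
Conventions like these are also easy to check mechanically, which is handy as a safety net for whatever the agent produces. Here's a minimal sketch of validators for the branch prefixes and commit types listed in the skill. The lowercase-slug rule for branch names is my own assumption, the optional `(scope)` part comes from the Conventional Commits format, and `valid_branch` / `valid_commit` are hypothetical names.

```python
import re

# Branch prefixes and commit types taken from the git-workflow skill above.
# Assumption: branch slugs are lowercase letters, digits, dots, dashes, underscores.
BRANCH_RE = re.compile(r"^(feature|fix|refactor|docs)/[a-z0-9][a-z0-9._-]*$")
COMMIT_RE = re.compile(r"^(feat|fix|docs|refactor|test|chore)(\([\w-]+\))?: \S.*$")

def valid_branch(name: str) -> bool:
    """True if the branch name uses one of the allowed prefixes."""
    return BRANCH_RE.match(name) is not None

def valid_commit(subject: str) -> bool:
    """True if the commit subject line looks like a Conventional Commit."""
    return COMMIT_RE.match(subject) is not None

for branch in ["feature/user-search", "hotfix/urgent"]:
    print(branch, valid_branch(branch))
for subject in ["feat(ui): add pagination", "added stuff"]:
    print(subject, valid_commit(subject))
```

Something like this could run in a pre-commit hook or CI job, independent of which agent wrote the commit.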

Codex CLI's Ecosystem

Codex CLI doesn't have a formal plugin system. Extension is mainly through prompt engineering — you tell it how to behave in your conversations.

Codex CLI is open-source though, so you could theoretically modify the code. But realistically, most developers won't go that far.

Gemini CLI's Ecosystem

Gemini CLI also lacks a formal plugin system, but it supports MCP (Model Context Protocol). Through MCP, you can connect Gemini CLI to various external tools and data sources.

json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_xxxx"
      }
    }
  }
}

With MCP configured, Gemini CLI can directly interact with GitHub — creating issues, reviewing PRs, searching code. This is a capability that Claude Code and Codex CLI don't have built-in.

Performance Benchmarks: How Fast Are They Really?

I ran a simple performance test: asked each agent to generate the same React component (a user list with search and pagination), and measured the time from request to complete response.

Test Environment

  • Network: 100Mbps, US West Coast server
  • Test time: Off-peak hours (to avoid rate limiting)
  • Each tool tested 5 times, averaged

Results

Claude Code

  • Average response time: 12.3 seconds
  • First token latency: 1.2 seconds
  • Output token speed: ~85 tokens/second
  • Generated code lines: ~180

Codex CLI

  • Average response time: 6.8 seconds
  • First token latency: 0.8 seconds
  • Output token speed: ~150 tokens/second
  • Generated code lines: ~150

Gemini CLI

  • Average response time: 9.5 seconds
  • First token latency: 1.5 seconds
  • Output token speed: ~95 tokens/second
  • Generated code lines: ~160

Codex CLI's speed advantage is clear, thanks to the lightweight o4-mini model. Claude Code is slower but produces the highest quality, most complete code. Gemini CLI sits in the middle.

Note: these numbers are affected by network conditions, server load, and time of day. My test is a reference point, not absolute truth.
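
If you want to repeat this kind of measurement yourself, here's a minimal sketch of a timing harness. It times any iterator of output chunks, so `fake_stream` is just a stand-in for a real agent and all function names are hypothetical. It also counts chunks rather than true tokens, so the rate is only an approximation.

```python
import time
from statistics import mean
from typing import Callable, Iterable

def time_stream(start_stream: Callable[[], Iterable[str]]) -> dict:
    """Time one streamed response: first-chunk latency, total time, chunks/sec."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in start_stream():
        if first is None:
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    if first is None:  # empty stream: no chunk ever arrived
        first = total
    gen_time = total - first
    rate = count / gen_time if gen_time > 0 else 0.0
    return {"first_token_s": first, "total_s": total, "tokens": count, "tokens_per_s": rate}

def average_runs(runs: list[dict]) -> dict:
    """Average each metric across runs."""
    return {key: mean(run[key] for run in runs) for key in runs[0]}

def fake_stream():
    # Stand-in for a real agent: 50 "tokens" after a short startup delay
    time.sleep(0.05)
    for i in range(50):
        yield f"tok{i}"

runs = [time_stream(fake_stream) for _ in range(5)]
print(average_runs(runs))
```

Averaging over five runs, as in the test above, smooths out some of the noise but won't remove time-of-day effects.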

Error Handling: How Well Can They Debug?

I tested a scenario where I deliberately gave each agent buggy code and watched whether they could identify and fix the problems.

Test Code

python
def calculate_average(numbers):
    total = 0
    for num in numbers:
        total += num
    return total / len(numbers)  # ZeroDivisionError if empty

def get_fibonacci(n):
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    fib = [0, 1]
    for i in range(2, n):
        fib.append(fib[i-1] + fib[i-2])
    return fib

Claude Code's Response

Claude Code immediately identified the ZeroDivisionError risk and suggested a comprehensive fix:

python
from typing import List

def calculate_average(numbers: List[float]) -> float:
    """Calculate the average of a list of numbers.

    Args:
        numbers: List of numbers

    Returns:
        The average value

    Raises:
        ValueError: If the list is empty
    """
    if not numbers:
        raise ValueError("Cannot calculate average of empty list")
    return sum(numbers) / len(numbers)

def get_fibonacci(n: int) -> List[int]:
    """Generate the first n Fibonacci numbers.

    Args:
        n: Number of elements to generate

    Returns:
        Fibonacci sequence
    """
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    fib = [0, 1]
    for i in range(2, n):
        fib.append(fib[i-1] + fib[i-2])
    return fib

Not only did it fix the bug, it added type hints, docstrings, and proper exception handling. Professional quality.

Codex CLI's Response

Codex CLI also found the ZeroDivisionError, but didn't suggest the additional improvements (type hints, docstrings). It focused purely on the bug fix. Practical and to the point.

Gemini CLI's Response

Gemini CLI found the issue but also incorrectly flagged the second function as buggy, claiming the loop wouldn't execute when n=1. That's actually correct behavior: when n=1, it should return [0] without entering the loop. I had to explain this before it understood. It shows that Gemini CLI's semantic understanding of code still has room for improvement.
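
A few assertions settle this kind of disagreement faster than arguing with the agent. Here's a minimal sketch that restates the two functions (with the empty-list fix applied) and checks the edge cases in question. The specific test values are my own choice.

```python
def calculate_average(numbers):
    if not numbers:
        raise ValueError("Cannot calculate average of empty list")
    return sum(numbers) / len(numbers)

def get_fibonacci(n):
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    fib = [0, 1]
    for i in range(2, n):  # range(2, 2) is empty, so n == 2 also skips the loop
        fib.append(fib[i - 1] + fib[i - 2])
    return fib

# Edge cases for get_fibonacci: n == 1 returns early and never reaches the loop
assert get_fibonacci(0) == []
assert get_fibonacci(1) == [0]
assert get_fibonacci(2) == [0, 1]
assert get_fibonacci(7) == [0, 1, 1, 2, 3, 5, 8]

# The fixed calculate_average rejects empty input instead of dividing by zero
assert calculate_average([2, 4, 6]) == 4
try:
    calculate_average([])
except ValueError as exc:
    print("empty list rejected:", exc)
```

All of these pass, which confirms that the second function was fine as written and only the empty-list case needed fixing.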

Cost Breakdown: What Does a Day of Usage Cost?

Based on my typical usage (20-30 conversations per day, averaging 5 turns each):

Claude Code

  • Average per conversation: ~2000 input + 1000 output tokens
  • Daily cost: ~$0.63
  • Monthly (22 working days): ~$14
  • With Claude Pro ($20/month): total ~$30-$35/month

Codex CLI

  • o4-mini is significantly cheaper
  • Daily cost: ~$0.15
  • Monthly: ~$3.30
  • With ChatGPT Plus ($20/month): essentially free (with limits)

Gemini CLI

  • Free. $0/day, $0/month.
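
The paid figures above can be roughly reproduced from the per-token prices quoted earlier. Here's a minimal sketch of the arithmetic. It assumes the ~2000 input and ~1000 output tokens are totals per conversation and ignores any context re-sent between turns, so treat it as a lower bound; the function names and model keys are hypothetical labels.

```python
# Prices in USD per million tokens, as quoted earlier in this article
PRICES = {
    "claude-sonnet-4": {"input": 3.00, "output": 15.00},
    "o4-mini": {"input": 1.10, "output": 4.40},
}

def conversation_cost(model, input_tokens=2000, output_tokens=1000):
    """Cost of one conversation in USD at the quoted per-token prices."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

def daily_cost(model, conversations):
    """Cost of a day with the given number of conversations."""
    return conversations * conversation_cost(model)

for model in PRICES:
    low, high = daily_cost(model, 20), daily_cost(model, 30)
    print(f"{model}: ${low:.2f}-${high:.2f}/day, ~${high * 22:.2f}/month at the high end")
```

At 30 conversations a day this gives about $0.63 for Sonnet 4, matching the daily figure above, and the o4-mini range of roughly $0.13 to $0.20 brackets the ~$0.15 estimate.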

Summary

  • Cheapest: Gemini CLI (free)
  • Best value: Codex CLI (for ChatGPT Plus users)
  • Most expensive but most capable: Claude Code (~$30-$35/month)

Which One Should You Pick?

Professional Developer, Values Efficiency

Go with Claude Code. The code quality is highest, the Skills system lets you customize workflows, and the automation capabilities are unmatched. Yes, it costs more, but if your time is valuable (and it is), the investment pays for itself.

Student or Hobbyist, Budget-Conscious

Go with Gemini CLI. Free is free. The coding capabilities aren't as strong, but for learning and personal projects, it's more than adequate. The search integration is genuinely useful for looking up documentation.

Already a ChatGPT Plus Subscriber

Go with Codex CLI. You're already paying $20/month — might as well use it. Codex CLI is fast and capable for everyday coding tasks.

Want Maximum Flexibility

Go with OpenCode. Multiple LLM backends, your own API keys, even local models. Perfect for tinkerers.

Corporate Environment with Data Security Requirements

If your code can't leave your infrastructure, you need a solution that supports local models. OpenCode + Ollama + CodeLlama is one option, though the quality will be lower. Enterprise versions of Claude Code and Codex CLI both have data-not-used-for-training policies, but check your company's specific requirements.

Pro Tips from the Trenches

After months of daily usage, here are the patterns that work best:

Give Rich Context

Agents aren't mind readers. The more context you provide, the better the output.

bash
# ❌ Vague
claude "this function has a bug"

# ✅ Specific
claude "The parse_config function in src/utils/parser.py throws 'RecursionError: maximum recursion depth exceeded' when processing nested JSON. Test file is tests/test_parser.py."

Use Agents for Code Review

This is one of my most common use cases. Before submitting a PR:

bash
claude "review my recent changes, focus on security issues and performance"

It reads the git diff, analyzes your changes, and gives detailed review comments. Often catches things I missed.

Let Agents Write Commit Messages

bash
claude "write a commit message based on my recent changes"

It analyzes the diff and generates a conventional commit message. Better than what I'd write myself, honestly.

Batch Operations: Be Careful

Agents can modify files in bulk, but this power deserves respect. I once asked an agent to "unify the code style across the entire project." It modified 200+ files and introduced about a dozen bugs.

Lesson learned: test batch operations on a small scope first, then expand.
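
One way to put "small scope first, then expand" into practice is to hand the agent files in batches that start small and grow only after each batch passes tests. Here's a minimal sketch; `growing_batches`, the starting size of 5, and the growth factor of 4 are all arbitrary, hypothetical choices, and the file list is made up.

```python
from itertools import islice
from typing import Iterable, Iterator

def growing_batches(paths: Iterable[str], first: int = 5, factor: int = 4) -> Iterator[list[str]]:
    """Yield batches of paths that start small and grow by `factor` each round."""
    it = iter(paths)
    size = first
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
        size *= factor

files = [f"src/file_{i}.ts" for i in range(200)]  # hypothetical file list
for batch in growing_batches(files):
    # In practice: ask the agent to restyle only `batch`, then run the test suite
    # and review the diff before moving on to the next, larger batch.
    print(len(batch), batch[0])
```

If the first five files come back with problems, you've lost minutes instead of an afternoon of untangling 200 modified files.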

What's Next?

Terminal AI coding agents are just getting started. I see a few trends coming:

  1. Multi-agent collaboration: Not one agent doing everything, but specialized agents working together — one for coding, one for testing, one for review.
  2. Deeper tool integration: Direct integration with CI/CD, monitoring systems, and cloud platforms.
  3. Personalization: Agents that learn your coding style and preferences over time.
  4. Local-first: As local models improve, more agents will support fully offline operation, solving data security concerns.

Getting into terminal AI coding agents now is like getting into Docker in 2015 — it looks like a niche tool today, but it's going to be standard practice soon.

Final Thoughts

I actually use all three tools. Claude Code for complex development tasks, Codex CLI for quick one-off jobs, and Gemini CLI for documentation lookup and research. They're not competing — they're complementary.

If you haven't tried terminal AI coding agents yet, I strongly recommend giving one a spin. Doesn't matter which one you pick — it'll level up your development workflow.

I'm planning to write a follow-up about creating custom Skills for Claude Code, since the customization capabilities are genuinely game-changing. Drop a comment if you have questions or want to share your own experience.
