Claude Now Requires Identity Verification — Can Open Models Fill the Gap? A Week of Real Testing

Last week I opened Claude to write some code and got hit with a verification page — asking me to upload a photo of my government ID. I've been using Claude for almost two years. This was a first.

Turns out Anthropic started rolling out identity verification in mid-June 2026, using a third-party service called Persona. Don't verify? Some features get locked. Even worse — people are reporting account bans after completing verification.

The Hacker News thread hit 777 points and the comments were on fire. Some people called it "the Linux moment for LLMs." Others said "time to switch to open models." I got nervous too — Claude Code is my daily driver for coding. What if I suddenly can't use it?

So I spent a week seriously testing several mainstream open models in real coding scenarios. The short answer: open models can handle about 80% of what I need. But that remaining 20% gap is real, and you need to know what you're giving up.

Why Anthropic Is Doing This

The official line is "being responsible with powerful technology starts with knowing who is using it." Translation: people are doing bad things with Claude and they need to crack down.

What kind of bad things? Probably:

  • Large-scale automated content scraping and spam generation
  • Bypassing safety restrictions for malicious purposes
  • Enterprise compliance pressure (especially from the EU AI Act)

Whatever the justification, it's one more barrier for regular developers. You're handing over your ID photo to a company you've probably never heard of. The privacy concerns are legitimate.

What's more unsettling is that some people report getting banned even after verifying. Anthropic says bans are for "safety violations," but the definition of what counts as a violation is frustratingly vague.

This reminds me of when GitHub Copilot first launched and people said "your code is on Microsoft's servers, that's not safe." Back then most people dismissed it as paranoia. Turns out the risk of depending on a single closed-source provider is very real. One policy change can shut down your entire workflow.

There's a popular article on HN (246 points) by Andrew Marble, who's actively switching from Claude to open models. His analogy is spot-on: using Linux used to carry professional risk because of compatibility issues and a weaker software ecosystem. But nowadays Linux isn't really a sacrifice anymore. Open models are the same: the gap is narrowing, and "the weights you download can't be taken away from you."

Where Open Models Actually Stand

Let me cut to the chase: as of mid-2026, the gap between open and closed models has shrunk from "unusable" to "occasionally rough around the edges."

I focused on three models:

DeepSeek V4: I've written about this before — used it with Claude Code for a week. Coding ability is strong, especially for Python and TypeScript. Daily development work is mostly fine. The downsides: smaller effective context window, occasional information loss on long files, and it sometimes randomly switches from Chinese to English mid-response.

Qwen 3 Series: Alibaba's models, ranging from 0.6B to 235B parameters. I mainly tested the 30B and 235B versions. The 30B runs locally on a 24GB GPU and is fast enough for simple coding tasks. The 235B needs API access and has similar coding quality to DeepSeek V4, though slightly weaker on complex reasoning.

GLM-5.2: Zhipu's latest model, MIT licensed. Someone pitted it head-to-head against Claude Opus 4.8 — building a WebGL 3D platformer game from scratch. GLM-5.2 took 1 hour 10 minutes, Opus took 33.5 minutes. But GLM-5.2 cost $5.39 while Opus was estimated at $21.92. A quarter of the price for double the time — that's a trade-off most individual developers can accept.

The pricing tells the real story:

  • Claude Opus 4.8: $5/M input tokens, $25/M output tokens
  • GLM-5.2: $1.4/M input, $4.4/M output
  • DeepSeek V4: $0.27/M input, $1.1/M output (even lower with cache hits)

DeepSeek costs one-twentieth of Opus. If you're doing heavy AI-assisted coding every day, that's hundreds of dollars a month in savings just on API costs.

Beyond these three main contenders, there are a few others worth watching:

Llama 4 Maverick: Meta's latest open model with 400B parameters using a Mixture of Experts architecture, so only a subset of parameters activate during inference. Coding ability is similar to DeepSeek V4, but stronger for English tasks.

Mistral Large 2: Mistral's French flagship at 123B parameters. As an EU-based provider, it's a natural fit if you have GDPR requirements. Coding ability is decent but trails DeepSeek and GLM on complex reasoning.

What It Actually Feels Like in Practice

Benchmarks and pricing are nice, but what matters is real work. I tested several common scenarios:

Building a Complete FastAPI Backend

I asked each model to create a full FastAPI project with user authentication and database CRUD from scratch. This tests project structure understanding and code organization.

Claude Opus gave clean, well-organized code with pytest tests included. DeepSeek V4 produced working code but with a messier directory structure and lazy test coverage. GLM-5.2 fell somewhere in between.

The gap wasn't huge, maybe 10%. If you're an experienced developer, a few tweaks close it.

One thing I noticed: DeepSeek set JWT expiration to 30 days in the auth module, which is a no-go for production. Claude would typically default to something more reasonable (like 1 hour) or at least remind you to adjust based on your use case. This kind of "security awareness" gap is something open models need to work on.
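For reference, here is what a saner default looks like. This is a minimal stdlib sketch, not code from any of the generated projects: build_access_claims and is_expired are hypothetical helper names, and the one-hour lifetime is an assumed default. It only builds and checks the standard sub/iat/exp claims; in a real FastAPI app you would sign the dict with a JWT library such as PyJWT (jwt.encode(claims, key, algorithm="HS256")).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical default lifetime: one hour instead of the 30 days DeepSeek picked.
ACCESS_TOKEN_TTL = timedelta(hours=1)


def build_access_claims(user_id: str, now: Optional[datetime] = None,
                        ttl: timedelta = ACCESS_TOKEN_TTL) -> dict:
    """Return the standard JWT claims sub, iat and exp (times as Unix timestamps)."""
    now = now or datetime.now(timezone.utc)
    return {
        "sub": user_id,
        "iat": int(now.timestamp()),
        "exp": int((now + ttl).timestamp()),
    }


def is_expired(claims: dict, now: Optional[datetime] = None) -> bool:
    """A token must be rejected on or after its exp time (RFC 7519)."""
    now = now or datetime.now(timezone.utc)
    return int(now.timestamp()) >= claims["exp"]


issued = datetime(2026, 6, 20, 12, 0, tzinfo=timezone.utc)
claims = build_access_claims("user-42", now=issued)
print(claims["exp"] - claims["iat"])                       # 3600
print(is_expired(claims, issued + timedelta(minutes=59)))  # False
print(is_expired(claims, issued + timedelta(hours=2)))     # True
```

Keeping the lifetime as a named constant makes it obvious in review, which is exactly where the 30-day value would have been caught.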

Debugging a Complex TypeScript Type Error

Given a file with nested generic type errors, can the model find and fix the issue?

Claude nailed it on the first try with a clear explanation. DeepSeek found the problem but got the fix wrong on the first attempt — needed a second round. GLM-5.2 was similar.

The gap was more noticeable here. Claude's "understanding" is genuinely stronger, especially with complex type systems. It grasps the generic constraints you actually want, while DeepSeek sometimes gives you a fix that compiles but has the wrong semantics.

Refactoring 2000 Lines of Legacy Code

This was the biggest gap. I gave each model a real legacy project and asked it to split modules, add types, and improve readability.

Claude handled the entire file in one conversation while maintaining consistency. DeepSeek started "forgetting" earlier changes halfway through — renaming user to userData in one place, then using user again elsewhere, causing runtime errors.

Long-context handling is genuinely the weak point of open models. While DeepSeek claims 128K context support, the effective length is much shorter. In my experience, files over 15K tokens start causing instability.
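A practical workaround is to feed large files in pieces that stay below the range where things got shaky. Below is a minimal sketch that assumes the rough rule of thumb of about 4 characters per token (real tokenizers differ, so the budget is approximate). estimate_tokens and split_by_token_budget are hypothetical helpers, and the 12,000-token default is simply a number picked to sit under the ~15K mark mentioned above. Chunks break on line boundaries so no line of code is cut in half.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; an approximation, not a real tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def split_by_token_budget(source: str, budget: int = 12_000) -> list:
    """Split source into chunks of whole lines whose estimated size is at most budget.

    A single line longer than the budget ends up as its own oversized chunk.
    """
    chunks = []
    current = []
    current_tokens = 0
    for line in source.splitlines(keepends=True):
        line_tokens = estimate_tokens(line)
        # Start a new chunk if adding this line would exceed the budget.
        if current and current_tokens + line_tokens > budget:
            chunks.append("".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += line_tokens
    if current:
        chunks.append("".join(current))
    return chunks


big_file = "x = 1\n" * 20_000  # each 6-character line is estimated at 2 tokens
parts = split_by_token_budget(big_file, budget=12_000)
print(len(parts))  # 4
```

Splitting by line is crude. For real refactors, splitting at function or class boundaries keeps each chunk self-contained, so the model has less reason to "forget" a rename.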

Writing Unit Tests

Surprisingly, open models did well here. DeepSeek's test coverage was comparable to Claude's, sometimes even catching edge cases Claude missed. Test writing is relatively "mechanical" and doesn't require much creative reasoning, which might explain why open models perform well.

If your main need is "write tests for this function," open models are perfectly adequate and much cheaper.

Explaining Complex Code

Given code using advanced patterns (decorator chains, metaclasses, higher-order function composition), can the model explain what it does?

Claude's explanations were clearest, using analogies and mental models. DeepSeek's explanations were usable but dry, like reading documentation. This is a "teaching ability" gap — Claude is a better teacher, DeepSeek is more like a reference manual.

How to Connect Open Models to Your Coding Tools

Setting up open models with mainstream coding tools is straightforward.

Option 1: Aider + DeepSeek V4

Aider is an open-source terminal AI coding assistant that supports almost every model:

```bash
# Install Aider
pip install aider-install
aider-install

# Use DeepSeek V4 (--api-key takes a provider=key pair)
aider --model deepseek/deepseek-chat --api-key deepseek=sk-xxx

# Use GLM-5.2 via OpenRouter (reads OPENROUTER_API_KEY from the environment)
aider --model openrouter/zhipu/glm-5.2
```

Aider automatically manages git commits: every AI change gets committed, which makes rollbacks easy. It also supports "architect mode," where a strong model plans the change and a cheaper "editor" model turns that plan into file edits.

Option 2: Cursor / Windsurf API Switch

In Cursor settings, swap the model API:

```
Settings → Models → Add Model
API Base URL: https://api.deepseek.com/v1
API Key: sk-xxx
Model: deepseek-chat
```

After this, Cursor's Tab completion, Chat, and Composer all use the new model. The experience is similar to using Claude, though responses may be slightly slower.

Option 3: Local Execution (Qwen 3 30B)

If you have a 24GB GPU (like a 4090 or 3090), you can run Qwen 3 30B with Ollama:

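```bash
# Install Ollama (on Linux the script usually also sets up a background service)
curl -fsSL https://ollama.com/install.sh | sh

# Start the server on the default port 11434 if it isn't already running
# (it stays in the foreground, so use a separate terminal)
ollama serve

# Download the model (~20GB); this needs the server to be running
ollama pull qwen3:30b
```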

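To use it from Aider, set OLLAMA_API_BASE=http://127.0.0.1:11434 and run aider --model ollama_chat/qwen3:30b. Other OpenAI-compatible clients should use http://localhost:11434/v1. Cursor is trickier: it sends requests through its own backend, so a localhost URL generally won't work there without a tunnel. The benefit of a local setup is that it runs fully offline, so your code never leaves your machine. The downside is speed: about 30-40 tokens/second on a 4090, noticeably slower than API access.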
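If you'd rather script against the local server directly, here is a minimal stdlib sketch. It assumes Ollama's OpenAI-compatible endpoint at /v1/chat/completions on the default port and the qwen3:30b tag pulled above. build_chat_request and ask_local_model are hypothetical helper names. The request is built separately from the code that sends it, so you can inspect or test the payload without a running server.

```python
import json
import urllib.request

# Ollama's OpenAI-compatible chat endpoint on the default port.
OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"


def build_chat_request(prompt: str, model: str = "qwen3:30b") -> urllib.request.Request:
    """Build (but do not send) a chat-completion request for a local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, timeout: float = 300) -> str:
    """Send the request and return the reply text (needs `ollama serve` running)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example call (uncomment once the server is up):
# print(ask_local_model("Write a quicksort in Python"))
```

Nothing here is specific to Ollama beyond the URL. Pointing OLLAMA_CHAT_URL at another OpenAI-compatible server should work the same way, though hosted APIs will also need an Authorization header.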

Option 4: OpenRouter Unified Access

OpenRouter is a model aggregation platform — one API key gives access to dozens of models:

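```python
import os

import openai

# OpenRouter exposes an OpenAI-compatible API, so the official openai client works.
client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY", "sk-or-xxx"),
)

response = client.chat.completions.create(
    model="deepseek/deepseek-chat",
    messages=[{"role": "user", "content": "Write a quicksort"}],
)
print(response.choices[0].message.content)
```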

OpenRouter typically costs 10-20% more than direct API access, but the convenience is worth it if you switch between models frequently.

The Cost Math: How Much Can You Actually Save

Assume 4 hours of daily AI coding, 50K tokens per conversation (input + output), about 20 conversations per day.

Claude Opus:

  • 1M tokens/day: ~500K input + 500K output
  • Daily: $2.50 (input) + $12.50 (output) = $15
  • Monthly: ~$450

DeepSeek V4:

  • Same token volume
  • Daily: $0.135 + $0.55 = $0.685
  • Monthly: ~$20.50

GLM-5.2:

  • Daily: $0.70 + $2.20 = $2.90
  • Monthly: ~$87
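The arithmetic above is easy to rerun with your own numbers. This small sketch reproduces it using the per-million-token prices listed earlier. It keeps the same assumptions: 500K input plus 500K output tokens per day, over a 30-day month. Swap in your own volumes or a different input/output split.

```python
# Per-million-token prices (input, output) in USD, as listed earlier in this post.
PRICES = {
    "Claude Opus 4.8": (5.00, 25.00),
    "DeepSeek V4": (0.27, 1.10),
    "GLM-5.2": (1.40, 4.40),
}

# Workload assumption: 20 conversations x 50K tokens, split evenly.
INPUT_TOKENS_PER_DAY = 500_000
OUTPUT_TOKENS_PER_DAY = 500_000
DAYS_PER_MONTH = 30


def daily_cost(input_price: float, output_price: float) -> float:
    """USD per day for the assumed token volume."""
    input_cost = INPUT_TOKENS_PER_DAY / 1_000_000 * input_price
    output_cost = OUTPUT_TOKENS_PER_DAY / 1_000_000 * output_price
    return input_cost + output_cost


for model, (inp, out) in PRICES.items():
    day = daily_cost(inp, out)
    print(f"{model}: ${day:.4f}/day, ${day * DAYS_PER_MONTH:.2f}/month")
```

It prints roughly $450, $20.55 and $87 per month for Opus, DeepSeek V4 and GLM-5.2, matching the figures above. Because output tokens are priced several times higher than input, a workload that is mostly reading code (input-heavy) comes out cheaper than this even split suggests.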

The gap is obvious. DeepSeek costs $20/month versus Claude's $450. Even with occasional Claude usage for complex tasks, you can keep total costs under $50/month.

My current hybrid setup: Claude Pro subscription ($20) + DeepSeek API ($20) + occasional OpenRouter ($10) = ~$50/month. That's about 90% savings from the $450 all-API approach.

The Real Shortcomings of Open Models

I've talked up open models a lot, but the gaps are real and not something you can fix with prompt engineering.

Multimodal Capabilities

Claude Opus can look at screenshots, analyze UI, understand diagrams. GLM-5.2 is text-only. DeepSeek V4 has a multimodal version but it's nowhere near Claude's level.

If your workflow involves "screenshot and ask AI" (like having it read error screenshots or analyze design mockups), open models can't replace that yet.

Agent Mode Stability

Claude Code's Agent mode can run for dozens of minutes, automatically planning, executing, and verifying. Open models tend to go off-track on long task chains — not because they lack capability, but because they lack sustained "focus."

Sakana AI just released Fugu, a system that uses multiple models collaborating on complex tasks. Fugu Ultra scored 73.7 on SWE-Bench Pro, beating Opus 4.8's 69.2. This suggests "model collaboration" might be more effective than "a single stronger model."

This applies to individual developers too: use DeepSeek for drafts, Claude for review and fixes. Play to each model's strengths.
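A minimal sketch of that split follows. It assumes you wrap each provider in a plain function that takes a prompt and returns text. draft_then_review is a hypothetical helper, and the two fake_* functions are stand-ins so the flow can run without API calls. In practice you'd wire the draft step to DeepSeek and the review step to Claude using whichever clients you already use.

```python
from typing import Callable, Dict

# A "model" here is any function that takes a prompt and returns text.
Model = Callable[[str], str]


def draft_then_review(task: str, draft_model: Model, review_model: Model) -> Dict[str, str]:
    """Cheap model writes the first draft; a stronger model reviews and fixes it."""
    draft = draft_model(f"Write code for this task:\n{task}")
    review_prompt = (
        "Review the following code for bugs, security issues, and unclear names. "
        "Return a corrected version.\n\n"
        f"Task: {task}\n\nCode:\n{draft}"
    )
    final = review_model(review_prompt)
    return {"draft": draft, "final": final}


# Stub models standing in for real API calls.
def fake_cheap(prompt: str) -> str:
    return "def add(a, b): return a - b"


def fake_strong(prompt: str) -> str:
    return "def add(a, b):\n    return a + b" if "a - b" in prompt else "ok"


result = draft_then_review("add two numbers", fake_cheap, fake_strong)
print(result["final"])
```

Keeping the draft alongside the final version lets you compare the two and see how often the expensive review actually changes anything. That comparison helps you decide where the cheaper model is good enough on its own.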

Tool Calling Reliability

Claude handles function calling well — parameter passing is almost always correct. Open models occasionally make basic mistakes: wrong parameter types, missing required fields, malformed JSON.

This gap is especially noticeable with MCP servers. I tested DeepSeek V4 with Hermes Agent and about 15% of tool calls failed, versus under 2% for Claude.
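One mitigation is to validate every tool call before executing it and return any errors to the model so it can retry. This is a minimal stdlib sketch that assumes a simplified schema of the form {parameter: (python_type, required)}. That is not the full JSON Schema that MCP tool definitions actually use, validate_tool_args is a hypothetical helper, and the read-file tool is made up for illustration.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Simplified schema (an assumption, not real JSON Schema):
# parameter name -> (expected Python type, required?)
Schema = Dict[str, Tuple[type, bool]]


def validate_tool_args(
    raw_arguments: str, schema: Schema
) -> Tuple[Optional[Dict[str, Any]], List[str]]:
    """Parse a tool call's JSON arguments; return (args or None, list of errors)."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return None, [f"malformed JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return None, ["arguments must be a JSON object"]

    errors: List[str] = []
    for name, (expected_type, required) in schema.items():
        if name not in args:
            if required:
                errors.append(f"missing required field '{name}'")
            continue
        value = args[name]
        # bool is a subclass of int in Python, so reject True/False for int fields.
        wrong_bool = expected_type is int and isinstance(value, bool)
        if not isinstance(value, expected_type) or wrong_bool:
            errors.append(
                f"'{name}' should be {expected_type.__name__}, got {type(value).__name__}"
            )
    for name in args:
        if name not in schema:
            errors.append(f"unexpected field '{name}'")
    return (None if errors else args), errors


# Hypothetical tool: read a file, optionally limited to max_lines.
READ_FILE_SCHEMA: Schema = {"path": (str, True), "max_lines": (int, False)}

print(validate_tool_args('{"path": "src/app.py", "max_lines": 200}', READ_FILE_SCHEMA))
print(validate_tool_args('{"max_lines": "200"}', READ_FILE_SCHEMA))
print(validate_tool_args('{"path": "src/app.py",}', READ_FILE_SCHEMA))
```

In an agent loop, a non-empty error list would go back to the model as the tool result instead of executing the call. With models that make these mistakes often, a specific message like "missing required field 'path'" usually gets a correct retry faster than a generic failure.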

Effective Context Length

While every model claims 128K or even 1M context support, open models degrade noticeably beyond 32K tokens. Claude's 200K context stays relatively stable up to 150K.

If your projects involve large files (like a 5000-line config), open models may "forget" earlier content.

My Recommended Migration Strategy

Don't go cold turkey. I suggest a three-week gradual approach:

Week 1: Set up DeepSeek API in Cursor or Aider. Use it for simple coding tasks (writing functions, tests, formatting). Keep using Claude for complex work.

Week 2: Expand DeepSeek's scope. Try medium-complexity tasks (refactoring small modules, writing API endpoints). Switch to Claude when it can't handle something.

Week 3+: Find your personal "dividing line." Which tasks can DeepSeek handle, which need Claude? Lock that in as your workflow.
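Once you've found that line, you can write it down as a simple routing table. This is a minimal sketch. The task categories and the labels "deepseek-v4" and "claude-opus" are hypothetical placeholders, not real API model IDs. The escalation rule mirrors the Week 2 habit of switching to Claude when the cheaper model can't handle something.

```python
# Hypothetical routing table: task category -> model to try first.
ROUTES = {
    "tests": "deepseek-v4",
    "formatting": "deepseek-v4",
    "small_function": "deepseek-v4",
    "api_endpoint": "deepseek-v4",
    "large_refactor": "claude-opus",
    "complex_types": "claude-opus",
    "screenshot": "claude-opus",
}
# Unknown categories and failed attempts go to the stronger model.
FALLBACK = "claude-opus"


def pick_model(category: str, failed_attempts: int = 0) -> str:
    """Return the model for a task, escalating to FALLBACK after any failed attempt."""
    if failed_attempts > 0:
        return FALLBACK
    return ROUTES.get(category, FALLBACK)


print(pick_model("tests"))                     # deepseek-v4
print(pick_model("tests", failed_attempts=1))  # claude-opus
print(pick_model("something_new"))             # claude-opus
```

Defaulting unknown categories to the stronger model is deliberately conservative. As you learn which new task types DeepSeek handles well, you move them into the table.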

The key is not to feel like you're "downgrading." Using open models isn't a demotion — it's resource allocation. You don't use Photoshop to crop a screenshot.

Looking Ahead

Open models are iterating fast. DeepSeek V4 was a massive leap over V3, Qwen 3 was a generational jump from Qwen 2. At this pace, open models might cover 95% of coding scenarios within six months.

The "model collaboration" direction is especially interesting. Sakana Fugu proved that multiple mid-tier models collaborating can beat a single top-tier model. If this approach catches on, the question shifts from "which model" to "how to combine models."

Local deployment costs keep dropping too. Running a 70B model used to require an A100; now a 24GB 4090 can handle one with aggressive quantization (and often some CPU offloading). With the 32GB RTX 5090 now available, running 100B+ models locally might not be a dream for much longer.

I'm planning to write a detailed Aider + DeepSeek V4 setup tutorial next, covering MCP server configuration and prompt optimization. Drop a comment if you have questions.

One last thing: no matter what model you use, your own technical skills matter most. AI coding tools are just tools. If you can't do basic code review yourself, you'll write buggy code with Claude or DeepSeek alike.

Tools help you work faster. They don't write code for you. Get that mindset right and you'll do fine with any model.

Claude Now Requires Identity Verification — Can Open Models Fill the Gap? A Week of Real Testing — AI Hub