
Claude Code Doesn't Write Code: The Surprising Case of Using an AI Coding Tool to Analyze an MRI

I was scrolling through Hacker News this morning and stumbled on a post with 286 points and 391 comments. The title: "I used Claude Code to get a second opinion on my MRI."

My first reaction was: seriously? A coding tool reading medical images?

But after reading through the whole thing — and especially the comment section — I think this might be one of the best demonstrations of what AI Agents actually are. The HN discussion turned into a full-blown debate with radiologists, ultrasound techs, lawyers, and patients all chiming in. It was way more interesting than the original post.

So let's dig into this: what happened, why it worked (sort of), and what it tells us about AI coding tools being used for things that have nothing to do with coding.

What Actually Happened

The poster, Antoine (a French guy), had shoulder pain for a couple of weeks. Went to an orthopedist, got an MRI. The diagnosis: Grade III partial-thickness tear of the subscapularis tendon, meaning more than 50% of the tendon's thickness was torn.

The clinic immediately started treatment. Shockwave therapy, Traumeel injections, the whole works. Something felt off to Antoine. He asked for copies of the MRI data and the treatment records before leaving.

Back home, he fed the treatment plan to GPT 5.5 Pro first. The AI flagged two things right away:

  • Shockwave therapy isn't recommended for non-calcified rotator cuff tendinopathy — and the ultrasound report explicitly said there was no calcification
  • Traumeel is registered in Germany as a homeopathic medicine "without a therapeutic indication"

Not exactly confidence-inspiring. So Antoine did something most people would never think of: he dumped the 266MB DICOM image files into Claude Code and asked it to do an independent MRI analysis.

Why Claude Code and Not Just Chat?

Here's the thing that most people miss.

You might wonder: why not just use Claude.ai's regular chat interface? Same model, right?

As Antoine put it: "The difference between Claude Code and Claude.ai's chat is enormous, even if those two run the same model."

The difference is hands.

A chat AI can only talk. It doesn't have a terminal, can't install Python packages, can't read and write files, can't run scripts. Give it a single image and it can look at it. But give it 266MB spread across hundreds of extensionless DICOM files? It's stuck.

Claude Code has a full terminal environment. It can:

bash
pip install pydicom numpy matplotlib

Then write Python scripts to read DICOM files, analyze images frame by frame, generate visualizations, and output a PDF report.
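
The "extensionless files" problem is a good example of where scripting helps. Here's a minimal sketch of how a script could figure out which files in a folder are DICOM at all, using only the standard library. It relies on the DICOM file format's 128-byte preamble followed by the four bytes DICM. The function names are mine, and this is not taken from Antoine's session.

```python
from pathlib import Path

DICOM_MAGIC = b"DICM"   # marker defined by the DICOM file format
PREAMBLE_LEN = 128      # bytes of preamble that come before the marker


def looks_like_dicom(path: Path) -> bool:
    """Return True if the file has the DICM marker right after the preamble."""
    with open(path, "rb") as fh:
        header = fh.read(PREAMBLE_LEN + len(DICOM_MAGIC))
    return header[PREAMBLE_LEN:] == DICOM_MAGIC


def find_dicom_files(folder: Path) -> list[Path]:
    """Collect every regular file under `folder` that looks like DICOM."""
    return sorted(p for p in folder.rglob("*") if p.is_file() and looks_like_dicom(p))
```

Some older exports skip the preamble, so a real script would likely fall back to letting pydicom try to parse whatever this check rejects.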

This is the fundamental difference between an AI Agent and AI chat: the Agent has hands. It can do things.

I've written a bunch of articles about Claude Code before, mostly covering coding scenarios. But this case made me realize its value goes way beyond writing code. It can handle any task that boils down to "install tools → read data → process → output results."

The Analysis: What Claude Code Actually Did

Antoine's instruction was dead simple: "right shoulder pain for 2-3 weeks." That's it. Nothing else.

Claude Code took the DICOM files, spent about an hour, and produced a 7.72MB PDF report. The structure was solid — imaging analysis, findings, conclusions.

But here's the kicker: Claude Code's conclusion completely contradicted the human doctor.

Doctor: subscapularis tendon has a greater-than-50% tear. Claude Code: tendon is intact, no tear.

Antoine called this "quite disconcerting." He expected a lower grade, not an outright denial of the tear's existence.

The Arbitration: Letting AI Judge Itself

Antoine didn't stop there. He did something clever: he asked Claude Code to arbitrate between the two reports.

He fed both the human report and Claude Code's report back to Opus (the model running inside Claude Code), this time also including a conversation he'd had with GPT 5.5 Pro about which movements triggered pain.

Claude Code's approach was smart. It spun up multiple subagents, each analyzing independently to avoid context bias. About an hour later, the arbitration report came back.

The verdict: support Claude Code's initial assessment (moderate-to-high confidence). Mild insertional tendinosis, but no discrete partial or full-thickness tear.

The AI didn't just stick to its guns — it re-verified its own conclusion using a more rigorous methodology.

The HN Comments: Where It Gets Really Interesting

The original post was good. The comment section was better.

The Radiologist Weighs In

Top comment, from an actual radiologist. Key points:

  • Can't really judge without seeing the full 3D dataset — fair enough
  • Ultrasound isn't great for assessing calcification — X-ray is better
  • Shockwave therapy without calcification isn't harmful, just not helpful

But here's an interesting nuance: when a radiology report says "no finding present," there's an implicit caveat — "within the scope of this imaging modality." Ultrasound says no calcification, X-ray might say there is one. They're not contradicting each other. But to a regular person, this is incredibly confusing.

The Ultrasound Tech's Take

An ultrasound tech was more blunt:

"I like some uses of AI that help patients advocate for themselves, but it's really bad at glazing people and leading them down medical rabbit holes."

That nails a core problem: AI answers look professional, but they might miss the "obvious" things that domain experts take for granted.

The Patient Family Member

One comment hit hard. Someone's father had complications from a motorcycle accident, ended up needing amputations. Through the whole process, 15 different doctors and PAs couldn't reach consensus. Each one just said "you decide."

"What exactly are we supposed to do as patients when medical personnel cannot give reasonable paths forward? When we're dealing with life-changing circumstances, what else can we do except rely on AI?"

That comment got 30+ upvotes. It reveals something important: the problem isn't that AI is so great, it's that the real-world medical system can be so frustrating.

The "AI Psychosis" Worry

Not everyone was on board. One commenter said:

"Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading. It is only when you do not know what the AI is being asked to do is it likely you will find the output helpful."

Someone else compared trusting AI to religious faith:

"I was an early AI adopter, shared it with many people. The folks who used it came back to share their experiences, and I'm listening like someone in a church congregation sharing their experiences with god. Clear and obvious gaps are hand-waved away."

These discussions are honest. It's not "AI is amazing" or "AI is garbage" — it's all gray areas colliding.

The Technical Architecture: Why Claude Code Can Do This

Let me zoom into the technical side for a moment.

Claude Code runs on an Agent Loop:

  1. Receive task (user types "analyze this MRI")
  2. Plan steps (model breaks the task into subtasks)
  3. Execute tool calls (each step calls the appropriate tool)
  4. Observe results (feed tool output back to the model)
  5. Adjust strategy (decide next step based on results)
  6. Loop until done

Steps 3 and 4 are the key. Chat models don't have them — they generate a response in one shot, never "seeing" intermediate results.
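
To make the loop concrete, here's a toy version in Python. Everything in it is a stand-in: `scripted_model` replaces the LLM with hard-coded decisions, and the two entries in `TOOLS` are fake tools. It's a sketch of the control flow only, not how Claude Code is actually implemented.

```python
from typing import Callable

# Hypothetical tools: each takes a string argument and returns a string observation.
TOOLS: dict[str, Callable[[str], str]] = {
    "list_files": lambda arg: "0001 0002 0003",
    "count_words": lambda arg: str(len(arg.split())),
}


def scripted_model(history: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM: pick the next action from what was observed so far."""
    if not history:
        return ("list_files", "mri_data/")
    if len(history) == 1:
        # The second step depends on the first observation.
        return ("count_words", history[0])
    return ("done", f"found {history[1]} files")


def agent_loop(model, tools, max_steps: int = 10) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        action, arg = model(history)        # plan the next step
        if action == "done":
            return arg                      # final answer
        observation = tools[action](arg)    # execute the tool call
        history.append(observation)         # feed the result back to the model
    raise RuntimeError("step budget exhausted")
```

Calling `agent_loop(scripted_model, TOOLS)` returns "found 3 files". The point is that the second action is chosen only after the first tool's output is known, which a one-shot response can't do.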

For the MRI analysis, Claude Code's actual execution probably looked something like this:

bash
# Step 1: Explore the file structure
ls -la mri_data/ | head -20
file mri_data/0001

# Step 2: Install dependencies
pip install pydicom numpy matplotlib Pillow

# Step 3: Write and run analysis script
python3 analyze_mri.py

# Step 4: Adjust analysis strategy based on output

After each step, Claude Code checks the output and decides what to do next. If the files aren't standard DICOM format, it switches approach. If the script errors out, it fixes the bug and reruns.
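
The "checks the output" part is less magical than it sounds. Here's a rough sketch of what a tool wrapper might hand back after running a command: the exit code, the output, and the tail of any error. The function name and the shape of the returned dict are my own assumptions, not Claude Code's internals.

```python
import subprocess


def run_and_observe(cmd: list[str], timeout: float = 60.0) -> dict:
    """Run a command and package what a model would need to see afterwards."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return {
        "ok": proc.returncode == 0,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        # Keep only the tail of stderr: tracebacks end with the useful line.
        "stderr_tail": proc.stderr.strip().splitlines()[-5:],
    }
```

If `ok` is false, the last line of `stderr_tail` is usually the exception message, which is exactly what you'd want to read before deciding whether to patch the script or try a different approach.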

That's the fundamental difference between Agent and chat. Chat is one-shot. Agent is iterative.

Claude Code also has subagent capability. During the arbitration, Opus launched multiple independent subagents, each with its own context window. This is like having three people independently review the same material and compare conclusions. If all three agree, confidence goes up significantly.
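
Comparing conclusions can be as simple as counting. A small sketch, assuming each subagent's report has already been boiled down to a short verdict string (that reduction step is the hard part and isn't shown here):

```python
from collections import Counter


def summarize_verdicts(verdicts: list[str]) -> tuple[str, float]:
    """Return the most common verdict and the fraction of reviewers that gave it."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(v.strip().lower() for v in verdicts)
    top, n = counts.most_common(1)[0]
    return top, n / len(verdicts)
```

For example, `summarize_verdicts(["No tear", "no tear", "partial tear"])` gives "no tear" with two thirds agreement. Keep in mind the subagents share one underlying model, so high agreement tells you the answer is stable, not necessarily that it's right.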

This architecture is what makes complex analysis possible. Pure chat models can't do it.

More "Non-Coding" Uses: Things I've Seen in the Wild

Beyond MRI analysis, I've come across plenty of cases where people use AI coding tools for non-coding tasks.

Data Processing

Someone used Claude Code to process a pile of messy Excel spreadsheets from different departments, each with its own format. Manual cleanup would have taken days. Claude Code wrote a script that auto-detected each spreadsheet's structure, normalized the format, and merged everything.

python
import glob

import pandas as pd

all_data = []
for f in glob.glob("reports/*.xlsx"):
    # First pass: read with no header so every row is plain data
    raw = pd.read_excel(f, header=None)
    # The header row is the first row containing a known column keyword
    is_header = raw.apply(
        lambda row: row.astype(str).str.contains("date|amount|total", case=False).any(),
        axis=1,
    )
    header_row = is_header.idxmax()
    # Second pass: re-read using the detected row as the header
    df = pd.read_excel(f, header=header_row)
    all_data.append(df)

result = pd.concat(all_data, ignore_index=True)
result.to_csv("merged_output.csv", index=False)

The key is the "auto-detect header row" step — different spreadsheets have headers in different rows, which is painful to specify manually but trivial in code.

Legal Contract Comparison

A lawyer friend told me he uses Claude Code to compare two contracts. Feed two PDFs in, extract clauses, diff them, highlight discrepancies. Used to be junior associate work. Now AI does it in minutes.

Of course, a lawyer still needs to review. AI might miss semantic nuances or produce false positives. But as a first-pass tool, the efficiency gain is real.
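
The diff step itself is plain standard-library work. Here's a minimal sketch, assuming the clauses have already been extracted from each PDF into lists of strings (the extraction is the messy part and isn't shown). The function name is hypothetical.

```python
import difflib


def diff_clauses(old: list[str], new: list[str]) -> list[str]:
    """Return only the added (+) and removed (-) clauses between two versions."""
    changes = []
    for line in difflib.ndiff(old, new):
        # ndiff also emits unchanged lines and "? " hint lines; skip those.
        if line.startswith(("+ ", "- ")):
            changes.append(line)
    return changes
```

A line-level diff like this only catches wording changes. Two clauses that say the same thing in different words will show up as a difference, which is one source of the false positives mentioned above.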

Batch Academic Paper Processing

Researchers use Claude Code to batch-read PDFs, extract abstracts, methodologies, and results, then generate literature review drafts. This is perfect for Agent work because the paper count is high, the information to extract is structured, and the output format needs consistency.

Server Ops

This one's from my own experience. Had to check whether 20 servers had consistent configs. Used to mean SSHing into each one manually. Had Claude Code write a script, ran it once, done. Five minutes instead of half a day.
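
I don't have the original script anymore, but the core idea looks roughly like this: fingerprint each server's config and see which hosts don't match the majority. This sketch assumes the config text has already been fetched into a dict keyed by hostname. Both function names are made up for illustration.

```python
import hashlib
from collections import defaultdict


def group_by_config(configs: dict[str, str]) -> dict[str, list[str]]:
    """Map each distinct config fingerprint to the hosts that share it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for host, text in sorted(configs.items()):
        # Strip trailing whitespace per line so cosmetic differences don't count.
        normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]
        groups[digest].append(host)
    return dict(groups)


def outliers(configs: dict[str, str]) -> list[str]:
    """Hosts whose config differs from the most common one."""
    groups = group_by_config(configs)
    majority = max(groups.values(), key=len)
    return sorted(h for hosts in groups.values() if hosts is not majority for h in hosts)
```

With 20 servers you'd expect one big group and, if something drifted, one or two stragglers. Those are the only hosts you then need to look at by hand.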

The Gotchas: Don't Trust the Agent Too Much

To be fair, let me talk about the downsides.

Hallucinations Haven't Gone Away

Claude Code is still an LLM. It might "see" patterns that don't exist in your data, write buggy logic into its scripts, and then confidently tell you "analysis complete."

I once had it analyze a CSV for trends. It produced a professional-looking chart with an accompanying analysis. Turned out it parsed the date column wrong. The trendline was completely made up.

Lesson: Agent output always needs verification, especially with data.
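
For the date problem specifically, the check I should have run is cheap. Here's a sketch: parse with one explicit format, fail loudly on anything that doesn't match, and confirm the result is in chronological order. The function names are mine, and the default format is just an assumption about the data.

```python
from datetime import datetime


def parse_dates_strict(values: list[str], fmt: str = "%Y-%m-%d") -> list[datetime]:
    """Parse every value with one explicit format; fail loudly on the first mismatch."""
    parsed = []
    for i, raw in enumerate(values):
        try:
            parsed.append(datetime.strptime(raw.strip(), fmt))
        except ValueError as exc:
            raise ValueError(f"row {i}: {raw!r} does not match {fmt!r}") from exc
    return parsed


def is_sorted(dates: list[datetime]) -> bool:
    """Cheap sanity check for a time series that should already be chronological."""
    return all(a <= b for a, b in zip(dates, dates[1:]))
```

The classic trap is day/month order. The rows "01/02/2024" and "03/01/2024" parse fine under both "%m/%d/%Y" and "%d/%m/%Y", but only the first reading keeps them in order. If a log that should be chronological comes out unsorted, the format guess is probably wrong.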

Context Window Limits

Claude Code's context window is large (200K tokens), but it still hits limits with large files. 266MB of MRI data can't all fit in context, so the Agent has to process it in batches. That means it might miss information from certain areas.
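
Batching itself is trivial. What matters is that every item lands in exactly one batch, so nothing is silently skipped. A minimal sketch (the helper name is hypothetical):

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items, covering every item once."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Seven slices in batches of three come out as groups of 3, 3, and 1. The risk isn't in the slicing. It's that a finding visible only when comparing slices from different batches can get lost between them.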

It Doesn't Know What It Doesn't Know

This is the most dangerous LLM trait. It often won't say "I'm not sure." It'll give a confidently wrong answer instead.

In medical contexts, this is especially dangerous. A radiologist seeing something uncertain will say "recommend further imaging." AI might just render a judgment outright.

As that radiologist on HN said: "Medical imaging is one of those things everyone thinks is simple because they don't know what they don't know."

Consistency Problems

Someone on HN tested this: asking Claude the same medical question across different sessions produced contradictory answers. And it's easily led — suggest a direction and it'll drift that way.

This matters less in coding, where you can run the code and see whether it works, but it's dangerous when professional judgment is needed.

AI Coding Tools vs. Dedicated Medical AI

You might ask: if you want to analyze MRIs, why not use a specialized medical AI tool?

Dedicated medical AI (like Google's Med-PaLM, FDA-approved imaging analysis software): trained on specific imaging types, clinically validated, regulated, but limited to specific tasks.

Claude Code and similar general-purpose Agents: not trained on medical imaging, no clinical validation, no regulatory approval, but extremely flexible — can handle any format, any task.

The interesting thing about Antoine's case isn't that Claude Code is more accurate than specialized tools — it probably isn't. It's that a general-purpose AI coding tool can produce a reasonable, reference-worthy second opinion in a highly specialized domain like medical imaging.

For regular people, this means: when you get a medical report you can't understand, beyond Google searches and waiting weeks for an appointment, you now have another option: a low-cost, on-demand, independent "junior analyst."

Sure, this analyst might be wrong. But as that patient family member said: when 15 doctors can't reach consensus, one more perspective is better than none.

What This Means for Developers

A few takeaways for us:

1. AI Agents Are AIs With Hands

Don't get boxed in by the "coding tool" label. Claude Code, Codex CLI, Gemini CLI — their core capability is reading/writing files + executing commands + generating code. That combo can do a lot of things beyond writing software.

2. "Second Opinion" Is the Sweet Spot

Whether it's MRI scans, code review, or data analysis, AI's biggest value isn't replacing professionals — it's providing an independent, unbiased, infinitely patient second opinion.

3. Always Verify

Antoine's approach is worth emulating: he didn't just trust the AI, he ran an arbitration. And even after that, he decided to get another human doctor's take. AI is a tool, not an authority.

4. Coding Ability = General Capability

Learning to code isn't just about building software. Coding is the bridge between AI and the physical world. Claude Code can analyze MRIs because it can write Python to read DICOM files. Hook it up to other tools (APIs, databases, hardware interfaces), and the possibilities multiply.

Wrapping Up

This case made me rethink what "AI coding tool" really means.

We habitually classify Claude Code, Codex CLI, and the like as "programmer tools." But their core abilities — file I/O, code execution, data processing — are general-purpose. Coding is just the most common use case.

A hammer isn't just for nails. It cracks walnuts, breaks ice, and holds down papers. A tool's value depends on the user's imagination.

I'm not encouraging anyone to use AI for medical diagnosis. Professional matters should stay with professionals. But if you're a developer with Claude Code at your fingertips, think about this: beyond writing code, what tedious tasks could you hand off to it?

You might be pleasantly surprised.

I'm planning to try a few more "non-coding" scenarios myself. Will write about them when I do. Got questions? Drop them in the comments.

If you want to read Antoine's original post and the HN discussion (391 comments, lots of gems): https://antoine.fi/mri-analysis-using-claude-code-opus
