Coding Agents: Devin, Claude Code, and Beyond
Eng Manager: "Agents can own small, well-scoped tickets. They can't own ambiguous features or cross-team work."
Tech Lead: "Use agents for spikes and prototypes. Reserve human review for anything that ships to prod."
TL;DR
- Autonomous coding agents (Devin, Claude Code, etc.) can execute multi-step tasks: read tickets, write code, run tests, iterate.
- They're best at: well-defined tasks, greenfield code, and prototyping. They struggle with: ambiguity, legacy systems, and judgment calls.
- Treat them as junior contractors. Give clear specs. Review everything.
As of 2026, "AI software engineers" that can take a ticket and produce a PR are real. They're not replacing senior devs. They're automating a slice of work that used to go to juniors or outsourcing.
What Agents Can Do
- Implement features from specs — Given a clear ticket ("Add validation for email field on signup form"), they can write code, add tests, and sometimes open a PR (see the sketch after this list).
- Fix bugs — Especially when the repro is clear and the codebase is navigable.
- Refactors — "Replace all usages of deprecated API X with Y" — mechanical changes at scale.
- Spikes and prototypes — Throw together a PoC. You refine.
- Boilerplate generation — New service skeleton, CRUD endpoints, basic tests.
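To make the first bullet concrete, here is roughly the shape of change an agent might produce for the email-validation ticket: a small validator plus a test. This is an illustrative sketch, not any tool's actual output; the function name and error messages are invented.

```python
import re

# Illustrative sketch of an agent-authored change for the
# "add validation for email field on signup form" ticket.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_signup_email(email: str) -> str | None:
    """Return an error message for an invalid signup email, or None if valid."""
    if not email or not email.strip():
        return "Email is required."
    if not EMAIL_RE.match(email.strip()):
        return "Enter a valid email address, e.g. name@example.com."
    return None

# The kind of test an agent typically adds alongside the change (pytest style).
def test_validate_signup_email():
    assert validate_signup_email("name@example.com") is None
    assert validate_signup_email("") == "Email is required."
    assert validate_signup_email("not-an-email") is not None
```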
They work by: reading context (ticket, codebase), planning steps, writing code, running commands, and iterating on errors.
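Conceptually, that loop is simple. Here is a minimal sketch in Python, assuming a pytest-based repo and stubbing out the model call; propose_patch and the pytest/git commands are assumptions for illustration, not any vendor's actual interface.

```python
import subprocess

def run_tests(repo_path: str) -> tuple[bool, str]:
    """Run the project's test command and return (passed, combined output)."""
    proc = subprocess.run(["pytest", "-q"], cwd=repo_path,
                          capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def apply_patch(repo_path: str, patch: str) -> None:
    """Apply a unified diff to the working tree."""
    subprocess.run(["git", "apply", "-"], cwd=repo_path,
                   input=patch, text=True, check=True)

def propose_patch(context: str) -> str:
    """Stand-in for the model call that turns ticket + errors into a diff."""
    raise NotImplementedError("LLM call goes here")

def run_agent(ticket: str, repo_path: str, max_iterations: int = 5) -> bool:
    context = ticket  # plus whatever files and docs the agent reads for context
    for _ in range(max_iterations):
        patch = propose_patch(context)         # plan and write code
        apply_patch(repo_path, patch)          # edit the repo
        passed, output = run_tests(repo_path)  # run commands
        if passed:
            return True                        # hand off for human review / a PR
        context += "\n" + output               # iterate on the errors
    return False                               # give up and escalate to a human
```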
What Agents Can't Do (Yet)
- Ambiguous requirements — "Make the onboarding better" → they'll guess. Badly.
- Cross-system coordination — "Update the API and the mobile app and the docs" across repos they can't see.
- Architecture decisions — They'll implement an approach. You decide which approach.
- Legacy spelunking — Old codebases with no tests, weird conventions, tribal knowledge — they get lost.
- Security-critical code — Don't let an agent write auth, crypto, or payment logic without deep review.
- "Figure out what we need" — They execute. They don't discover.
Claude Code, Devin, and the Landscape
Claude Code (Anthropic): Terminal-based agent that works directly in your local repo. Strong reasoning. CLI-first, with editor integrations rather than a hosted cloud environment.
Devin (Cognition): Positioned as "AI software engineer." Can work in a cloud environment, run terminals, browse. Good for end-to-end tasks. Still evolving.
Others: OpenAI and Google ship their own coding agents, and new entrants keep appearing. The space is moving fast.
Pick based on: your stack, your workflow, and what your team has access to. The principles (clear specs, review, know limitations) apply to all.
How to Use Them Effectively
- Write specs that a junior could follow. Clear acceptance criteria. Example inputs/outputs. No "figure it out." (See the example spec after this list.)
- Scope small. One ticket, one PR. Not "build the whole feature."
- Provide context. Link to relevant docs, similar implementations, conventions.
- Review like you're reviewing a contractor. Would you ship this? What's missing?
- Iterate. Agent got it 70% right? Refine the ticket and run again. Or finish by hand.
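For a feel of what an agent-ready spec looks like, here is a hypothetical version of the signup-validation ticket. The file paths and exact criteria are invented for illustration.

```python
# Hypothetical "agent-ready" spec: acceptance criteria, example inputs/outputs,
# and pointers to context. The file paths below are invented for illustration.
SPEC = """
Ticket: Validate the email field on the signup form.

Acceptance criteria:
- Client-side and server-side validation, matching the existing login flow.
- Empty email -> error "Email is required."
- Malformed email (e.g. "not-an-email") -> error "Enter a valid email address."
- Valid email (e.g. "name@example.com") -> no error; the form submits.
- Unit tests cover all three cases.

Context:
- Signup form: src/signup/SignupForm.tsx
- Login-flow validation to mirror: src/login/validation.ts
"""
```

Compare that with "Add validation for the signup form" and it's obvious which one a machine (or a new hire) can execute without guessing.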
When Not to Use an Agent
- Tight deadline and high stakes — you need control.
- Novel problem — you're still exploring; agent will converge too early.
- Tiny change — faster to do it yourself than spec it for an agent.
- Team hasn't adopted yet — don't surprise people with agent-generated PRs. Align first.
You're given "Add validation for the signup form." You implement it. The PR gets rejected — they wanted client + server validation, specific error messages, and to match the login flow. Rework. A spec that states those requirements up front is one an agent (or a junior) can execute on the first pass.
Click "Clear spec → agent executes" to see the difference →
Quick Check
When should you NOT use an autonomous coding agent like Claude Code or Devin?
Do This Next
- Try one agent (Claude Code or Devin if you have access) on a small, well-defined task. Document what worked and what didn't.
- Write a "agent-ready" ticket for a real backlog item. See how clear you have to be for a machine to execute it.