Coding Agents: Devin, Claude Code, and Beyond
Eng Manager: "Agents can own small, well-scoped tickets. They can't own ambiguous features or cross-team work."
Tech Lead: "Use agents for spikes and prototypes. Reserve human review for anything that ships to prod."
TL;DR
- Autonomous coding agents (Devin, Claude Code, etc.) can execute multi-step tasks: read tickets, write code, run tests, iterate.
- They're best at: well-defined tasks, greenfield code, and prototyping. They struggle with: ambiguity, legacy systems, and judgment calls.
- Treat them as junior contractors. Give clear specs. Review everything.
As of 2026, "AI software engineers" that can take a ticket and produce a PR are real. They're not replacing senior devs. They're automating a slice of work that used to go to juniors or outsourcing.
What Agents Can Do
- Implement features from specs — Given a clear ticket ("Add validation for email field on signup form"), they can write code, add tests, and sometimes open a PR (see the sketch after this list).
- Fix bugs — Especially when the repro is clear and the codebase is navigable.
- Refactors — "Replace all usages of deprecated API X with Y" — mechanical changes at scale.
- Spikes and prototypes — Throw together a PoC. You refine.
- Boilerplate generation — New service skeleton, CRUD endpoints, basic tests.
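To make the first bullet concrete, here is roughly the shape of change an agent might produce for the email-validation ticket: a small validator plus a test. This is an illustrative sketch, not any tool's actual output; the function name and error messages are invented.

```python
import re

# Illustrative sketch of an agent-authored change for the
# "add validation for email field on signup form" ticket.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_signup_email(email: str) -> str | None:
    """Return an error message for an invalid signup email, or None if valid."""
    if not email or not email.strip():
        return "Email is required."
    if not EMAIL_RE.match(email.strip()):
        return "Enter a valid email address, e.g. name@example.com."
    return None

# The kind of test an agent typically adds alongside the change (pytest style).
def test_validate_signup_email():
    assert validate_signup_email("name@example.com") is None
    assert validate_signup_email("") == "Email is required."
    assert validate_signup_email("not-an-email") is not None
```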
They work by: reading context (ticket, codebase), planning steps, writing code, running commands, and iterating on errors.
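Conceptually, that loop is simple. Here is a minimal sketch in Python, assuming a pytest-based repo and stubbing out the model call; propose_patch and the pytest/git commands are assumptions for illustration, not any vendor's actual interface.

```python
import subprocess

def run_tests(repo_path: str) -> tuple[bool, str]:
    """Run the project's test command and return (passed, combined output)."""
    proc = subprocess.run(["pytest", "-q"], cwd=repo_path,
                          capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def apply_patch(repo_path: str, patch: str) -> None:
    """Apply a unified diff to the working tree."""
    subprocess.run(["git", "apply", "-"], cwd=repo_path,
                   input=patch, text=True, check=True)

def propose_patch(context: str) -> str:
    """Stand-in for the model call that turns ticket + errors into a diff."""
    raise NotImplementedError("LLM call goes here")

def run_agent(ticket: str, repo_path: str, max_iterations: int = 5) -> bool:
    context = ticket  # plus whatever files and docs the agent reads for context
    for _ in range(max_iterations):
        patch = propose_patch(context)         # plan and write code
        apply_patch(repo_path, patch)          # edit the repo
        passed, output = run_tests(repo_path)  # run commands
        if passed:
            return True                        # hand off for human review / a PR
        context += "\n" + output               # iterate on the errors
    return False                               # give up and escalate to a human
```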
What Agents Can't Do (Yet)
- Ambiguous requirements — "Make the onboarding better" → they'll guess. Badly.
- Cross-system coordination — "Update the API and the mobile app and the docs" across repos they can't see.
- Architecture decisions — They'll implement an approach. You decide which approach.
- Legacy spelunking — Old codebases with no tests, weird conventions, tribal knowledge — they get lost.
- Security-critical code — Don't let an agent write auth, crypto, or payment logic without deep review.
- "Figure out what we need" — They execute. They don't discover.
Claude Code, Devin, and the Landscape
Claude Code (Anthropic): Terminal-based agent that works directly in your local repo. Strong reasoning. CLI-first, with editor integrations rather than a hosted cloud environment.
Devin (Cognition): Positioned as "AI software engineer." Can work in a cloud environment, run terminals, browse. Good for end-to-end tasks. Still evolving.
Others: OpenAI and Google ship their own coding agents, and new entrants keep appearing. The space is moving fast.
Pick based on: your stack, your workflow, and what your team has access to. The principles (clear specs, review, know limitations) apply to all.
How to Use Them Effectively
- Write specs that a junior could follow. Clear acceptance criteria. Example inputs/outputs. No "figure it out." (See the example spec after this list.)
- Scope small. One ticket, one PR. Not "build the whole feature."
- Provide context. Link to relevant docs, similar implementations, conventions.
- Review like you're reviewing a contractor. Would you ship this? What's missing?
- Iterate. Agent got it 70% right? Refine the ticket and run again. Or finish by hand.
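For a feel of what an agent-ready spec looks like, here is a hypothetical version of the signup-validation ticket. The file paths and exact criteria are invented for illustration.

```python
# Hypothetical "agent-ready" spec: acceptance criteria, example inputs/outputs,
# and pointers to context. The file paths below are invented for illustration.
SPEC = """
Ticket: Validate the email field on the signup form.

Acceptance criteria:
- Client-side and server-side validation, matching the existing login flow.
- Empty email -> error "Email is required."
- Malformed email (e.g. "not-an-email") -> error "Enter a valid email address."
- Valid email (e.g. "name@example.com") -> no error; the form submits.
- Unit tests cover all three cases.

Context:
- Signup form: src/signup/SignupForm.tsx
- Login-flow validation to mirror: src/login/validation.ts
"""
```

Compare that with "Add validation for the signup form" and it's obvious which one a machine (or a new hire) can execute without guessing.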
When Not to Use an Agent
- Tight deadline and high stakes — you need control.
- Novel problem — you're still exploring; agent will converge too early.
- Tiny change — faster to do it yourself than spec it for an agent.
- Team hasn't adopted yet — don't surprise people with agent-generated PRs. Align first.
You're given "Add validation for the signup form." You implement it. The PR gets rejected — they wanted client + server validation, specific error messages, and to match the login flow. Rework. A spec that states those requirements up front is one an agent (or a junior) can execute on the first pass.
Click "Clear spec → agent executes" to see the difference →
Quick Check
When should you NOT use an autonomous coding agent like Claude Code or Devin?
Do This Next
- Try one agent (Claude Code or Devin if you have access) on a small, well-defined task. Document what worked and what didn't.
- Write a "agent-ready" ticket for a real backlog item. See how clear you have to be for a machine to execute it.