A Month with DeepSeek: What Happened When I Replaced Claude Opus for Real Work
For the last month or so, I’ve been running a small experiment. I moved my main coding workflow off Claude Opus and onto DeepSeek’s v4 Pro, the latest release at the time, while it was discounted, and I used it for real work.
The test was simple. Build an MVP, index a marketplace, run long agentic sessions, and see what breaks first: the model, my patience, or my wallet.
Spoiler: none of them broke. And the bill came in under $50.
Why I Tried DeepSeek in the First Place
I’d love to tell you this was a deliberate, well-reasoned migration, but it wasn’t.
When DeepSeek announced their latest model, the timing was useful: I was already frustrated with Claude. Opus deployments were unstable that week, quotas were burning down faster than usual, sessions were getting cut, and I kept hitting the 5-hour window limits on the $200 Max plan in the middle of long agentic runs. If you’ve been on the Max plan during a busy week, you know the feeling. You’re mid-flow, the agent is exploring a codebase, and suddenly you’re locked out for the next 90 minutes.
So I configured my claude CLI to point at DeepSeek’s API instead. Bought $10 in credits as a sanity check. Set the discounted endpoint, picked the right model name, and went back to work.
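For readers who want to try the same redirect, here is a minimal sketch of what it can look like. It assumes the CLI reads the `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL` environment variables and that DeepSeek exposes an Anthropic-compatible endpoint at the URL below; both are worth checking against the providers' current documentation. The helper names are hypothetical, and the model name is left as a parameter because the exact identifier is not stated here.

```python
import os
import subprocess
from typing import Dict, Optional

# Assumption: DeepSeek's Anthropic-compatible base URL. Verify before use.
DEEPSEEK_ANTHROPIC_BASE_URL = "https://api.deepseek.com/anthropic"


def build_cli_env(api_key: str, model_name: str,
                  base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the CLI redirected to DeepSeek."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = DEEPSEEK_ANTHROPIC_BASE_URL
    env["ANTHROPIC_AUTH_TOKEN"] = api_key   # the DeepSeek API key
    env["ANTHROPIC_MODEL"] = model_name     # placeholder, taken from the provider's model list
    return env


def launch_cli(api_key: str, model_name: str) -> int:
    """Start an interactive `claude` session with the redirected environment."""
    completed = subprocess.run(["claude"], env=build_cli_env(api_key, model_name))
    return completed.returncode
```

Building the environment in a separate function keeps the redirect scoped to one process, so a regular session in another terminal is unaffected.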
Honestly, I switched because Claude was failing me at that moment, not because I had carefully read the DeepSeek release notes. The discount made the experiment cheap enough that there was nothing to debate.
The Real Test: Building dronicons.com
The workload mattered, so let me describe it.
I’ve been building dronicons.com, an MVP that aggregates drone components from various retailers into a single searchable marketplace. Think of it as a price-comparison and parts-discovery tool for people building or repairing drones. It’s a real product with real users coming in, not a weekend toy.
The work splits into two very different shapes:
Application code: Nest.js frontend, API layer, search, filtering, deduplication logic, schema work, plus the usual bug fixing and refactoring you accumulate in an MVP that’s evolving weekly.
Indexing pipelines: scrapers and parsers that pull product listings from multiple sources, normalize them into a unified schema, classify components (motors, ESCs, frames, flight controllers), and enrich them with metadata. This is messy, edge-case-heavy work where every retailer formats its data differently.
That second workload is where LLMs really earn their keep. Every retailer is its own snowflake. Specs are inconsistent. Categories overlap. The model needs to make judgment calls thousands of times per import run.
So when I say I exhausted the DeepSeek API, I mean it. I was hitting it from two angles: long agentic coding sessions during development, and high-volume classification calls during indexing. That’s a much harder test than just “vibe coding a CRUD app for an afternoon.”
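To make the classification side concrete, here is a rough sketch of the kind of call that runs thousands of times per import. It is not the actual dronicons.com code: the keyword rules, the category labels, and the `ask_model` callable are all hypothetical. The idea is that cheap rules settle the obvious titles and the model is only asked when the rules are silent or disagree.

```python
import re
from typing import Callable, Optional

# Category labels based on the component types named above (illustrative spelling).
CATEGORIES = ("motor", "esc", "frame", "flight_controller")

# Hypothetical keyword hints; a real catalogue would need far more of them.
KEYWORD_RULES = {
    "motor": re.compile(r"\b(motor|\d{3,4}\s?kv)\b", re.I),
    "esc": re.compile(r"\b(esc|speed controller)\b", re.I),
    "frame": re.compile(r"\bframe\b", re.I),
    "flight_controller": re.compile(r"\b(flight controller|fc)\b", re.I),
}


def classify(title: str, ask_model: Callable[[str], str]) -> Optional[str]:
    """Return a category for a listing title, or None if nothing fits."""
    matches = [name for name, rule in KEYWORD_RULES.items() if rule.search(title)]
    if len(matches) == 1:
        return matches[0]  # unambiguous: no model call needed
    # Zero or several rule hits: defer to the model and validate its answer.
    answer = ask_model(title).strip().lower()
    return answer if answer in CATEGORIES else None
```

Validating the model's answer against a fixed label set matters at this volume, because a single free-text reply can otherwise leak into the catalogue as a new category.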
Where DeepSeek Felt Stronger: Long-Running Work
For the first few hours, I didn’t feel a difference. Output quality looked roughly comparable to Opus. Code style was fine. Reasoning seemed sound.
But after a couple of days, a pattern emerged.
DeepSeek lasted longer on a single thought. With Opus, my typical loop looked like this: send a prompt, watch the agent work for 10 to 15 minutes, get a partial result, then spend the next round reprompting, clarifying, or correcting course. It’s iterative by necessity. The model would tap out before the task was actually done, or it would commit to a direction early and need me to nudge it.
With DeepSeek, I’d send a similar prompt and the session would run for 30 to 40 minutes before coming back with something I could actually evaluate. Same kind of task, same codebase, but the model kept pulling on threads instead of declaring victory early.
That changed how I worked. I stopped sitting next to the agent. I’d write a detailed prompt, often using dictation because I could think out loud, then walk away and make coffee. When I came back, I just read what it had done.
This matters because the cost of context switching for the human is the hidden tax of agentic coding. Every reprompt is 30 seconds of “where was I again?” Every clarification is a small drain on focus. Cutting the reprompt frequency in half had a bigger effect on my output than any specific code quality difference.
I’ll caveat this honestly: Opus is no slouch, and I haven’t spent enough time with the very latest Opus 4.7 release to compare apples to apples. My comparison is against the version I’d been running for months. But for my workflow, on my codebase, DeepSeek covered more ground per prompt.
Sub-Agents Went Deeper Than Expected
The bigger surprise was in the sub-agents.
Most modern agentic setups spawn sub-agents for parallel exploration: one investigates the database layer, another reads the frontend, another reviews tests. Each sub-agent has its own context budget, and when it terminates, it returns a summary to the orchestrator.
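The fan-out pattern described above can be sketched with plain `asyncio`. This is a toy model, not how any particular CLI implements sub-agents: `run_sub_agent`, the per-file token costs, and the budget numbers are all made up for illustration, and the "reading" is simulated rather than a real model call.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SubAgentReport:
    area: str
    tokens_used: int
    summary: str


async def run_sub_agent(area: str, files: List[Tuple[str, int]],
                        token_budget: int) -> SubAgentReport:
    """Stand-in sub-agent: 'reads' files in order until its budget would be exceeded."""
    tokens_used = 0
    files_read = 0
    for _name, cost in files:
        if tokens_used + cost > token_budget:
            break  # out of budget: stop exploring and report back
        await asyncio.sleep(0)  # yield control, as a real model call would
        tokens_used += cost
        files_read += 1
    return SubAgentReport(area, tokens_used, f"{area}: read {files_read} file(s)")


async def orchestrate(plan: Dict[str, List[Tuple[str, int]]],
                      token_budget: int) -> List[SubAgentReport]:
    """Run one sub-agent per area concurrently and collect their summaries."""
    tasks = [run_sub_agent(area, files, token_budget) for area, files in plan.items()]
    return await asyncio.gather(*tasks)
```

Only the short summary goes back to the orchestrator, which is why the depth of each sub-agent's exploration matters so much: whatever it never read cannot show up in the plan.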
With Opus, my sub-agents typically returned after burning 20-30k tokens. That’s a reasonable exploration: a handful of files, some focused reasoning, a summary. Useful, but you can feel the ceiling.
With DeepSeek, the same sub-agents were running through 80-90k tokens before terminating. That’s three to four times the depth. They were reading more files, cross-referencing more thoroughly, and surfacing edge cases I wouldn’t have thought to ask about.
For planning work, this is the difference between a good plan and a complete one. When you ask an agent to design an indexing pipeline that handles five retailers, a shallow exploration gives you the happy path. A deep exploration gives you the happy path plus the seventeen ways each retailer breaks your assumptions: missing fields, inconsistent units, broken pagination, retailer-specific anti-scraping behavior, mixed currencies, duplicate SKUs across stores.
I didn’t have to prompt for edge cases. The agent found them itself because it had the budget to actually look.
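A few of those edge cases are easy to show in code. The sketch below is illustrative only: the raw field names (`sku`, `weight`, `weight_unit`, `price`, `currency`) and the `Listing` shape are assumptions, not the real schema. It covers missing fields, inconsistent weight units, and duplicate SKUs across stores, and it deliberately records the currency instead of converting it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

GRAMS_PER_UNIT = {"g": 1.0, "kg": 1000.0, "oz": 28.3495}


@dataclass(frozen=True)
class Listing:
    retailer: str
    sku: str
    weight_g: Optional[float]
    price: Optional[float]
    currency: Optional[str]


def to_grams(value, unit) -> Optional[float]:
    """Convert a weight to grams; return None when the value or unit is unusable."""
    if value is None or unit is None:
        return None
    factor = GRAMS_PER_UNIT.get(str(unit).strip().lower())
    return None if factor is None else round(float(value) * factor, 2)


def normalize(raw: dict, retailer: str) -> Optional[Listing]:
    """Map one raw retailer record onto the unified shape, tolerating gaps."""
    sku = str(raw.get("sku") or "").strip().upper()
    if not sku:
        return None  # nothing to deduplicate on, so skip the record
    price = raw.get("price")
    return Listing(
        retailer=retailer,
        sku=sku,
        weight_g=to_grams(raw.get("weight"), raw.get("weight_unit")),
        price=float(price) if price is not None else None,
        currency=raw.get("currency") or None,  # kept as-is, no conversion
    )


def group_by_sku(listings: Iterable[Listing]) -> Dict[str, List[Listing]]:
    """Group listings sharing a SKU so offers from several stores sit together."""
    grouped: Dict[str, List[Listing]] = {}
    for item in listings:
        grouped.setdefault(item.sku, []).append(item)
    return grouped
```

Matching on the raw SKU is itself an assumption. Retailers often invent their own identifiers for the same part, which is exactly the kind of case a deeper exploration tends to surface.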
That said, deeper exploration also means longer waits and more tokens consumed. It’s not free. But on this kind of work, the trade was clearly worth it.
The Cost Reality: Heavy Usage Under $50
Now the part everyone wants to know.
Across more than a month of heavy daily usage, the numbers tell the story. On the Max plan, Opus cost me $200 per month as a subscription. DeepSeek API usage came in at $40 to $50 total in credits over the same period. Typical run durations went from 10-15 minutes with Opus to 30-40 minutes with DeepSeek. Sub-agent token depth jumped from 20-30k to 80-90k. And where Opus quota exhaustion was frequent, I never hit a wall on DeepSeek.
My pattern was simple: top up $10, use it for roughly a week, top up again when it ran low. Over the full month I never crossed $50 in spend. That includes coding sessions, agentic runs, and the indexing workload for the marketplace, which on its own would have been the bulk of any LLM bill on most providers.
DeepSeek’s off-peak discount helps. Even so, if the price doubled tomorrow when the discount ended, this would still be cheaper than Opus for my workload. And I’d still get the part that money can’t always buy on subscription plans: no session limits, no hourly quotas, no “you’ve used 80% of your window” warnings during the deep-work hours.
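The "what if the price doubled" claim is simple arithmetic, worked through here with the figures from this section. One assumption is baked in: token volume stays the same after the price change.

```python
OPUS_SUBSCRIPTION_USD = 200      # Max plan, per month
DEEPSEEK_SPEND_USD = (40, 50)    # low and high end of the month's API spend


def scale(spend, factor):
    """Multiply both ends of a (low, high) spend range by a price factor."""
    return tuple(amount * factor for amount in spend)


def monthly_savings(spend, subscription):
    """Return (smallest, largest) saving versus the subscription price."""
    low, high = spend
    return (subscription - high, subscription - low)


doubled = scale(DEEPSEEK_SPEND_USD, 2)
print(doubled)                                                     # (80, 100)
print(monthly_savings(DEEPSEEK_SPEND_USD, OPUS_SUBSCRIPTION_USD))  # (150, 160)
print(monthly_savings(doubled, OPUS_SUBSCRIPTION_USD))             # (100, 120)
```

Even at double the price, the month would have cost $80 to $100, which still leaves $100 to $120 of headroom against the subscription.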
There’s a broader market signal here too. As noted in “3 market trends that could shape the rest of 2026,” open-weight and Chinese model providers are taking real share against the incumbents, and pricing pressure is part of why. My one-engineer experiment is a data point inside that pattern, not an exception to it.
The Tradeoffs: Still Not One-Shot Magic
Let me be clear about what DeepSeek is not.
It is not a one-shot oracle. Neither is Opus. In a month of heavy use, I never got a single prompt to produce final, shippable code without iteration. Always there were bugs, always there were misreadings of the codebase, always there was a follow-up round.
DeepSeek also feels more conservative in some places. Frontend polish, microcopy, and tasteful UI choices still feel like an Opus strength to me, though that may be a “what I’m used to” effect rather than a real gap.
And the longer runs cut both ways. When the model is right, you save reprompts. When it’s wrong, you’ve now wasted 40 minutes instead of 15 going down the wrong path. Long-running agents amplify both signal and noise, so a clear, well-scoped prompt matters more, not less.
Why I’m Not Rushing Back to Opus
I know Opus has shipped meaningful changes recently and I’ll experiment with them eventually.
But I’m not paying $200 again for a subscription that punishes the way I actually work: long sessions, deep exploration, mixed coding and data workloads, weekends where I want to think out loud and let the model run.
DeepSeek isn’t beating Opus on every axis. It’s beating it on the axis I happen to care about right now, at a fraction of the price, with no quota anxiety. That’s a serious competitor. And the next version will only narrow the remaining gaps.
If you’ve been hitting the same Opus walls I was, spend $10 and run your own experiment. That’s all this is.