For the longest time, I proudly claimed to be a "purist" programmer and looked down on those who use AI coding tools extensively on a day-to-day basis. Not because I wanted to be smug and pedantic (though sometimes that is the case), but mostly because I have seen many of my colleagues jump on the AI-assisted coding hype train and gradually lose their ability to come up with solutions to complex problems, or to debug incidents, without it. I would still use AI every now and then: for coding assistance when I had to write a README file or some regex, for proofreading because English is not my first language, or even for illustrations.
In any case, there I was, minding my own business and writing my code with my own hands (like a dinosaur, I know) when, for the strangest reason, I was asked to familiarise myself with Claude Code. Me, the complete opposite of the poster child for the AI broligarchs? Sure, I had no say in it anyway.
So I got myself some API keys and installed Claude Code. I was ready to jump into the future of tech and leave programming behind, because programming is largely solved, as long as Claude is not down. In this exercise, I set out to be a mere spectator with just a product idea, and see how far that would get me in a weekend.
The Project: A Full-Stack App in Days
The task was straightforward: build a full-stack application using FastAPI (backend) and React Native (Expo frontend). The goal was to demonstrate Claude Code’s ability to generate functional code quickly. I spent a couple of days working with the tool, breaking down the project into manageable parts.
- Backend (FastAPI): I asked Claude to generate a basic API with CRUD operations. It produced code that worked, but with minimal validation and scalability considerations.
- Frontend (React Native): The tool created three screens for the app, which looked decent on the surface. However, the behaviour was fragile—navigation wasn’t smooth, and interactions often broke unexpectedly. Interestingly, even Opus struggles to program a roulette animation.
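To make the "minimal validation" complaint concrete: framework details aside, here is a hedged, stdlib-only sketch of the kind of input checks the generated CRUD endpoints tended to skip. The `TaskStore` and its fields are hypothetical illustrations, not the actual project code.

```python
from dataclasses import dataclass, field

@dataclass
class TaskStore:
    """Hypothetical in-memory store standing in for the CRUD layer."""
    _items: dict = field(default_factory=dict)
    _next_id: int = 1

    def create(self, title: str) -> dict:
        # The generated endpoints accepted nearly any payload;
        # basic checks like these were usually missing.
        if not isinstance(title, str) or not title.strip():
            raise ValueError("title must be a non-empty string")
        if len(title) > 200:
            raise ValueError("title too long (max 200 chars)")
        item = {"id": self._next_id, "title": title.strip()}
        self._items[self._next_id] = item
        self._next_id += 1
        return item

    def get(self, item_id: int) -> dict:
        # Explicit not-found handling, instead of an unhandled KeyError
        # bubbling up as a generic 500.
        if item_id not in self._items:
            raise KeyError(f"no task with id {item_id}")
        return self._items[item_id]
```

In a real FastAPI app you would express the same constraints declaratively with Pydantic models; the point is that the generated code rarely expressed them at all.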
By the end of the project, I had a half-baked application that sort of worked, but the underlying architecture was riddled with issues. The result? A prototype that felt like a "just get it to compile" solution rather than a polished product, let alone one that followed the specs.
One of the most frustrating aspects was the cost. I spent $50 in tokens on this project, which felt like a significant investment for a tool that delivered inconsistent results. While I’m not against paying for tools that save time (if your company is giving you free tokens, might as well use them), the ROI here was questionable.
- Prompt Engineering: I tried refining prompts, breaking tasks into smaller steps, and saving context in the file system to improve outcomes. But even with these efforts, the results were hit-or-miss. For instance, I used Context7 MCP to use relevant documentation for libraries and I still faced version compatibility issues.
- Bypassing Permissions: To push the tool further, I had to bypass certain restrictions. This opened the door to more complex tasks but also introduced risks—like generating code that could have security vulnerabilities.
- Agent Orchestration: in the beginning, using subagents to handle frontend, backend, architectural and testing tasks seemed to work, but as complexity grew, within the same day I had to switch to using one subagent per repo. I tried the collaborative orchestration approach, where agents can talk to one another while working on a task. Its premise was very appealing, but I soon started to hit a wall as Claude opened several tmux instances and ran into concurrency issues.
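Those concurrency issues are not Claude-specific: several agents mutating the same workspace is the classic lost-update problem. A toy sketch of the failure mode and the usual fix (this illustrates the general problem, not Claude Code's actual orchestration internals):

```python
import threading

# Shared "repo state" that several hypothetical agents mutate concurrently.
state = {"commits": 0}
state_lock = threading.Lock()

def agent(n_edits: int) -> None:
    # Each edit is a read-modify-write. Without the lock, two agents can
    # read the same value, both increment it, and one update is lost.
    for _ in range(n_edits):
        with state_lock:
            state["commits"] += 1

threads = [threading.Thread(target=agent, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state["commits"])  # 4000 with the lock; possibly fewer without it
```

Multiple tmux panes each driving an agent against the same checkout give you exactly this race, only with files and git state instead of a counter.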
Although I do admit that Claude Code is a beautiful piece of AI Engineering work itself, the unpredictability of its output made me feel like I was playing a slot machine. Too much of a gamble, and I always found myself tricked into sending one more prompt, one more try.
The tool excels at generating code for simple tasks but struggles with complex logic. For example, implementing a feature that requires multiple interdependent components often led to errors or incomplete code.
Even when the code worked initially, it often broke under testing (and I am not referring to the vast amount of redundant unit tests the model created). The lack of robustness made debugging a nightmare. Most importantly, since I had no direct control over the code itself beyond my prompts, I had very little insight into where the model had made the wrong assumptions and therefore written brittle code.
The scaling challenges were also evident: as the project grew in complexity, the tool couldn't handle nuanced requirements or maintain consistency across modules.
All in all, my experience with Claude Code was a reminder that AI tools are still in their infancy. While they can be powerful, they're not perfect, and sometimes they feel like a gamble. For now, I'll continue using them sparingly, treating them as a tool rather than a solution. I use AI myself as a rubber duck or a second pair of eyes: something to discuss different approaches with when, say, writing a piece of code or architecting a solution. And I'll keep staying away from embedding AI into my IDE or terminal.
If you’re considering AI for your next project, approach it with caution and be ready to do the heavy lifting yourself.
If you’re a developer, ask yourself: Is this AI tool solving a problem, or just adding another layer of complexity? In my case, the answer was clear.