
I've been forcing myself to "pure vibe-code" on a few projects, where I don't read a single line of code (even the diffs in codex/claude code).

Candidly, it's awful. There are countless situations where it would be faster for me to edit the file directly (CSS, I'm looking at you!).

With that said, I've been surprised at how far the coding agents are able to go[0], and a lot less surprised about where I need to step in.

Things that seem to help:

1. Always create a plan/debug markdown file

2. Prompt the agent to ask questions/present multiple solutions

3. Use git more than normal (squash ugly commits on merge)

Planning is key to avoiding half-baked solutions, but having "specs" for debugging is almost more important. The LLM will happily dive down a path of editing as few files as possible to fix the bug/error/etc. Unchecked, this often leads to very messy code.
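
For illustration, a debug spec for me is just a small markdown file, something like this (the bug and the exact structure here are made up):

    # debug-login.md

    ## Symptom
    Clicking "Sign in" does nothing; no network request is fired.

    ## Hypotheses (verify in order, one commit per step)
    1. Submit handler never attached
    2. Validation short-circuits before submit

    ## Constraints
    - Reproduce with a failing test before patching
    - Don't touch unrelated files to "fix" this

The constraints section is what stops the "edit as few files as possible" spiral.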

Prompting the agent to ask questions/present multiple solutions allows me to stay "in control" over how something is built.

I now basically commit every time a plan or debug step is complete. I've tried having the LLM control git, but I feel that it eats into the context a bit too much. Ideally a 3rd party "agent" would handle this.
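
For the squash part, plain git is enough; roughly this flow (branch name and message are just an example):

    git checkout -b agent/login-fix      # one branch per task
    # ...commit after each completed plan/debug step...
    git checkout main
    git merge --squash agent/login-fix   # stages one combined change
    git commit -m "Fix login submit handler"

The per-step commits act as checkpoints while the task is in flight, but main only ever sees one clean commit.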

The last thing I'll mention is that Claude Code (Sonnet 4.5) is still very token-happy, in that it eagerly goes above and beyond even when it isn't necessary. Codex (gpt-5-codex) on the other hand, does exactly what you ask, almost to a fault. In both cases, this is where planning up-front is super useful.

[0] Caveat: the projects are either TypeScript web apps or Rust utilities, so I can't speak to performance in other languages/domains.





Sonnet 4.5 is rebranded Opus 4. That's where it got its token-happiness.

Try asking Opus to generate a simple application and it'll do it. It'll also add thousands of lines of setup scripts and migration systems and Dockerfiles and reports about how it built everything and... Ooof.

Sonnet 4.5 is the same, but at a slightly smaller scale. It still LOVES to generate markdown reports of the features it built. No clue why, but it's on by default; you need to specifically tell it to stop doing that.
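
A couple of lines in CLAUDE.md seem to mostly suppress it; this is roughly what I'd try (no guarantee it always obeys):

    # CLAUDE.md
    - Do not create summary or report markdown files after finishing a task.
    - Only create new files when the task explicitly requires them.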


Also, put heavy lint rules in place, and commit hooks to make sure everything compiles, lints, passes tests, etc. You've got to be super, super defensive. But Claude Code will see all those barriers and respond to them automatically, which saves you the trouble of being vigilant over so many little things. You just need to watch the big picture: make sure tests are there to replicate bugs, new features are tested, etc.
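
As a sketch, for a TypeScript project the hook can be as dumb as this (assuming your package.json defines build and lint scripts):

    #!/bin/sh
    # .git/hooks/pre-commit -- a non-zero exit aborts the commit
    set -e          # stop at the first failing step
    npm run build   # must compile
    npm run lint    # must pass lint rules
    npm test        # must pass tests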

Same as when coding with humans, better tests and linters will give you a shorter and simpler iteration loop.

LLMs love that.


> The last thing I'll mention is that Claude Code (Sonnet 4.5) is still very token-happy, in that it eagerly goes above and beyond even when it isn't necessary. Codex (gpt-5-codex) on the other hand, does exactly what you ask, almost to a fault.

I very much share your experience. For the time being I prefer Codex over Claude, just because I know much sooner when to step in and do it manually.

With Claude I find myself in a typing exercise much more often; I could probably get better at knowing when to stop, of course.


>> Codex (gpt-5-codex) on the other hand, does exactly what you ask, almost to a fault.

I've seriously tried gpt-5-codex at least two dozen times since it came out, and every single time it was either insufficient or made huge mistakes. Even with the "have another agent write the specs and then give it to codex to implement" approach, it's just not very good. It also stops after trying one thing and then says "I've tried X, tests still failing, next I will try Y" and it's just super annoying. Claude is really good at iterating until it solves the issue.


What type of codebase are you working within?

I've spent quite a bit of time with the normal GPT-5 in Codex (med and high reasoning), so my perspective might be skewed!

Oh, one other tip: Codex by default seems to read partial files (~200 lines at a time), so I make sure to add "Always read files in full" to my AGENTS.md file.
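
i.e. something like this near the top (exact wording is probably not critical):

    # AGENTS.md
    - Always read files in full before editing; never rely on partial reads.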


> Candidly, it's awful.

Noting your caveat but I’m doing this with Python and your experience is very different from mine.


Oh, don't get me wrong, the models are marvelous!

The "it's awful" admission is due to the "don't look at code" aspect of this exercise.

For real work, my split is more like 80% LLM/20% non-LLM, and I read all the code. It's much faster!


    Always create a plan/debug markdown file
Very much necessary, especially with Claude, I find. It auto-compacts so often (Sonnet 4.5), and it instantly goes off-the-rails stupid after that. I then make it re-read the markdown file, so we can actually continue without it forgetting about 90% of what we just did/talked about.

    Prompt the agent to ask questions/present multiple solutions
I find that only helps marginally. They all output so much text it's not even funny. And that's with one "solution".

I don't get how people can stand reading all that nonsense they spew, especially Claude. Everything is insta-ready to deploy, problem solved, root cause found, go hit the big red button that might destroy the earth in a mushroom cloud. I learned real fast to only skim what it says and ignore all that crap. (As in, I never tried to "change its personality" for real. I did try telling it to always use the scientific method and prove its assumptions, but just like a junior dev it never does; it just tells me stupid things it believes to be true and I have to question it.)

Again, just like a junior dev, but it's my junior dev that's always on and available when I have time, and it does things while I do other stuff. And instead of me having to ask the junior after an hour or two what rabbit hole they went down and pull them out of it, Claude and Codex usually visually ping the terminal before I even have time to notice. That's for when I don't have full-time focus on what I'm trying to do with the agents, which is why I do like using them.

The times when I am fully attentive, they're just soooo slow. And many, many times I could do what they're doing faster or just as fast, but without spending extra money and "environment". I've been trying to "only use AI agents for coding" for a month or two now, to see the positives and limitations and form my own opinion(s).

    Prompting the agent to ask questions/present multiple solutions allows me to stay "in control" over the how something is built.
I find Claude's "Plan mode" is actually ideal. I just enable it and I don't have to tell it anything, while Codex "breaks out" from time to time and just starts coding even when I only asked it a question. If these machines ever take over, there's probably some record of me swearing at them, and I'll have a hitman put on me. Unlike with junior devs, I have no qualms about telling a model that it again ignored everything I told it.

    Ideally a 3rd party "agent" would handle this.
With sub-agents you can. Simple git interactions are perfect for sub-agents because not much can get lost in translation in the interface between the main agent and the sub-agent. Then again, I'm not sure how you lose that much context. I'd rather use a sub-agent for things like running the tests and linter on the whole project in the final steps, which spew a lot of unnecessary output.
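
In Claude Code that's a markdown file under .claude/agents/; mine looks roughly like this (from memory, so double-check the frontmatter fields against the docs):

    ---
    name: test-runner
    description: Runs the full test suite and linter, reports only failures
    tools: Bash, Read
    ---
    Run the project's tests and linter. Reply with only the failing
    tests and lint errors (file and line), never the full output.

The main agent then only ever sees the short failure summary instead of the whole log.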

Personally, I had a rather bad set of experiences with it controlling git without oversight, so I do that myself; it's less taxing than approving everything it wants to do (I automatically allow Claude certain read-only commands for investigating and reviewing things).
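
The allow-list lives in .claude/settings.json; mine is roughly this (syntax from memory, check the permissions docs):

    {
      "permissions": {
        "allow": [
          "Bash(git status)",
          "Bash(git log:*)",
          "Bash(git diff:*)"
        ]
      }
    }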



