OptionOfT 8 minutes ago

AI can produce code that looks like the patterns it has seen in its training data.

It can recognize patterns in the codebase it is looking at and extrapolate from that.

Which is why generated code is filled with comments most often seen in either tutorial-level code or JavaScript (explaining the types of values).

Beyond that, performance drops rapidly and hallucinations rise correspondingly.

simonw a day ago

This is a good headline. LLMs are remarkably good at writing code. Writing code isn't the same thing as delivering working software.

A human expert needs to identify the need for software, decide what the software should do, figure out what's feasible to deliver, build the first version (AI can help a bunch here), evaluate what they've built, show it to users, talk to them about whether it's fit for purpose, iterate based on their feedback, deploy and communicate the value of the software, and manage its existence and continued evolution in the future.

Some of that stuff can be handled by non-developer humans working with LLMs, but a human expert who understands code will be able to do this stuff a whole lot more effectively.

I guess the big question is if experienced product management types can pick up enough coding technical literacy to work like this without programmers, or if programmers can pick up enough PM skills to work without PMs.

My money is on both roles continuing to exist and benefit from each other, in a partnership that produces results a lot faster because the previously slow "writing the code" part is a lot faster than it used to be.

  • prmph a day ago

    > LLMs are remarkably good at writing code.

    Just this past weekend, I've designed and written code (in Typescript) that I don't think LLMs can even come close to writing in years. I have a subscription to a frontier LLM, but lately I find myself using it maybe 25% of the time.

    At a certain level the software architecture problems I'm solving, drawing upon decades of understanding about maintainable, performant, and verifiable design of data structures and types and algorithms, are things LLMs cannot even begin to grasp.

    At that point, I find that attempting to use an LLM to even draft an initial solution is a waste of time. At best I can use it for initial brainstorming.

    The people saying LLMs can code are hard for me to understand. They are good for simple bash scripts and complex refactoring and drafting basic code idioms and that's about it.

    And even for these tasks the amount of hand-holding I need to do is substantial. At least Gemini Pro/CLI seems good at one-shot performance, before its context gets poisoned.

    • Aperocky 18 hours ago

      I found that mastering LLMs is no less complex than learning a new language, probably somewhere between Python and C++ in terms of mastery.

      The learning curve is very different - with other languages, the learning curve is often upfront; with LLMs, it seems linear or even back-loaded, maybe because I've not gotten to the other side yet.

      I've been able to make LLMs do more and more. Some of it is undoubtedly due to improvements in the models, but most of it is probably the paradigm and changes in my approach. At the beginning I ran into all of the same complaints; I've since found workarounds for many of them.

    • jcelerier 20 hours ago

      > The people saying LLMs can code are hard for me to understand. They are good for simple bash scripts and complex refactoring and drafting basic code idioms and that's about it

      that's like, 90% of the code people are writing

      • FromTheFirstIn 19 hours ago

        But not 90% of the work people do. It’s solved a task, not a problem.

        • lan321 13 hours ago

          It's what takes time though. When you need to make a wrapper for some API, for example, LLMs are incredible. You give it a template, the payload format and the possible methods, and it just spits out a 500-1000 line class in 15 seconds. Do it for 20 classes and that's a week of work 'done' in 30 mins. Realistically 2 days, since you still have to fix and test a lot, but still...

          • skydhash 12 hours ago

            Or write a lisp macro in one hour and be done. Or install an OpenAPI generator and be done in 10 minutes, 9 of which are spent configuring the generator.

            • theshrike79 25 minutes ago

              Can a Lisp macro automatically search for, and find, the API documentation and apply it to the output?

              I've implemented connections to (public) APIs of different services multiple times using LLMs without even looking up the APIs myself.

              I just say "Enrich the data about this game from Steam's API" and that's about it.

            • lan321 12 hours ago

              If you can get the specific documentation for it. Sadly, many companies don't want you using the API, so they just give you a generic payload and the methods and leave you to it. LLMs are good in the sense that they can tell what type StartDate and EndDate are (string MSDate), and maybe it also somehow catches on that ActualDuration is an int. It also manages to guess correctly a lot of the fields in that payload that aren't necessary for the particular call or get overridden anyway.

    • airstrike a day ago

      I find LLMs most helpful when I already have half of the answer written and need them to fill in the blanks.

      "Take X and Y I've written before, some documentation for Z, an example W from that repo, now smash them together and build the thing I need"

      • martin1027 17 hours ago

        This is so true. I've had the same experience.

    • latentsea 19 hours ago

      I think C# is really going to shine in the LLM coding era. You can write Roslyn Analyzers to fail the build on arbitrary conditions after inspecting the AST. LLMs are great at helping you write these too. If you get a solid architecture well defined you can then use these as guardrails to constrain development to only happen in the manner you intend. You can then get LLMs to implement features and guarantee the code comes out in the shape you expect it to.

      This works well for humans too, but custom analysers are abstract and not many devs know how to write them, so they are mostly provided by library authors. However, being able to generate them via LLMs makes them so much more accessible, and IMHO is a game changer for enforcing an architecture.

      I've been exploring this direction a lot lately, and it feels very promising.

      • raddan 19 hours ago

        Can you expand a little? What you’re suggesting sounds a bit like program verification, or at least program analysis. But what properties are you checking?

        I have written many program analyses (though never any for C#; I’ll have to check it out), and my experience is that they are quite challenging to write. Many are research-level CS, so well outside the skill set of your average vibe coder. I’m wondering if you have some insight about LLM generated code that has not occurred to me…

      • torginus 13 hours ago

        I do quite a bit of coding in C#, and have a lot of experience, and personally I haven't found LLMs to be that great a help at writing C#.

        First, LLMs are great at learning new tech stacks, but good ol' ASP.NET has been pretty much stable since forever. Second, I think Rider/Resharper is the greatest piece of autocomplete tech ever made - seriously, nothing ever comes close - which means I'd rather do a refactor using them than do something similar by prompting the AI and hoping for the best. Also, my experience probably makes me far less accepting of LLMisms, but that might just be on me.

        Lastly, AI seems to be focused around its own set of tooling, like Cursor, which is fine for TS but is far worse than Rider for things like C#. I know I could kludge things together, but still.

        As for Roslyn...

        I have some experience writing codegen/analyzers at my company and it feels like a typical Microsoft tech product, like WPF or PowerShell.

        A brilliant idea (a market first as well) combined with really solid technical fundamentals, but a plain confusing and overcomplicated UX that makes it a chore to use. Seriously, the amount of scaffolding you need to write even for a simple analyzer is just nuts.

        • theshrike79 17 minutes ago

          > Lastly, AI seems to be focused around its own set of tooling, like Cursor

          Nah, the best coding LLMs are console applications like Claude Code, Codex CLI and the like.

          Editor integration mostly brings more tools, like tapping into different validators on VSCode and examining the "problems" view.

          Also Rider's autocomplete is at least partially AI powered unless you specifically disable it IIRC.

        • pjmlp 12 hours ago

          The reason I never bothered writing one is the scaffolding, and the dumb idea of writing code with WriteLines instead of a nice experience like T4 templates.

      • KurSix 8 hours ago

        Linters are great at catching specific pattern violations, but they’re useless against bad decomposition or a poorly chosen abstraction. An LLM can generate code that passes all 100 linters and still ends up being a logical mess - with business logic in the wrong layer and completely unmaintainable.

        • prmph 3 hours ago

          Exactly this. LLMs may do a passable job of architecture if there are many examples of high quality architecture similar to what you want to do in their training set, but introduce some novel stuff and they are clueless.

      • bgrainger 19 hours ago

        Completely agree, and I've started writing more Roslyn analyzers to provide quick feedback to the LLM (assuming you're using it in something like VS Code that exposes the `problems` tool to the model).

        I also want C# semantics even more closely integrated with the LLM. I'm imagining a stronger version of Structured Model Outputs that knows all the valid tokens that could be generated following a "." (including instance methods, extension properties, etc.) and prevents invalid code from even being generated in the first place, rather than needing a roundtrip through a Roslyn analyzer or the compiler to feed more text back to the model. (Perhaps there's some leeway to allow calls to not-yet-written methods to be generated.) Or maybe this idea is just a crutch I'm inventing for current frontier models and future models will be smart enough that they don't need it?

      • pjmlp 12 hours ago

        As someone for whom C# is one of the main work ecosystems, I highly doubt it.

        What I am seeing is that LLMs will push current programming languages down the stack, like how you now enjoy C# => MSIL => Machine code.

        In my line of work I can already imagine the other side of the tunnel: more low-code/no-code tooling, orchestration agents, and much (much) less manual writing of C#, Java and TypeScript.

      • seanmcdirmid 18 hours ago

        Library authors don’t really provide custom analyzers. Heck, the best we can hope for are some regex-based linting rules; anything that involves local data flow analysis is very rare, and anything interprocedural is non-existent. Program analysis is a dark hole; you are better off just making stronger type systems, but then type inference starts to bite you if you want to support it (and you will, given how annoying type annotations are to write, unless you go with something simple like a purely structural type system so you can use Hindley-Milner).

    • saint-evan 8 hours ago

      Maybe if you had mentioned a more complex, lower-level or niche language than TypeScript - say C, MIPS, or some exotic systems language pushing around registers - I'd believe you, with caveats. But with high-level languages like Python, TypeScript and the like? It's highly unlikely that you've put together syntax in any uniquely surprising combination. Maybe you mean you designed a clever fix to a problem within a larger codebase, which would make it a context/attention issue for the LLM, but there's no way in hell you wrote up a contained piece of code solving a specific problem, not tied to a larger software env, that couldn't also have been written by frontier LLMs provided you could articulate the problem, a course of action and the expected output/behavior. LLMs are very good at writing code in isolation; humans still have deeper intuition and we're still extremely good at doing the plug-in, wiring and big-picture planning. You over-estimate what you've done with TypeScript or misunderstand what 'LLMs are good at writing code' [in isolation] means.

      • prmph 7 hours ago

        This is a weird take. Software engineering problem-solving and design is not about syntax at all. Syntax can help or hinder some ways of expressing things, but the result of the design process is not clever syntax.

        For example, the new shortest path algorithm that eclipses Dijkstra's is a conceptual advance; it can be written in any Turing-complete language, and its discovery had nothing to do with inventing new syntax in any specific language.

        Your comment betrays the literal/concrete understanding of coding that is a hallmark of novices. It's like saying that as long as LLMs can write any kind of musical notation, there is no way a human can be a better composer.

        I have not said an LLM cannot produce the same syntax or code patterns I write; I'm saying it is, for instance, poor at figuring out stuff like: How do I write types to enforce which entities and which fields and which roles are allowed for this action at compile-time? Should I use a generator, iterator, or recursive function for such and such functionality? Should this function be generic or not? How do I design my query fluent interface for the best performance? What should be the folder organization for this module that makes it intuitive to navigate and maintain? What is the best name for that function that will make it most intuitive to use? etc.
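
        To make the first of those concrete, here's a minimal sketch of the kind of compile-time enforcement I mean (the entities, roles, and actions are made up for illustration):

            type Role = "admin" | "editor" | "viewer";

            // Which actions each role may perform on each entity (illustrative only).
            type Permissions = {
              admin:  { article: "create" | "update" | "delete"; user: "update" | "delete" };
              editor: { article: "create" | "update" };
              viewer: { article: never };
            };

            // Only compiles when the (role, entity, action) combination is allowed.
            declare function perform<
              R extends Role,
              E extends keyof Permissions[R],
              A extends Permissions[R][E]
            >(role: R, entity: E, action: A): void;

            perform("editor", "article", "update");    // ok
            // perform("editor", "article", "delete"); // compile-time error

        Getting something like this to merely compile is easy; the design work is deciding which of many such encodings stays intuitive and maintainable as the rules grow.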

        Anyone saying such concerns have anything to do with whether I'm using Typescript vs C or Haskell does not understand software engineering.

    • CjHuber 21 hours ago

      Can you maybe give an example you’ve encountered of an algorithm or a data structure that LLMs cannot handle well?

      In my experience implementing algorithms from a good comprehensive description and keeping track of data models is where they shine the most.

      • prmph 21 hours ago

        Example (expanding on [1]): I want to design a strongly typed fluent API interface to some role/permissions-based authorization engine functionality. Even knowing how to shape the fluent interface so that it is powerful but intuitive, as strongly typed as possible but also maintainable, is a deep art.

        One reason I know an LLM can't come close to my design is this: I've written something that works (that a typical senior engineer might write), but that is not enough. I have evaluated it critically (drawing on my experience with long-lived software), rewritten it again to better meet the targets above, and repeated this process several times. I don't know what would make an LLM go: now that kind of works, but is this the most intuitive, well-typed, and maintainable design that there could be?
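
        To give a flavour of what "fluent and strongly typed" means here - this is a toy sketch, emphatically not my actual design - the idea is that the types only expose the next legal step of the builder:

            interface NeedsAction {
              can(action: "read" | "write" | "delete"): NeedsResource;
            }
            interface NeedsResource {
              on(resource: "article" | "comment"): Rule;     // any instance
              onOwn(resource: "article" | "comment"): Rule;  // only instances the actor owns
            }
            interface Rule { describe(): string }

            function allow(role: "admin" | "editor"): NeedsAction {
              return {
                can: (action) => ({
                  on: (resource) => ({ describe: () => `${role} can ${action} any ${resource}` }),
                  onOwn: (resource) => ({ describe: () => `${role} can ${action} own ${resource}` }),
                }),
              };
            }

            allow("editor").can("write").onOwn("article"); // ok
            // allow("editor").on("article") -> compile error: an action must come first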

        1. https://news.ycombinator.com/item?id=45728183

        • simonw 20 hours ago

          Funny you should use role/permissions as an example here, I spent the weekend using Claude Code to rewrite my own permissions engine to a new design that uses SQL queries to solve the problem "list all of the resources that this actor can perform this action on".

          My previous design required looping through all known resources asking "can actor X action Y on this?". The new design gets to generate a very complex but thoroughly tested SQL query instead.

          Applying that new design and updating the hundreds of related tests would have taken me weeks. I got it done in two days.

          Here's a diff that captures most of the work: https://github.com/simonw/datasette/compare/e951f7e81f038e43...

        • YZF 20 hours ago

          What % of the total amount of software (let's say by lines of code or time invested) in the world is like that?

        • realusername 15 hours ago

          Working on permissions for a large SaaS app, I can also confirm that the best LLMs on the market have maybe a 10% success rate writing code in this area.

      • manwe150 21 hours ago

        Converting an algorithm implementation from recursive to iterative: it got the concept broadly right, but was quite bad at making the logic actually match up, often refusing to fix mistakes or reverting fixes two edits later. Still a positive experience though, since the issues were fixable and it reduced the amount of tedious copying I had to type.

        • cadamsdotcom 21 hours ago

          Did you have it write tests and give it the ability to iterate & validate its implementation without you in the loop?

          Anything less is setting it up for failure...

          • manwe150 21 hours ago

            Yes, but it got 99% of those then got stuck on why the others made no sense to it

            • cadamsdotcom 13 hours ago

              It’s important to understand the tests it’s written yourself.

              If you’d like some help I’d be glad to, just drop me an email.

              My email’s in my profile.

      • herbst 13 hours ago

          Claude added a self re-calling timeout to my Typescript game loop to track time, manually adding 1000ms every time it's called.
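
          Roughly this shape (reconstructed from memory, not the actual diff):

              // What Claude produced, approximately: "track time" by assuming
              // every timeout fires exactly on schedule.
              let elapsedMs = 0;
              function tick() {
                elapsedMs += 1000;      // drifts as soon as a callback is late
                setTimeout(tick, 1000); // re-schedules itself forever
              }
              setTimeout(tick, 1000);

              // versus just measuring real time:
              const start = performance.now();
              const elapsed = () => performance.now() - start;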

        I removed it and it later just added it again.

          It's these small weird things where it can mess up a lot of code.

      • NicoJuicy 20 hours ago

        There are severe edge cases. Here are some from the last few days.

        E.g. just updating Bootstrap to Angular Bootstrap: it didn't transfer how I placed the dropdowns (basically using dropdown-end), so everything was out of view on desktop and mobile.

        It forgot the transloco I used everywhere and just used default English (happens a lot).

        It suggested code that fixed 1 bug (expression property recursion), but then LINQ to SQL was broken.

        Upgrading to Angular 17 in an ASP.NET Core app: I knew it used Vite now, but it also required a browser folder to deploy. 20 changes down the road, I noticed something in my UI wasn't updated in dev (fast commits for my side project, I don't build locally) - it no longer deployed anything related to Angular...

        I had 2 files named ApplicationDbContext and it took the one from the wrong monolith module.

        It adds files in the wrong directory sometimes. E.g. some modules were made with feature folders.

        It sometimes forgets to update my Ocelot gateway, or updates the compressed version. ...

        Note: I documented my architecture in e.g. Cline. But I use multiple agents to experiment with.

        Tldr: it's an expert beginner programmer.

        • simonw 19 hours ago

          Do you have any automated tests for that project?

          I'm beginning to suspect a lot of my great experiences with coding agents come from the fact that they can run tests to confirm they haven't broken anything.

          • solumunus 14 hours ago

            The test loop is integral.

            It’s kind of annoying hearing all this skepticism from people putting the least effort into optimally using the tool. There is a learning curve. Every month I’ve gotten better results than the last because I’m constantly context building and refining, understanding how, what and when to prompt.

            It’s like hearing someone say databases suck when they haven’t bothered to learn about or use indexes or foreign keys.

            • NicoJuicy 11 hours ago

              Most of the mentioned issues wouldn't be caught by a test loop unless you have 100% automated test coverage (unit tests, ...)

              Which isn't always plausible (time). The AI makes different mistakes than humans, mistakes that are sometimes harder to catch.

              • simonw 7 hours ago

                It's a lot more plausible now you can get LLMs to help write those tests in the first place.

          • NicoJuicy 15 hours ago

            Too little.

            Things moved as fast as possible to migrate from .net framework to .net core 8, angular 8 to 18 and bootstrap 4.5 to 5.x

    • crazygringo 21 hours ago

      > The people saying LLM can code are hard for me to understand.

      Just today, I spent an hour documenting a function that performs a set of complex scientific simulations. Defined the function input structure, the outputs, and put a bunch of references in the body to function calls it would use.

      I then spent 15 minutes explaining to the free version of ChatGPT what the function needs to do both in scientific terms and in computer architecture terms (e.g. what needed to be separated out for unit tests). Then it asked me to answer ~15 questions it had (most were yes/no, it took about 5 min), then it output around 700 lines of code.

      It took me about 5 minutes to get it working, since it had a few typos. It ran.

      Then I spent another 15 minutes laying out all the categories of unit tests and sanity tests I wanted it to write. It produced ~1500 lines of tests. It took me half an hour to read through them all, adjusting some edge cases that didn't make sense to me and adjusting the code accordingly. And a couple cases where it was testing the right part of the code, but had made valiant but wrong guesses as to what the scientifically correct answer would be. All the tests then passed.

      All in all, a little over two hours. And it ran perfectly. In contrast, writing the code and tests myself entirely by hand would have taken at least a couple of entire days.

      So when you say they're good for those simple things you list and "that's about it", I couldn't disagree more. In fact, I find myself relying on them more and more for the hardest scientific and algorithmic programming, when I provide the design and the code is relatively self-contained and tests can ensure correctness. I do the thinking, it does the coding.

      • DougWebb 21 hours ago

        > Just today, I spent an hour documenting a function that performs a set of complex scientific simulations. Defined the function input structure, the outputs, and put a bunch of references in the body to function calls it would use.

        So that's... math. A very well defined problem, defined very well. Any decent programmer should be able to produce working software from that, and it's great that ChatGPT was able to help you get it done much faster than you could have done it yourself. That's also the kind of project that's very well suited for unit testing, because again: math. Functions with well defined inputs, outputs, and no side-effects.

        Only a tiny subset of software development projects are like that though.

        • simonw 21 hours ago

          > Only a tiny subset of software development projects are like that though.

          Right: the majority of software development is things like "build a REST API for these three database tables" or "build a contact form with these four fields" or "write unit tests for this new function" or "update my YAML CI configuration to run this extra command".

          • skydhash 12 hours ago

            You do know that system programming is a thing? Or that desktop applications are software too?

            • simonw 8 hours ago

              I said "the majority of software development". Those are both relatively niche disciplines in 2025.

              • somebehemoth 8 hours ago

                Can you please explain? Are you saying all software development outside of the web is "niche"?

                • theshrike79 11 minutes ago

                  Niche as in for every one systems programmer there are dozens of people writing API glue.

                  By hours of work spent and lines of code produced, the latter is on a whole different scale than systems programmers (which is a very badly designed term anyway).

      • prmph 21 hours ago

        > documenting a function that performs a set of complex scientific simulations.

        The example you gave sounds like the problem is deterministic, even if composed of many moving parts. That's one way of looking at complexity.

        When I talk about complex problems I'm not just talking about intricate problems. I'm talking about problems where the "problem" is design, not just implementing a design, and that is where LLMs struggle a lot.

        Example: I want to design a strongly typed fluent API interface to some functionality. Even knowing how to shape the fluent interface so that it is powerful, intuitive, well/strongly typed, and maintainable is a deep art.

        The intuitive design constraints that I'm designing under would be hard to even explain to an LLM.

        • simonw 21 hours ago

          For problems like that I consider my role to be the expert designer. I figure out the design, then get the LLM to write the code and the tests for me.

          It is a lot faster at typing than I am.

    • ratatougi 17 hours ago

      Agreed - I often use it when I need to brainstorm which approach I should take for my task, or when I need a refactor or to generate a large set of mock data.

    • solumunus 14 hours ago

      That amazing code you’ve written is a tiny proportion of code that’s needed to provide business value. Most of the code delivering business value to customers day in, day out is quite simple and can easily be LLM driven.

    • veegee a day ago

      [flagged]

      • avgDev 21 hours ago

        This is an unhinged comment. You should take a deep breath and get off the internet. You sound extremely immature calling someone on HN "script kiddie".

        • veegee 6 hours ago

          You’re right, nobody on here is a script kiddie. They’re all professionals making buttons change colours with JavaScript.

      • wutwutwat 21 hours ago

        What do you plan to do after your software career is over?

  • roxolotl a day ago

    One of the interesting corollaries of the title is that this can also be true of humans. Being able to code is not the same as being a software engineer. It never has been.

    • bloppe 21 hours ago

      At least you can teach a human to become a software engineer.

    • echelon 21 hours ago

      We're also finding this true with media generation.

      AI video is an incredible tool, but it can't make movies.

      It's almost as if all of these models are an exoskeleton for people that already know what they're doing. But you still need an expert in the loop.

      • falcor84 21 hours ago

        > but it can't make movies.

        To me this appears to be a very time-dependent assertion. 5 years ago, AI couldn't generate a good movie frame. 2 years ago, AI couldn't generate a good shot, but now in 2025, AI can generate a not-too-shabby scene. If capabilities continue improving at this rate (e.g. as they have with AI being able to generate full musical albums), I wouldn't bet against AI being able to generate a decent feature film in the next decade. It might take longer until it's the sort of thing that we'd present in festivals, but I just don't see a clear barrier any more.

        Looking at it from another perspective, if an AI driven task currently requires "an expert in the loop" to navigate things by offering the appropriate prompts, evaluating and iterating on the AI generated content, then there's nothing clear to stop us from training the next generation of AI to include that expert's competency.

        Taking it into full extrapolation mode, the thing that current generation AIs really don't have is the human experience that leads to a creative drive, but once we have robotic agents among us, these would arguably be able to start gathering "experiences" that they could then mine to write and produce "their own" stories.

      • kujjerl7 20 hours ago

        >it can't make movies

        Humans are sharply declining in this ability at the same time. Most of what Hollywood churns out now is superhero slop, forced-diversity spin-offs, awful remakes of classics, and awkward comebacks for yesteryear's leading men.

        I know it's not a movie but I could've happily watched "Nothing, Forever" for the rest of my life. That was creative, chaotic, hilarious, and wildly entertaining.

        Meanwhile I watched the human-created War Of The Worlds (2025) last weekend... The less said, the better.

  • jfim a day ago

    > I guess the big question is if experienced product management types can pick up enough coding technical literacy to work like this without programmers

    I'd argue that they can't, at least on a short timeframe. Not because LLMs can't generate a program or product that works, but that there needs to be enough understanding of how the implementation works to fix any complex issues that come up.

    One experience I had is that I had tried to generate a MITM HTTPS proxy that uses Netty using Claude, and while it generated a pile of code that looked good on the surface, it didn't actually work. Not knowing enough about Netty, I wasn't able to debug why it didn't work and trying to fix it with the LLM didn't help either.

    Maybe PMs can pick up enough knowledge over time to be able to implement products that can scale, but by that time they'd effectively be a software engineer, minus the writing code part.

    • ambicapter a day ago

      LLMs are great for learning though, you can easily ask them questions, and you can evaluate your understanding every step of the way, and gradually build the accuracy of your world model that way. It’s not uncommon for me to ask a general question, drill deeper into a concept, and then either test things manually with some toy code or end up reading the official documentation, this time with at least some exposure to the words that I’m looking for to answer my question.

      • sodaclean a day ago

        This is how I use them - but I also use them to write initial UIs (usually very primitive), because I've got an issue where the UI has to be perfect, and if I can blame somebody/something other than me I can ignore it until the UI becomes important enough.

      • o11c a day ago

        If I wanted a confident and simple answer with no regard for veracity, I would just ask a politician.

    • kaashif 19 hours ago

      If an LLM can get you 90% of the way there, you need fewer engineers. But the engineer you need probably needs to be a senior engineer who went through the pain of learning all of the details and can function without AI.

      If all juniors are using AI, or even worse, no juniors are ever hired, I'm not sure how we can produce those seniors at the scale we currently do. Which isn't even that large a scale.

  • Bukhmanizer 21 hours ago

    > the big question is if experienced product management types can pick up enough coding technical literacy to work like this without programmers

    I have a strong opinion that AI will boost the importance of people with “special knowledge” more than anyone else regardless of role. So engineers with deep knowledge of a system or PMs with deep knowledge of a domain.

    • simonw 20 hours ago

      That sounds right to me.

  • samsolomon 21 hours ago

    I think you're right, the roles will exist for some time. But I think we'll start to see more and more overlap between engineering, product management and design.

    In a lot of ways I think that will lead to stronger delivery teams. As a designer, the best-performing teams I've been on have had individuals with a core competency but a lot of overlap in other areas: product managers with strong engineering instincts, engineers with strong design instincts, etc. When there is less ambiguity in communication, teams deliver better software.

    Longer-term I'm unsure. Maybe there is some sort of fusion into all-purpose product people able to do everything?

  • vrc 19 hours ago

    I’m a PM and I’ve been able to do a lot of very interesting near production ready bits of coding recently with an LLM. I say near production ready because I specifically only build functional data processing stuff that I intentionally build with clean I/O requirements to hand to the real engineers on the team to slot in. They still have to fix some things to meet our standards, but I’m basically a “researcher” level coder. Which makes sense — I do have an undergrad and MS in CS, and did a lot of mathy algo stuff. For the last 15+ years I could never use anything in my brain to help the team solve things I was best suited to solve. I am now, and that’s nice.

    The one key point is that I am keenly aware of what I can and cannot do. With these new superpowers, I often catch myself doing too much, and I end up doing a lot more rewrites than a real engineer would. But I can see Dunning Kruger playing out everywhere when people say they can vibe code an entire product.

  • belZaah 16 hours ago

    Yeah, no. Had Claude 4.5 generate a mock implementation of an OpenAPI spec. Trivial interaction, just a post of a json object. And Claude invented new fields to check for and failed to check for required ones.

    It is helpful in reducing the number of keys I have to press and the amount of documentation-diving I need to do. But saying that’s writing code is like saying StackOverflow is writing code along with autocomplete.

    • simonw 13 hours ago

      What did Claude do when you replied and said "don't add new fields, and make sure you check the required ones"?

      • Balinares 11 hours ago

        "You're absolutely right!"

  • kakacik 15 hours ago

    Not happening anytime soon. Those product management types are more expensive than devs in most places; you would literally be a) increasing cost per hour worked, and b) stifling the use of the (pricey) management skills of such a manager to do a lower-paid job.

    I have no doubt some broken places end up in a similar mode, but en masse it doesn't make any financial sense.

    Also, when SHTF and you can't avoid going into deep debugging under strong management pressure and oversight, it will become glaringly obvious which approach can keep things running. And SHTF always happens, it's only a function of time.

  • colordrops a day ago

    Once all the context that a typical human engineer has to "build software" is available to the LLM, I'm not so sure that this statement will hold true.

    • bloppe 21 hours ago

      But it's becoming increasingly clear that LLMs based on the transformer model will never be able to scale their context much further than the current frontier, due mainly to context rot. Taking advantage of greater context will require architectural breakthroughs.

  • IanCal 21 hours ago

    I disagree. Unless you’re focussed on right now, in which case… maybe? Depends on scale.

    I have a few scattered thoughts here but I think you’re caught up on how things are done now.

    A human expert in a field is the customer.

    Do you think, say, gpt5 pro can’t talk to them about a problem and what’s reasonable to try and build in software?

    It can build a thing, with tests, run stuff and return to a user.

    It can take feedback (talking to people is the key major things LLMs have solved).

    They can iterate (see: codex), deploy, and they can absolutely write copy.

    What do you really think in this list they can’t do?

    For simplicity reduce it to a relatively basic crud app. We know that they can make these over several steps. We know they can manage the ui pretty well, do incremental work etc. What’s missing?

    I think something huge here is that some of the software engineering roles and management become exceptionally fast and cheap. That means you don’t need to have as many users to be worthwhile writing code to solve a problem. Entirely personal software becomes economically viable. I don’t need to communicate value for the problem my app has solved because it’s solved it for me.

    Frankly most of the “AI can’t ever do my thing” comments come across as the same as “nobody can estimate my tasks they’re so unique” we see every time something comes up about planning. Most business relevant SE isn’t complex logically, interestingly unique or frankly hard. It’s just a different language to speak.

    Disclaimer: a client of mine is working on making software simpler to build and I’m looking at the AI side, but I have these views regardless.

    • simonw 20 hours ago

      I expect that customers who have those needs would much rather hire somebody to be the intermediary with the LLM writing the code than take on that role themselves.

      You'll get the occasional high agency non-technical customer who decides to learn how to get these things done with LLMs but they'll be a pretty rare breed.

      • IanCal 20 hours ago

        This may be a timeframe issue but I sincerely doubt anyone wants to hire someone to be an intermediary. They just want the thing done.

        I know that right now few want to sit in front of claude code, but it's just not that big of a leap to move this up a layer. Workflows do this even without the models getting better.

        • simonw 19 hours ago

          YouTube can show anyone how to unblock a sink. Most people still choose to call a plumber.

          • IanCal 6 hours ago

            Most people would probably not do that if they could just say “unblock the sink” into their phone.

jumploops a day ago

I've been forcing myself to "pure vibe-code" on a few projects, where I don't read a single line of code (even the diffs in codex/claude code).

Candidly, it's awful. There are countless situations where it would be faster for me to edit the file directly (CSS, I'm looking at you!).

With that said, I've been surprised at how far the coding agents are able to go[0], and a lot less surprised about where I need to step in.

Things that seem to help: 1. Always create a plan/debug markdown file 2. Prompt the agent to ask questions/present multiple solutions 3. Use git more than normal (squash ugly commits on merge)

Planning is key to avoid half-brained solutions, but having "specs" for debug is almost more important. The LLM will happily dive down a path of editing as few files as possible to fix the bug/error/etc. This, unchecked, can often lead to very messy code.

Prompting the agent to ask questions/present multiple solutions allows me to stay "in control" over the how something is built.

I now basically commit every time a plan or debug step is complete. I've tried having the LLM control git, but I feel that it eats into the context a bit too much. Ideally a 3rd party "agent" would handle this.

The last thing I'll mention is that Claude Code (Sonnet 4.5) is still very token-happy, in that it eagerly goes above and beyond when not always necessary. Codex (gpt-5-codex) on the other hand, does exactly what you ask, almost to a fault. For both cases, this is where planning up-front is super useful.

[0]Caveat: the projects are either Typescript web apps or Rust utilities, can't speak to performance on other languages/domains.

  • theshrike79 7 minutes ago

    Sonnet 4.5 is rebranded Opus 4. That's where it got its token-happiness.

    Try asking Opus to generate a simple application and it'll do it. It'll also add thousands of lines of setup scripts and migration systems and Dockerfiles and reports about how it built everything and... Ooof.

    Sonnet 4.5 is the same, but at a slightly smaller scale. It still LOVES to generate markdown reports of features it did. No clue why, but by default it's on, you need to specifically tell it to stop doing that.

  • svachalek 21 hours ago

    Also, put heavy lint rules in place, and commit hooks to make sure everything compiles, lints, passes tests, etc. You've got to be super, super defensive. But Claude Code will see all those barriers and respond to them automatically which saves you the trouble of being vigilant over so many little things. You just need to watch the big picture, like make sure tests are there to replicate bugs, new features are tested, etc, etc.

    • theshrike79 6 minutes ago

      Same as when coding with humans, better tests and linters will give you a shorter and simpler iteration loop.

      LLMs love that.

  • asabla 15 hours ago

    > The last thing I'll mention is that Claude Code (Sonnet 4.5) is still very token-happy, in that it eagerly goes above and beyond when not always necessary. Codex (gpt-5-codex) on the other hand, does exactly what you ask, almost to a fault.

    I very much share your experience. For the time being I like the experience with Codex over Claude, just because I find myself in a position where I know much sooner when to step in and just do it manually.

    With Claude I find myself in a typing exercise much more often; I could probably get better at knowing when to stop, ofc.

  • enraged_camel 9 hours ago

    >> Codex (gpt-5-codex) on the other hand, does exactly what you ask, almost to a fault.

    I've seriously tried gpt-5-codex at least two dozen times since it came out, and every single time it was either insufficient or made huge mistakes. Even with the "have another agent write the specs and then give it to codex to implement" approach, it's just not very good. It also stops after trying one thing and then says "I've tried X, tests still failing, next I will try Y" and it's just super annoying. Claude is really good at iterating until it solves the issue.

    • jumploops 3 hours ago

      What type of codebase are you working within?

      I've spent quite a bit of time with the normal GPT-5 in Codex (med and high reasoning), so my perspective might be skewed!

      Oh, one other tip: Codex by default seems to read partial files (~200 lines at a time), so I make sure to add "Always read files in full" to my AGENTS.md file.

  • throwaway314155 a day ago

    > Candidly, it's awful.

    Noting your caveat but I’m doing this with Python and your experience is very different from mine.

    • jumploops 19 hours ago

      Oh, don't get me wrong, the models are marvelous!

      The "it's awful" admission is due to the "don't look at code" aspect of this exercise.

      For real work, my split is more like 80% LLM/20% non-LLM, and I read all the code. It's much faster!

  • tharkun__ 21 hours ago

        Always create a plan/debug markdown file
    
    Very much necessary. Especially with Claude, I find. It auto-compacts so often (Sonnet 4.5) and it instantly goes off-the-wall stupid after that. I then make it re-read the markdown file, so we can actually continue without it forgetting about 90% of what we just did/talked about.

        Prompt the agent to ask questions/present multiple solutions
    
    I find that only helps marginally. They all output so much text it's not even funny. And that's with one "solution".

    I don't get how people can stand reading all that nonsense they spew, especially Claude. Everything is insta-ready to deploy, problem solved, root cause found, go hit the big red button that might destroy the earth in a mushroom cloud. I learned real fast to only skim what it says and ignore all that crap (as in I never tried to "change its personality" for real - I did try to tell it to always use the scientific method and prove its assumptions, but just like a junior dev it never does and just tells me stupid things it believes to be true, and I have to question it).

    Again, just like a junior dev, but it's my junior dev that's always on and available when I have time, and it does things while I do other stuff. And instead of me having to ask the junior after an hour or two what rabbit hole it went down and get them out of there, Claude and Codex usually visually ping the terminal before I even have time to notice. That's for when I don't have full-time focus on what I'm trying to do with the agents, which is why I do like using them.

    The times when I am fully attentive, they're just soooo slow. And many many times I could do what they're doing faster or just as fast but without spending extra money and "environment". I've been trying to "only use AI agents for coding" for like a month or two now to see its positives and limitations and form my own opinion(s).

        Prompting the agent to ask questions/present multiple solutions allows me to stay "in control" over the how something is built.
    
    I find Claude's "Plan mode" is actually ideal. I just enable it and I don't have to tell it anything. While Codex "breaks out" from time to time and just starts coding even when I just ask it a question. If these machines ever take over, there's probably some record of me swearing at them and I will get a hitman on me. Unlike junior devs, I have no qualms about telling a model that it again ignored everything I told it.

        Ideally a 3rd party "agent" would handle this.
    
    With sub-agents you can. Simple git interactions are perfect for subagents because not much can get lost in translation in the interface between the main agent and the sub agent. Then again, I'm not sure how you lose that much context. I'd rather use a sub agent for things like running the tests and linter on the whole project in the final steps, which spew a lot of unnecessary output.

    Personally, I had a rather bad set of experiences with it controlling git without oversight, so I do that myself, since doing it myself is less taxing than approving everything it wants to do (I automatically allow Claude certain commands that are read only for investigations and reviewing things).

pron a day ago

> I don’t really know why AI can't build software (for now)

Could be because programming involves:

1. Long chains of logical reasoning, and

2. Applying abstract principles in practice (in this case, "best practices" of software engineering).

I think LLMs are currently bad at both of these things. They may well be among the things LLMs are worst at atm.

Also, there should be a big asterisk next to "can write code". LLMs do often produce correct code of some size and of certain kinds, but they can also fail at that too frequently.

Kim_Bruning 8 hours ago

"On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

   --Charles Babbage 

We have now come to the point where you CAN put in the wrong figures and sometimes the right answer comes out (possibly over half the time!). This was and is incredible to me and I feel lucky to be alive to see it.

However, people have taken that to mean that you can ask any old question any old way and have the right answer come out now. I might at one point have almost thought so myself. But LLMs currently are definitely not there yet.

Consider (eg) Claude Code to be your English SHell (Compare: zsh, bash).

Learn what it can and can't do for you. It's messier to learn than straight and/or/not; and I'm not sure there's manuals for it; and any manual will be outdated next quarter anyway; but that's the state of play at this time.

  • loco5niner 7 hours ago

    Well, the right answers have been put in the knowledgebase. It's just that the prompt may be wrong.

orliesaurus a day ago

Software engineering has always been about managing complexity, not writing code. Code is just the artifact. No-code and low-code are still all code underneath, but they don't make for a well-engineered application.

Calamityjanitor 21 hours ago

I feel you can apply this to all roles. When models passed high school exam benchmarks, some people talked as if that made the model equivalent to a person passing high school. I may be wrong, but I bet even a state-of-the-art LLM couldn't complete high school. You have to do things like attend classes at the right time/place, take initiative, and keep track of different classes. All of the bigger-picture thinking and soft skills that aren't in a pure exam.

Improving this is what everyone's looking into now. Even larger models, context windows, adding reasoning, or something else might improve this one day.

  • takoid 20 hours ago

    How would LLMs ever be able to attend classes at the right time/place, assuming the classes are in-person and not remote? Seems like an odd and irrelevant criticism.

subtlesoftware a day ago

True for now because models are mainly used to implement features / build small MVPs, which they’re quite good at.

The next step would be to have a model running continuously on a project with inputs from monitoring services, test coverage, product analytics, etc. Such an agent, powered by a sufficient model, could be considered an effective software engineer.

We’re not there today, but it doesn’t seem that far off.

  • bloppe a day ago

    > We’re not there today, but it doesn’t seem that far off.

    What time frame counts as "not that far off" to you?

    If you tried to bet me that the market for talented software engineers would collapse within the next 10 years, I'd take it no question. 25 years, I think my odds are still better than yours. 50 years, I might not take the bet.

    • subtlesoftware a day ago

      Great question. It depends on the product. For niche SaaS products, I’d say in the next few years. For like Amazon.com, on the order of decades.

      • bloppe 21 hours ago

        If the niche SaaS product never required a talented engineer in the first place, I'd be inclined to agree with you. But even a niche SaaS product requires a decent amount of engineering skill to maintain well.

  • thomasfromcdnjs a day ago

    Agreed.

    I've played around with agent-only code bases (where I don't code at all), and had an agent hooked up to server logs, which would create an issue when it encountered errors, and then an agent would fix the tickets, push to prod, check deployment statuses etc. It worked well enough to see that this could easily become the future. (I also had Claude/Codex code that whole setup.)

    Just for semantic nitpicking, I've zero-shotted heaps of small "software" projects that I use and then throw away. They don't count as a SaaS product, but I would still call them software.

    • bloppe a day ago

      The article "AI can code, but it can't build software"

      An inevitable comment: "But I've seen AI code! So it must be able to build software"

  • bcrosby95 a day ago

    > The next step would be to have a model running continuously on a project with inputs from monitoring services, test coverage, product analytics, etc. Such an agent, powered by a sufficient model, could be considered an effective software engineer.

    Building an automated system that determines whether a system is correct (whatever that means) is harder than building the coding agents themselves.

  • pil0u a day ago

    I agree that tooling is maturing towards that end.

    I wonder if that same non-technical person who built the MVP with GenAI and requires (human) technical assistance today will need it tomorrow as well. Will the tooling be mature enough, and lower the barrier enough, for anyone to have a complete understanding of software engineering (monitoring services, test coverage, product analytics)?

    • cratermoon 5 hours ago

      > I agree that tooling is maturing towards that end.

      That's what every no-programming-needed hyped tool has said. Yet here we are, still hiring programmers.

  • jahbrewski a day ago

    I’ve heard “we’re not there today, but it doesn’t seem that far off” since the beginning of the AI infatuation. What if, it is far off?

    • bloppe a day ago

      It's telling to me that nobody who actually works in AI research thinks that it's "not that far off".

KurSix 8 hours ago

This whole situation painfully reminds me of the low-code/no-code boom from like 5–10 years ago.

Back then everyone was saying developers would become obsolete and business analysts would just “click together” enterprise solutions. In the end, we got a mess of clunky non-scalable systems that still had to be fixed and integrated by the same engineers.

LLMs are basically low-code on steroids - they make it easier to build a prototype, but exponentially harder to turn it into something actually reliable.

aurintex 6 hours ago

This is a great read and something I've been grappling with myself.

I've found it takes significant time to find the right "mode" of working with AI. It's a constant balance between maintaining a high-level overview (the 'engineering' part) while still getting that velocity boost from the AI (the 'coding' part).

The real trap I've seen (and fallen into) is letting the AI just generate code at me. The "engineering" skill now seems to be more about ruthless pruning and knowing exactly what to ask, rather than just knowing how to write the boilerplate.

eterm a day ago

I've been experimenting with a little vibe coding.

I've generally found the quality of the .NET code to be quite good. It trips up sometimes when linters ping it for rules not normally enforced, but it does the job reasonably well.

The front-end JavaScript though? It's both an absolute genius and a complete menace at the same time. It'll write reams of code to get things just right, but with no regard for human maintainability.

I lost an entire session to the fact that it cheerfully did:

    npm install fabric
    npm install -D @types/fabric
Now that might look fine, but a human would have realised that the typings library covers a completely different, out-dated API - the package was last updated 6 years ago.

Claude however didn't realise this, and wrote a ton of code that would pass unit tests but fail the type check. It'd check the type checker, re-write it all to pass the type checker, only for it now to fail the unit tests.

Eventually it semi-gave up typing and did loads of (fabric as any) all over the place, so now it just gave runtime exceptions instead.

I intervened when I realised what it was doing, and found the root cause of its problems.

It was a complete blindspot because it just trusted both the library and the typechecker.

So yeah, if you want to snipe a vibe coder, suggest installing fabricjs with typings!
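
(For anyone who hits the same thing: as far as I can tell, recent fabric releases bundle their own TypeScript types, so the fix is simply to drop the stale typings package.)

    npm uninstall @types/fabric
    # recent fabric versions ship their own type definitions
    npm install fabric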

  • KurSix 8 hours ago

    You can take the git idea even further.

    Instead of just committing more often, make the agent write commits following the conventional commits spec (feat:, fix:, refactor:) and reference a specific item from your plan.md in the commit body. That way you'll get a self-documenting history - not just of the code, but of the agent's thought process, which is priceless for debugging and refactoring later on.
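
    Something like this (a made-up example):

        feat(orders): add cursor-based pagination to the orders list

        Implements item 3 of plan.md ("paginate large order histories").

    The exact shape matters less than the fact that every commit points back at an item in the plan.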

  • teaearlgraycold 21 hours ago

    Although - at least for simple packages - I've found LLMs good at extracting type definitions from untyped libraries.

abhishekismdhn 20 hours ago

Even the code quality is often quite poor. At the same time, not using critical thinking can have serious consequences for those who treat AI as more than an explorer or companion. You might think that with AI, the number of highly skilled developers would increase but it could be quite the opposite. Code is just a medium; developers are paid to solve problems, not to write code. But writing code is still important as it refines your thoughts and sharpens your problem-solving skills.

The human brain learns through mistakes, repetition, breaking down complex problems into simpler parts, and reimagining ideas. The hippocampus naturally discards memories that aren’t strongly reinforced, so if you rely solely on AI, you’re simply not going to remember much.

hamasho a day ago

The problem with vibe coding is it demoralizes experienced software engineers. I'm developing an MVP with vibes, and the Claude Code and Codex output works in many cases for this relatively new project. But the quality of the code is bad. There is already duplicated or unused logic, and a lot of code is unnecessarily complex (especially React and JSX). And there's little PR review, so that "we can keep velocity". I'm paying much less attention to quality now. After all, why bother when AI produces working code? I can't justify, and don't have energy for, deep-diving into system design or dozens of nitpicking change requests. And it makes me more and more replaceable by an LLM.

  • bloppe a day ago

    > I'm paying much less attention to quality now. After all, why bother when AI produces working code?

    I hear this so much. It's almost like people think code quality is unrelated to how well the product works, as though you can have one without the other.

    If your code quality is bad, your product will be bad. It may be good enough for a demo right now, but that doesn't mean it really "works".

    • krackers 21 hours ago

      Because there's a notion that if any bugs are discovered later on, they can just "be fixed". And unless you're the one fixing the bugs, it's hard to understand the asymmetry in effort involved. Also, no one ever gets much credit for bug fixes compared to adding features.

    • hamasho 20 hours ago

      I know how important code quality is. But I can't (or don't have the energy to) convince junior engineers, and sometimes project managers, to submit good-quality code instead of vibe-coded garbage anymore.

      • bloppe 9 hours ago

        I just hope I never have to work at a company like that again

    • carlosjobim 21 hours ago

      > If your code quality is bad, your product will be bad.

      Why? The power of modern hardware allows for extremely inefficient code, so even if some code runs a thousand times slower because it's badly programmed, it will still be so fast that it seems instant.

      For the rest of the stuff, what the code is doing inside the chip has no relevance to the user of the software, as long as the inputs and outputs function as they should. The user wants to give input and receive output; nothing else has any significance at all for her.

      • bloppe 21 hours ago

        Sure. Everyone remembers from Algorithms 101 that a constant multiple ("a thousand times slower") is irrelevant. What matters is scalability. Something that's O(n) will always scale better than something that's O(n^2), even if the thing that's O(n) has 1000x overhead per unit (at a million items that's 10^9 operations versus 10^12).

        But that's just a small piece of the puzzle. I agree that the user only cares about what the product does and not how it works, but the what is always related to the how, even if that relationship is imperceptible to the user. A product with terrible code quality will have more frequent and longer outages (because debugging is harder), and it will take longer for new features to be added (because adding things is harder). The user will care about those things.

  • phyzome 20 hours ago

    I find it fascinating that your reaction to that situation is to double down while my reaction would be to kill it with fire.

dreamcompiler 19 hours ago

I've worked in a few teams where some member of the [human] team could be described as "Joe can code, but he can't build software."

The difference is what we used to call the "ilities": Reliability, inhabitability, understandability, maintainability, securability, scalability, etc.

None of these things are about the primary function of the code, i.e. "it seems to work." In coding, "it seems to work" is good enough. In software engineering, it isn't.

bradfa a day ago

The context windows are still dramatically too small, and the models don't yet seem to be trained on how to build maintainable software. There is a lot less written down on the public web about how to do that. There's a bunch of high-level public writing, but not many great examples of the real-world situations that happen on every proprietary software project, because that's very messy data locked away inside companies.

I'm sure it'll improve over time, but it won't be nearly as easy as making AI good at coding.

  • ewoodrich 20 hours ago

    > the models don't yet seem to be trained on how to build maintainable software.

    A while ago I discovered that Claude, left to its own devices, had been doing the LLM equivalent of Ctrl-C/Ctrl-V for almost every component it created in an ever-growing .NET/React/TypeScript side project, for months on end.

    It was legitimately baffling seeing the degree to which it had avoided reusing literally any shared code in favor of updating the exact same thing in 19 places every time a color needed to be tweaked or something. The craziest example was a pretty central dashboard view with navigation tabs in a sidebar where it had been maintaining two almost identical implementations just to display a slightly different tab structure for logged in vs logged out users.

    I've now been directing it to de-spaghetti things when I spot good opportunities and added more best practices to CLAUDE.md (with mixed results) so things are gradually getting more manageable, but it really shook my confidence in its ability to architect, well, anything on its own without micromanagement.
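
    To give a flavour of the cleanup (all names invented, but this is the shape of it): the two near-identical sidebars collapse into one component parameterised on auth state.

        // Sidebar.tsx - replaces the duplicated logged-in / logged-out implementations
        import * as React from 'react';

        const TABS = {
          loggedIn: ['Dashboard', 'Reports', 'Settings'],
          loggedOut: ['Dashboard', 'Sign in'],
        } as const;

        export function Sidebar({ isLoggedIn }: { isLoggedIn: boolean }) {
          const tabs: readonly string[] = isLoggedIn ? TABS.loggedIn : TABS.loggedOut;
          return (
            <nav>
              {tabs.map((label) => (
                <a key={label} href={'/' + label.toLowerCase().replace(' ', '-')}>
                  {label}
                </a>
              ))}
            </nav>
          );
        }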

  • AnimalMuppet a day ago

    In fairness, there's a lot more "software" than there is "maintainable software" in their training data...

Animats 20 hours ago

OK, he makes a statement, and then just stops.

In some ways, this seems backwards. Once you have a demo that does the right thing, you have a spec, of sorts, for what's supposed to happen. Automated tooling that takes you from demo to production ready ought to be possible. That's a well-understood task. In restricted domains, such as CRUD apps, it might be automated without "AI".

sothatsit 16 hours ago

I like to think of it as: AI can code, but it's terrible at making design decisions.

Vibe-coded apps eventually fall over as they are overwhelmed by 101 bad architectural decisions stacked on top of one another. You need someone technical to make those decisions to avoid this fate.

aayushdutt 5 hours ago

It's just the frontier getting pushed slowly but surely. The headline missed the keyword `yet`.

gherkinnn 15 hours ago

It's only a matter of years before all the idea guys in my org realise this.

"But AI can build this in 30min"

thegrim33 21 hours ago

And here I am, having used AI twice within the last 12 hours to ask it two questions about an extremely well-used, extremely well-documented physics library, and both times getting back sample code that calls library methods which don't exist. When I point this out, I get the "Oh, you're so right to point that out!" response and new code, which still just blatantly doesn't work.

  • drcxd 20 hours ago

    Hello, have you ever tried using coding agents?

    For example, you can pull the library code into your working environment and install the coding agent there as well. Then you can ask it to read specific files, or even every file in the library. I believe (based on my personal experience) this significantly decreases the likelihood of hallucination.

preommr a day ago

These discussions are so tiring.

Yes, they're bad now, but they'll get better in a year.

If the generative ability is good enough for small snippets of code, it's good enough for larger, better-organized software. Maybe the models don't have enough of the right kind of training data, or the agents don't have the right reasoning algorithms. But the underlying capability is there.

  • phyzome 20 hours ago

    I've been hearing "they'll be better in a few months/years" for a few years now.

    • Esophagus4 20 hours ago

      But hasn’t the ecosystem as a whole been getting better? Maybe or maybe not on the models specifically, but ChatGPT came out and it could do some simple coding stuff. Then came Claude which could do some more coding stuff. Then Cursor and Cline, then reasoning models, then Claude Code, then MCPs, then agents, then…

      If we’re simply measuring model benchmarks, I don’t know if they’re much better than a few years ago… but if we’re looking at how applicable the tools are, I would say we’re leaps and bounds beyond where we were.

  • CivBase a day ago

    Problem is, as the author points out, designing software solutions is a lot more complicated than writing code. AI might get better in a year, but when will it be good enough? Does our current approach to AI even produce an economical solution to this problem, even if it's technically possible?

  • gitaarik 16 hours ago

    So what's your point exactly? That LLMs *can* write software, just not yet?

zeckalpha 21 hours ago

I think this can be extended (though not necessarily fully mitigated) by having non-SWE agents interact with the same codebase. Drafting product requirements, assessing business opportunities, etc. can be done by LLMs.

smugtrain 10 hours ago

Making it absolutely lovely for people who can build software, but can’t code

ruguo 20 hours ago

True. AI might not have a soul, but it’s become an absolute lifesaver for me.

To really get the most out of it though, you still need to have solid knowledge in your own field.

CMCDragonkai 21 hours ago

Many human devs can code, but few can build software.

  • gdulli 21 hours ago

    It's the ultimate irony that I cling to the stance that humans are capable of nuance and creativity that machines will never match, yet the human-written defenses of AI are so repetitive and shallow and cliched that they don't even require the sophistication of LLMs to produce.

  • gitaarik 16 hours ago

    But humans can learn. LLMs don't learn, they only get trained on data previously discovered through human research.

liqilin1567 19 hours ago

Every time I see "build an app with just one English sentence" hype, I turn away immediately.

xeckr 19 hours ago

Give it a year or two...

johnnienaked 17 hours ago

Quit saying AI can code. AI can't do anything that wasn't done by actual humans before. AI is a plagiarism machine.

jongjong 18 hours ago

I still can't believe my own eyes: when I show an LLM my codebase and tell it, in reasonable detail, what functionality I want to add, it can produce perfect-looking code that I could have written myself.

I would say that AI is better at coding than most developers. If I had to choose between a junior developer to assist me or Claude Code, I would choose Claude Code. That's a massive achievement; it cannot be overstated.

It's a dream come true for someone with a focus on architecture like myself. The coding aspect was dragging me down. LLMs work beautifully with vanilla JavaScript. The combined ability to generate code quickly and then quickly test (no transpilation/bundling step) gives me fast iteration times. Add that to the fact that I have a minimalist coding style. I get really good bang for my bucks/tokens.

The situation is unfortunate for junior developers. That said, I don't think it necessarily means that juniors should abandon the profession; they just need to refocus their attention towards the things that AI cannot do well like spotting contradictions and making decisions. Many developers are currently not great at this; maybe that's the reason why LLMs (which are trained on average code) are not good at it either. Juniors have to think more critically than ever before; on the plus side, they are freed to think about things at a higher level of abstraction.

My observation is that LLMs are so far good news for neurodivergent developers. Bad news for developers who are overly mimetic in their thinking style and interests. You want to be different from the average developer whose code the LLM was trained on.

cdelsolar 20 hours ago

I definitely disagree. I'm a software engineer, but I've been heavily using AI for the last few months and have gotten multiple apps to production since then. I have to guide the LLM along, yes, but it's perfectly capable of doing everything needed, up to and including building the CloudFormation templates for Fargate or whatever.

aussieguy1234 20 hours ago

I'm of the opinion that not a single software engineer has yet lost their job to AI.

Any company claiming they've replaced engineers with AI has done so in an attempt to cover up the real reasons they've gotten rid of a few engineers. "AI automating our work" sounds much better to investors than "We overhired and have to downsize".

apical_dendrite a day ago

I've been working with a data processing pipeline that was vibe-coded by an AI engineer, and while the code works, as software that has to fit into a production environment it's a mess.

Take logging, for example. The pipeline is made up of AWS Lambdas written in Python. The person who built it wanted to add context to each log for debugging, and the LLM generated hundreds of lines of Python in each Lambda to do this (no common library). But he (and the LLM) didn't understand that a bunch of files initialized their own loggers at the top of the file, so all that code to set context on the root logger never got used in those files. Then he wanted to parallelize some tasks, and neither he nor the LLM realized that the logging context was thread-local and wouldn't show up in logs generated in another thread.

So what we ended up with was a 250+ line logging_config.py file in each individual Lambda, used for only a small portion of the logs the application generated.

orionblastar a day ago

I see so many people on the Internet who claim they can fix AI vibe code. Nothing new; I've been super-debugging crappy code for 30 years to make it work.

ergocoder 21 hours ago

Yeah, just like many software engineers. AI has achieved software engineering.

jongjong 17 hours ago

Software development is one of these things which often seems really easy from the outside but can be insanely complicated.

I had this experience with my co-founder, where I was shipping features quickly and he got used to a certain pace of progress. Then we ended up with something like 6 different ways to perform a particular process, with some differences between them. I had reused as much code as possible, all passing through the same function, but without tests it became challenging to avoid bugs/regressions... My co-founder could not understand why I was pushing back on implementing a particular feature which seemed very simple to him at a glance.

He could not believe why I was pushing back and thought I was just being stubborn. I explained all the technical challenges involved; it took me about 30 minutes to lay out (at a high level) all the technical considerations and trade-offs and how much complexity the new feature would introduce, and he agreed with my point of view.

People who aren't used to building software cannot grasp the complexity. Beyond a certain point, it's like every time my co-founder asked me to do something related to a particular part of the code, I'd spend several minutes pointing out the logical contradictions in his own requirements. The non-technical person thinks about software development in a kind of magical way. They don't really understand what they're asking. This isn't even getting into the issue of technical constraints which is another layer.

  • ontouchstart 11 hours ago

    I am in the position of implementing some complex features on top of a shaky foundation with vague requirements. It took a lot of thinking and iteration to figure out what we really wanted, what we needed, and what is possible, as well as the consequences of the decisions we made before, are making now, and will make in the future.

    I am “vibe” coding my way through but the real work is in my head, not in the Cursor IDE with Claude, unit tests, or live debugging. It was me who was learning, not the machine.

nsonha 17 hours ago

So are software engineers. Many can, but there is nothing in the definition of "engineer" (software or otherwise) that implies they can build things.

  • ra0x3 17 hours ago

    I have rarely, in my 11+ years of professionally writing software, met someone who could _really_ "write code" but couldn't build software. Anecdotal, obviously. But I'd say the opposite tends to be the case IMO: those who really know "the code" also tend to know how to effectively build software (relatively speaking).

    It kinda makes sense: "knowing how to code" in modern tech largely means "knowing how to build software", not writing single modules in some language, because those single modules on their own are largely useless outside the context of "software".

jongjong 18 hours ago

>> hey, I have this vibe-coded app, would you like to make it production-ready

This makes me cringe, because it's a lot harder to get LLMs to generate good code when you start with a crappy codebase. If you start with a good codebase, it's like the codebase is coding itself. The former approach, trying to get the LLM to write clean code on top of a mess, is akin to mental torture; the latter is highly pleasant.

asah 18 hours ago

FTFY: "for now"