Why LLMs and AI Agents Won't Remove the Need for Software Engineers
Removing software engineers from the software development loop ultimately requires ceding control of the detail in our software systems.
Twenty years into my software engineering career, I am still very much a hands-on developer, by intention. I feel that I have enough experience with different project types and working environments to have confidence in my intuition about the effect AI tooling will have on the careers of software engineers. This is not to say I am not amazed by the progress of AI; I really am. It certainly lowers the threshold of experience needed for engineers to be effective in the basics of a given programming language or environment, by facilitating top-down learning specific to the problem at hand. It has made me more productive, and I use LLMs daily. That said, I am confident that when the AI hype dies down, good software engineers will be as needed as ever.
Music sequencers and synthesizers meant that musicians could create music with the sounds of instruments they did not physically know how to play, yet there are no fewer people making music today than there were before.
This post covers some of the reasons that software engineers really do have a ‘moat’ (a term often used to describe the factors that protect a technology from being supplanted by another).
I have reviewed and updated this article following my recent experimentation with newer tools such as Claude Code and OpenAI's o3-mini model. OpenAI’s GPT-4.5 is being rolled out as I update this on 6th March 2025.
The limitations of the AI takeover are similar to the limitations of the low code takeover
Code is, at its core, a precise shorthand for expressing a detailed set of instructions about what we want a computer to do. It is the detail and precision you can express in code, and cannot express in natural language, that makes it suitable for writing automation. While LLMs can code, when we issue instructions via prompts or try to learn why code is behaving a certain way using an LLM, we use natural language. Natural language can be brief and imprecise, or precise and verbose, but not both at the same time.
Low code is similar in that the instructions are necessarily high level, because at the point where you need to give low-level, precise instructions, low-code GUIs quickly become unwieldy. This is fine for chaining common, basic tasks together. But as soon as you need to do something less conventional (e.g. your API fetch code needs to use a specialised authentication mechanism, or to stream the results from the server), or you need to optimise for performance, the advantages of low code, such as speed to get started or being novice friendly, fall away.
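To make the point concrete, below is a rough sketch (in C#, with an invented endpoint, header name and signing scheme) of the kind of low-level detail that typically falls outside what a drag-and-drop integration block exposes: a request signed with a bespoke header, whose response is streamed line by line rather than buffered.

```csharp
// Hypothetical sketch only: the endpoint, header name and signing scheme are invented.
using System;
using System.IO;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

class StreamingFetchSketch
{
    static async Task Main()
    {
        using var client = new HttpClient();

        var request = new HttpRequestMessage(HttpMethod.Get, "https://api.example.com/v1/events");
        // Specialised authentication: a bespoke signature header rather than a standard bearer token.
        request.Headers.Add("X-Example-Signature", Sign("my-secret", "/v1/events"));

        // Stream the results: return once the headers arrive, then consume the body
        // line by line instead of buffering the whole response in memory.
        using var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();

        using var reader = new StreamReader(await response.Content.ReadAsStreamAsync());
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            Console.WriteLine($"Received: {line}");
        }
    }

    // Stand-in for whatever bespoke signing the (imaginary) API demands.
    static string Sign(string secret, string path)
    {
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        return Convert.ToHexString(hmac.ComputeHash(Encoding.UTF8.GetBytes(path)));
    }
}
```

None of this is exotic to an experienced developer, but expressing it through a visual workflow builder, or through a prompt precise enough to pin down every one of those choices, quickly costs more than writing it.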
Either the problem you are working on has a canonical solution that works for everyone in exactly the same way in every code base and every UI framework (in which case there is likely a library or third-party solution you should be using), or you will need to specify your own requirements. As you hone your requirements, your prompts will need to become more and more precise. Hopefully you are working on a common problem with a common solution that LLMs have seen in their training data, or that they are able to reason their way to solving in the ideal way. If they are not, however, you are going to need to tell them what to do, and to do that you will probably need a low-level understanding of what they have done wrong, and for that you will need software engineering experience.
Prompting an AI to write your code is an improvement over low code, but it eventually runs into similar drawbacks in that we have to give up control of at least some of the detail. Asking an AI to write code to perform a well-defined task (even a less conventional one) often produces impressive results. If, however, you find the need to iterate over the code to account for edge cases or improve performance, verifying the absence of side effects introduced by the AI's edits can become laborious. The benefits of using the AI leak away as you find yourself pouring effort into checking more and more lines of code that you didn’t write. It starts to feel like reviewing a pull request, or checking the work of a less experienced developer or one less familiar with your goals.
Some Common AI Issues Do Not Appear to Be Falling Away
AIs currently tend to struggle with writing code that is consistent with the overall architecture of a software application and with working on large code bases. They also have a habit of skipping lines of code for ‘brevity’, removing comments and putting files in the wrong place, among other bugbears. LLMs will also sometimes give correct but inefficient solutions to problems (recently an LLM wrote me some nearly perfect code for generating AWS signatures for a .NET C# project I was working on, but failed to point out the AWS-managed NuGet package that does the same thing perfectly in a couple of lines of code). Some of these issues will naturally filter out as the tooling around LLMs improves, but we will always be left with the fact that we are working with codebases for which some of the detail has been decided for us. We can wrap the code with heavy tests to check that the application operates as we want, but then we have either just shifted the effort into test writing, or we automate the testing with LLMs and concede that we cannot be certain the test code is any better than the AI-written application code.
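For context, the alternative the LLM failed to mention looks roughly like the sketch below. It assumes the AWS-managed SDK package (here AWSSDK.S3, with S3 standing in for whichever service is actually needed) and credentials resolved from the environment; the bucket and key names are placeholders. The point is that Signature Version 4 signing happens inside the SDK, which is exactly the detail the LLM reproduced by hand.

```csharp
// Sketch: letting the AWS-managed SDK handle request signing instead of
// hand-rolling Signature Version 4 headers. Assumes the AWSSDK.S3 NuGet
// package and credentials resolved from the environment; names are placeholders.
using System;
using System.Threading.Tasks;
using Amazon.S3;

class AwsSdkSketch
{
    static async Task Main()
    {
        using var client = new AmazonS3Client();   // signing handled internally by the SDK
        using var response = await client.GetObjectAsync("my-bucket", "my-key");
        Console.WriteLine($"Fetched {response.Key} ({response.ContentLength} bytes)");
    }
}
```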
Some may argue that giving control of low-level detail to AI will eventually be no worse than giving control of low-level detail to an employee, and better if those employees are not the best coders or decision makers. If all else is equal, the AI is cheaper and will complain less, but at some point someone is going to have to specify, enquire and reason about the low-level detail of how your code base is operating. With or without the help of an LLM, that is going to require considerable expertise and human attention.
You can use an LLM to create small, testable units of code, but you will always need a human engineer to put them together in a coherent manner.
AI Will Increase the Amount of Code Written and Therefore the Amount of Code that Requires Maintenance
Jevons Paradox tells us that as technological advancements make a resource cheaper or easier to use (code, in this case), consumption of that resource rises. This means more code, in the form of a greater number of applications written and larger code bases.
As the amount of software in existence grows, so do the details within it, and so does the effort required to maintain and understand it. Maybe AI will help us write it, but how much natural-language, AI-generated documentation would a business analyst or executive need to read to fully understand the workings of a large software application, and how long would that take? At some point the buck stops and someone needs to understand what a system is doing in detail and how to modify it, and the buck has to stop with a human.
Detail is Quality and Quality Compounds
Many engineers know that it takes more skill to make a process simple than to make it complex. Code that deals comprehensively with edge cases helps to make the process simpler for the end user. Often what appears to be a simple process to an end user has taken many iterations of fixes and improvements along the way.
Japanese cars have long been recognised as more reliable than those made in other countries. That is attributed to Kaizen (continuous improvement) and a heavy focus on quality control. If a Japanese car is built of 1000 components, each with a failure rate of 0.1%, and another car is built of 1000 components, each with a failure rate of 0.2%, the other car has a roughly 37% greater chance of failure for the same usage. The point is that detail matters, and losing control of the details means losing control of quality. When you have a large code base and a reduced number of engineers to work on it, quality will suffer.
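For anyone wanting to check the 37% figure: under the simplifying assumption that component failures are independent, the arithmetic is just two compound probabilities, sketched below.

```csharp
// Rough arithmetic behind the 37% figure, assuming independent component failures.
using System;

class FailureRateSketch
{
    static void Main()
    {
        double pFirstCarFails = 1 - Math.Pow(1 - 0.001, 1000);  // ~63.2% (0.1% per component)
        double pOtherCarFails = 1 - Math.Pow(1 - 0.002, 1000);  // ~86.5% (0.2% per component)

        Console.WriteLine($"First car failure probability: {pFirstCarFails:P1}");
        Console.WriteLine($"Other car failure probability: {pOtherCarFails:P1}");
        Console.WriteLine($"Relative increase: {pOtherCarFails / pFirstCarFails - 1:P0}");  // ~37%
    }
}
```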
Will safety-critical code be outsourced to AIs? I expect that it will be tried, if it hasn’t been already, but then additional effort will be required to automate the testing of the AI-generated code, and accountability for failure will never ultimately rest with an LLM.
The Paperclip Maximiser Problem
If you have not put enough guardrails in place with your AI prompts, then you risk LLMs editing your code in unintended ways, pursuing a goal without protection against side effects. This is a hard problem that cannot be fully resolved without enough prompt detail to cover every possible edge case.
The paperclip maximizer is a thought experiment described by Swedish philosopher Nick Bostrom in 2003. It illustrates the existential risk that an artificial general intelligence may pose to human beings were it to be successfully designed to pursue even seemingly harmless goals, and the necessity of incorporating machine ethics into artificial intelligence design. The scenario describes an advanced artificial intelligence tasked with manufacturing paperclips. If such a machine were not programmed to value living beings, given enough power over its environment, it would try to turn all matter in the universe, including living beings, into paperclips or machines that manufacture further paperclips.
Again, code is about detail. Maybe we can create models with core prompts that effectively pre-train the AI to behave as a rational human being with a full set of ethical human values would. But whose ethical values? And could we ever be certain that the set of ethical values is complete enough that we can trust the AI to do the right thing in all situations? Not all code is going to have conditions with the implications of the trolley problem, but the thought experiment demonstrates that many guardrails will need to be put in place, and somewhat political decisions (which may not be agreed upon by all) taken, before the model can be trusted, and those guardrails will need to be complete - something that many may consider impossible.
The trolley problem is a series of thought experiments in ethics, psychology, and artificial intelligence involving stylized ethical dilemmas of whether to sacrifice one person to save a larger number. The series usually begins with a scenario in which a runaway trolley or train is on course to collide with and kill a number of people (traditionally five) down the track, but a driver or bystander can intervene and divert the vehicle to kill just one person on a different track. Then other variations of the runaway vehicle, and analogous life-and-death dilemmas (medical, judicial, etc.) are posed, each containing the option to either do nothing, in which case several people will be killed, or intervene and sacrifice one initially "safe" person to save the others.
AIs Only Know What They Have Been Told
There is a lot of software in existence that came before the age of AI. Of the projects you’ve worked on, how many had what you would consider ‘full’ documentation? Of the projects that did, how much of that documentation was up to date and exactly reflected the workings of the code it was supposed to document?
I’ve worked in companies of all sizes, and I know that the reality is that when I start work on an existing project, documentation that meets all of the above criteria is rare.
Maybe you’d argue that the code is self-documenting - more so to an AI with a large enough context to consider the codebase in its entirety. But can the intent behind every line of code be accurately determined? What if that code depends on an external system that is not in the AI’s current context window?
My point is that an AI cannot know what has not been accurately written down. Software engineers know that on mature projects, most of their time is not spent coding. It’s spent reading documentation, writing documentation, building relationships, weighing decisions and understanding the politics and history around decisions that have been made in writing the code they are looking at, among many other tasks related to ‘soft’ skills.
Is your AI agent going to be sophisticated enough to make enquiries and collect information so that it can reason about the code in the same way that human software engineers do?
Summary
LLMs are important tools. They will likely soon replace traditional search engines (if for no other reason than being a more direct way to find answers, with less distraction and advertising) and they will certainly assist in many creative tasks, potentially reducing the need for software engineers for new projects - in the short term at least.
But even if the need for software engineers is reduced, AI will continue to contribute to the growth of code bases. That growth will lead to increased complexity. That complexity will eventually need to be managed and - importantly - understood by humans, and we need to make sure we retain the expertise to manage it.
That is, unless we are willing to cede control of the systems behind all of civilisation to LLMs, which I do not see as either possible or desirable in the near future.