What LLMs Can't Guess
Posted on 22 March 2026 by Mirko Janssen — 6 min

In my daily work I use coding agents and other AI tools to help me. That is nothing special anymore; many developers have made the switch. But when I hear colleagues or friends talk about the results they get from their AI tools, I sometimes notice a difference: I generally get reliable, good results from these tools, which does not seem to be the case for everyone. On the surface, what I notice is the quality of the prompts and the specifications. That is not really news either; there are dozens of articles explaining how important specifications are and how to get better at writing them. But that is not what I want to focus on here. Instead I want to talk about something I came across in an article by Bob Wyman.
The Problem with Specifications
In his article, Bob Wyman describes that the real problem is not the imprecision of natural language, but that it allows imprecision without making it visible. An LLM fills the gaps in a specification quietly and confidently, but it does not care about the right answer, only the statistically likely one. If we ask an LLM to extend an online shop so that customers can get a discount with a voucher, and on top of that ask it to cover the functionality with tests, the result will probably look plausible and work at first. But what happens if my vouchers bring the total below zero? What is fundamentally missing in this approach is a mechanism that makes correctness an explicit part of the process.
That is exactly what tests are for, but in my example the LLM wrote tests too, right? The difference lies in the type of test. A single unit test checks one path with concrete inputs and a defined expected output. What it does not check is whether the implementation actually does what it is supposed to do! What we need are tests that describe properties: invariants that must hold for a whole class of inputs and do not depend on what an LLM happened to generate. This is exactly where property-based testing comes in: instead of checking whether f(2) == 4, you check whether f(a) + f(b) == f(a + b) holds for any a and b. Such tests are harder to write, but they force us to specify the actual behavior instead of just documenting an example. For AI-generated code this is especially valuable, because an LLM's quiet wrong decisions become visible.
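To make the contrast concrete, here is a minimal sketch in plain TypeScript, without any framework. The function `f` is a made-up example (it simply doubles its input); the point is the shape of the two checks, not the function itself.

```typescript
// Hypothetical function under test: doubles its input.
function f(x: number): number {
  return 2 * x;
}

// Example-based check: one path, one concrete expectation.
console.assert(f(2) === 4, "f(2) should be 4");

// Property-based check: additivity must hold for *any* a and b.
// A hand-rolled loop over random integers stands in for a real framework.
for (let i = 0; i < 1000; i++) {
  const a = Math.floor(Math.random() * 2001) - 1000;
  const b = Math.floor(Math.random() * 2001) - 1000;
  if (f(a) + f(b) !== f(a + b)) {
    throw new Error(`Additivity violated for a=${a}, b=${b}`);
  }
}
```

A real property-based testing framework replaces the hand-rolled loop with input generators and shrinks failing cases down to a minimal counterexample, but the idea is the same: the expectation is a rule over all inputs, not one memorized answer.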
Property-Based Testing
In my article about Test-Driven Development I wrote that it is important to focus on the behavior of a method and not just on input and output. I want to do the same thing here. Going back to the voucher example from earlier, these would be the requirements:
- A voucher reduces the total by a fixed amount, but never below zero.
- A voucher can be redeemed only once per order (or only once in total).
- Multiple vouchers combined must not give more discount than the total amount.
These are not unit tests; they are properties that we should have given the LLM before it generated any code. Behavior-Driven Development, which I also mentioned in that earlier post, does not help here either. BDD is about describing behavior in a language that both developers and non-developers can understand. But a BDD test still documents a concrete example, not a generally valid property.
Property-based testing flips the process around. Instead of an expected result, we describe the rules that must hold across all test runs. A framework like fast-check (for TypeScript) then generates many random inputs and checks them automatically. On one hand I can stop worrying about edge cases; on the other hand it forces me to formulate the behavior precisely. This precision is what helps the LLM make decisions based on concrete requirements rather than patterns from its training data.
How to Spot Properties
At first it really does take some practice to spot properties. There are known patterns that help, like symmetry, idempotency, or monotonicity. But a checklist helps less here than a simple habit. When writing code or doing code reviews, it is worth pausing and asking yourself: is there something here that must never change? If I change the input, what changes about the result? Is there an operation here that can be undone?
For the voucher example, the answers are clear:
- The total must never become negative.
- More discount or additional vouchers must not increase the amount.
- Redeeming and then cancelling a voucher must restore the original amount.
The nice thing about this: these are questions you should be asking anyway before you give an LLM the implementation task. Anyone who does this ends up with not only better tests but also better specifications.
Where Else to Find Properties
One more thing worth keeping in mind: in existing projects you are not starting from zero. The properties are already there; you just need to know where to look, for example in the language a team uses. If you practice Domain-Driven Design, the properties are already visible through the business rules. The same goes for decisions that have been written down in architecture decision records or similar documents.
Lessons Learned
Poor or insufficient specifications have always been a problem in software development. But LLMs hide the problem: the code comes faster, looks more convincing, and the gaps in the specification are filled quietly rather than flagged loudly. That is the real danger! Property-based testing is nothing new, but it is a big help when you are delegating implementations. A developer who formulates properties does two things at once: they write better tests and they force themselves to think more precisely.
The three questions from the voucher example cost almost no time, but they make the difference between a specification that an LLM interprets and one that it no longer needs to interpret. The more you rely on these tools, the more important this discipline becomes. Not because LLMs are bad or stupid, but because the responsibility for the result stays with the developer, no matter how well tasks can be delegated.