We spent last week at Infosecurity Europe in London, and one phrase followed us from stand to stand: signal versus noise. The argument against autonomous pentesting was remarkably consistent. "It just adds noise. You will drown in findings you cannot prioritise." It is a good objection, and it is the one we hear most often. It deserves a real answer, not a slogan.

Here is the short version. That objection is correct, but only about a category of tooling that autonomous pentesting is not. Signal versus noise is a scanner problem. A proven exploit cannot be noise. And the one honest exception, the real vulnerability a business chooses not to fix, is not noise either. It is accepted risk, and the gap between those two terms is the entire argument.

First, the part the critics get right

Start by conceding the strong version of their case, because the data backs it.

The raw volume is staggering. In 2024 the CVE programme published more than 40,000 new vulnerabilities, a record, and roughly 38% more than the year before. Nobody remediates 40,000 of anything.

Almost none of it gets used. Across years of data spanning billions of vulnerability observations, the Cyentia Institute and Kenna Security found that only around 5% of known vulnerabilities are ever exploited in the wild. CISA's Known Exploited Vulnerabilities catalogue, the list of bugs confirmed in real-world attacks, holds on the order of a thousand entries against more than 250,000 catalogued CVEs. The signal is a rounding error on the noise.

The severity labels do not rescue you. Close to half of all published CVEs carry a CVSS score of 7.0 or higher, which is to say "High" or "Critical". When that much of everything is rated urgent, the label stops sorting anything. Cyentia's research is blunt about it: prioritising by real exploit evidence is roughly eleven times more effective than prioritising by CVSS score, which on its own is little better than working through the list at random.

And you cannot out-work the pile. The same body of research shows most organisations remediate only about one in ten of their open vulnerabilities in a given month, while new ones arrive faster than old ones close. A meaningful share of vulnerabilities are still open a year after they were discovered. The backlog is not a queue you drain. It is a tide.
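The tide can be made concrete with a small simulation. This is only a sketch: the 10% monthly close rate is the figure quoted above, while the starting backlog and the monthly arrival count are hypothetical numbers chosen for illustration.

```python
def simulate_backlog(open_vulns: int, monthly_arrivals: int,
                     close_rate: float, months: int) -> list:
    """Return the open-vulnerability count at the end of each month."""
    history = []
    for _ in range(months):
        # The team closes a fixed share of whatever is currently open.
        closed = round(open_vulns * close_rate)
        open_vulns = open_vulns - closed + monthly_arrivals
        history.append(open_vulns)
    return history


# Hypothetical inputs: only the 10% close rate comes from the text above.
history = simulate_backlog(open_vulns=10_000, monthly_arrivals=1_500,
                           close_rate=0.10, months=12)
for month, count in enumerate(history, start=1):
    print(f"month {month:2d}: {count:,} open")
```

With these inputs the pile grows every month even though a tenth of it is closed each time. It only levels off once 10% of the backlog equals the monthly arrivals, which here is around 15,000 open vulnerabilities.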

This is the world the critics are describing, and they are right about it. Static and dynamic scanners, software composition analysis, raw CVE feeds: these tools exist to detect, and detection at scale produces exactly this. Practitioner evaluations of static analysis tooling frequently report false-positive rates running from a third to half of all alerts, and security teams spend a large slice of every week chasing alarms that turn out to be nothing. That is alert fatigue, and it is a real, measurable tax on a real team.

If your security programme is a detection programme, signal versus noise is the defining problem of your job. The critics are not wrong. They are describing the wrong tool.

What "noise" actually means

Here is the move the objection skips. Noise is not a property of the vulnerability. It is a property of the tool that reported it.

A scanner matches a pattern: a version string, a response header, a signature. Then it emits a finding that says, in effect, "this might be exploitable". The word that carries all the weight is might. The scanner does not know. It cannot tell you whether the injection actually fires, whether a control compensates, whether the path is even reachable, or whether the bug sits behind a login the scanner never had credentials for. It hands you the doubt and calls it a finding.

That unresolved doubt is the noise. Every "might" is a small debt that someone on your team repays by hand: read the finding, reproduce it, decide if it is real, decide if it matters. Multiply by tens of thousands and you have described modern vulnerability management.

So noise is structural. It is what you get when a tool's job ends at "I found a pattern that sometimes indicates a problem". Detection tools live there by design. The noise is not a defect in the scanner. It is the scanner doing precisely what a scanner does.

A proven exploit cannot be a false positive

Offensive security does a different job. It does not detect a pattern. It performs the attack.

When an autonomous pentest reports a finding, it is not saying "this might be exploitable". It is saying "we exploited it, here is the request, here is the response, here is the data we reached, reproduce it yourself". The doubt has already been spent, at the source, before the finding ever reaches you. There is no "might" left to triage.

This is why the category error matters so much. There is no such thing as a noisy exploit. A working proof of exploitation is, by definition, a true positive. It carries its own evidence. It either reproduces or it does not, and if it reproduces, the question "is this real" is already answered and cannot be reopened. The verification is not a confidence score bolted on afterwards. It is the act of exploiting the thing.

Put the two worlds side by side and the difference is not "fewer false positives". It is a different shape of output entirely.

| | Detection (scanners, SAST, CVE feeds) | Proof (real exploitation) |
|---|---|---|
| What a finding claims | "This might be exploitable" | "We exploited this, here is the evidence" |
| Who resolves the doubt | Your team, by hand, one finding at a time | Already resolved, at the source |
| Possible end states | Fix it / Accept it / Don't know | Fix it / Accept it |
| Where the noise lives | The "don't know" pile | There is no "don't know" pile |
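The same comparison can be sketched as a data model. The names below (`Triage`, `Evidence`, `Finding`, `possible_states`) are hypothetical and chosen for illustration. The point is only that once evidence is attached, the "don't know" state is no longer reachable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Triage(Enum):
    FIX = "fix it"
    ACCEPT = "accept it"
    UNKNOWN = "don't know"


@dataclass(frozen=True)
class Evidence:
    request: str       # the request that triggered the exploit
    response: str      # what the target sent back
    data_reached: str  # what the exploit actually exposed


@dataclass(frozen=True)
class Finding:
    title: str
    evidence: Optional[Evidence] = None  # None means a pattern match only


def possible_states(finding: Finding) -> set:
    """End states a team can reach for this finding."""
    if finding.evidence is None:
        # Detection: someone still has to find out whether it is real.
        return {Triage.FIX, Triage.ACCEPT, Triage.UNKNOWN}
    # Proof: the question "is this real" is already answered.
    return {Triage.FIX, Triage.ACCEPT}


detected = Finding("SQL injection possible")
proven = Finding("SQL injection", Evidence("POST /login", "HTTP 200", "user table"))
print(Triage.UNKNOWN in possible_states(detected))
print(Triage.UNKNOWN in possible_states(proven))
```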

"But we ignore plenty of real vulnerabilities"

This is the strongest counter-argument, and it deserves to be taken seriously rather than dodged. A security lead will say: even if every finding is proven, we still will not fix all of them. Some sit on a low-value asset. Some touch data we do not care about. We will leave them. So how is a pile of proven-but-ignored findings any better than a pile of unproven ones?

Because ignoring a proven finding has a name, and the name is not noise. It is accepted risk.

The distinction is not pedantic. It is the difference between a tooling failure and a governance decision:

  • Noise is the tool failing to tell you whether something is real. The doubt is involuntary. You did not choose it, you inherited it, and you carry it in the dark.
  • Accepted risk is you, in full knowledge that something is real and exploitable, making a documented decision that the cost of fixing it outweighs the consequence of leaving it. The doubt is gone. What remains is a choice.

Accepted risk is not a problem to be eliminated. It is the entire point of a risk-management programme. ISO 27001 has a formal risk-acceptance process for exactly this. DORA and NIS2 expect you to show that you understood your exposures and made deliberate decisions about them. An auditor does not want a clean scan. An auditor wants evidence that you knew what was real and decided about it on purpose.

You cannot accept a risk you cannot see clearly. Accepting a "might be exploitable" is not risk acceptance, it is guessing. Accepting a proven, reproduced exploit on a low-value asset is a defensible business decision you can write down, sign, and put in front of an auditor. Proof is what turns the things you choose not to fix from a quiet liability into a documented record.

Noise is "we don't know". Accepted risk is "we know, and we decided". Proof is what moves a finding from the first sentence to the second.
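One way to picture that move is a risk register that refuses to record an acceptance without proof. The sketch below is illustrative only. The field names (`rationale`, `approver`, `review_by`) are assumptions, not fields taken from ISO 27001, DORA or NIS2.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RiskAcceptance:
    finding_id: str
    rationale: str   # why leaving it costs less than fixing it
    approver: str    # who signed the decision
    review_by: date  # when the decision is revisited


def accept_risk(finding_id: str, reproduced: bool, rationale: str,
                approver: str, review_by: date) -> RiskAcceptance:
    """Record an acceptance only for a finding that has been reproduced."""
    if not reproduced:
        # Accepting a "might be exploitable" is guessing, not a decision.
        raise ValueError("finding is unproven; it cannot be accepted as risk")
    if not rationale.strip() or not approver.strip():
        raise ValueError("an acceptance needs a rationale and a named approver")
    return RiskAcceptance(finding_id, rationale, approver, review_by)


record = accept_risk(
    finding_id="F-001",  # hypothetical identifier
    reproduced=True,
    rationale="Low-value asset holding no customer data",
    approver="Head of Security",
    review_by=date(2026, 12, 31),
)
print(record)
```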

How to tell whether you are buying signal or noise

If signal versus noise is a scanner problem, the practical question for any buyer is simple: is the tool in front of me detecting, or proving? It is worth noting that plenty of the vendors warning loudest about noise are themselves selling detection dressed as something more, because in 2026 every scanner ships with an "AI" badge and an "autonomous" tagline. The marketing will not tell you which one you are looking at. The behaviour will. A few questions cut through it fast:

  • Does every finding come with a working proof? A real exploitation result includes the payload, the request, and evidence you can reproduce yourself. "High: SQL injection possible" with nothing attached is a detection wearing a costume.
  • Does it chain, or stop at the front door? Detection finds one issue and moves on. Exploitation uses the first foothold to reach the next, the way an attacker does.
  • Does it test as a logged-in user? Most real risk lives behind authentication. A tool that only looks from the outside is, by construction, ignoring most of your attack surface.
  • Are findings verified before you see them, or are you the verifier? If the triage burden lands on your team, you bought a detector, and you bought its noise along with it.
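The four questions can be turned into a rough evaluation checklist. This is a sketch with hypothetical names (`ToolBehaviour`, `assess`), and the weighting is one reading of the list: proof and pre-report verification decide the label, while chaining and authenticated testing are reported as coverage gaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolBehaviour:
    ships_working_proof: bool        # payload, request and reproducible evidence
    chains_findings: bool            # uses one foothold to reach the next
    tests_authenticated: bool        # tests as a logged-in user
    verifies_before_reporting: bool  # the tool is the verifier, not your team


def assess(tool: ToolBehaviour) -> tuple:
    """Return ("prover" | "detector", list of coverage gaps)."""
    # If deciding what is real lands on the buyer, it is a detector.
    proves = tool.ships_working_proof and tool.verifies_before_reporting
    gaps = []
    if not tool.chains_findings:
        gaps.append("stops at the first foothold")
    if not tool.tests_authenticated:
        gaps.append("ignores the authenticated attack surface")
    return ("prover" if proves else "detector", gaps)


# Hypothetical answers gathered during a vendor evaluation.
label, gaps = assess(ToolBehaviour(
    ships_working_proof=False,
    chains_findings=False,
    tests_authenticated=True,
    verifies_before_reporting=False,
))
print(label, gaps)
```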

We wrote a longer field guide to this question: How to Spot a Scanner Posing as an AI Pentest Tool. The one-line version: if the burden of deciding what is real lands on you, you are holding a scanner, whatever the label says.

What this means in practice

SQUR was built on this distinction. An autonomous pentest does not hand you a list of maybes. Nine specialised agents discover and exploit, then a separate agent independently re-tests every candidate finding before it appears in a report, and a deduplication step merges findings that describe the same underlying issue. What lands in front of you is a list of confirmed, distinct, reproducible exploits, each enriched with CWE, CVE, CAPEC and MITRE ATT&CK references and a clear risk level, not a CVSS pile to re-triage from scratch.
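The shape of that flow (discover, independently re-test, then deduplicate) can be sketched in a few lines. This is not SQUR's implementation. The `Candidate` type, the `retest` callback and the choice of deduplicating on endpoint plus weakness are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Candidate:
    endpoint: str
    weakness: str  # e.g. a CWE identifier
    payload: str


def confirm_and_dedupe(candidates: Iterable[Candidate],
                       retest: Callable[[Candidate], bool]) -> list:
    """Keep only candidates that reproduce, one per underlying issue."""
    confirmed = {}
    for candidate in candidates:
        if not retest(candidate):
            continue  # never reaches the report
        # Assumed dedup key: same endpoint and weakness is the same issue.
        key = (candidate.endpoint, candidate.weakness)
        confirmed.setdefault(key, candidate)
    return list(confirmed.values())


candidates = [
    Candidate("/login", "CWE-89", "' OR 1=1--"),
    Candidate("/login", "CWE-89", "' OR 'a'='a"),
    Candidate("/search", "CWE-79", "<script>alert(1)</script>"),
]

# Stand-in for an independent re-test; here only the SQL injection reproduces.
report = confirm_and_dedupe(candidates, retest=lambda c: c.weakness == "CWE-89")
print(report)
```

In this toy run, three candidates become one reported finding: the cross-site scripting candidate fails the re-test and the two SQL injection payloads collapse into a single issue.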

On an independent pentest benchmark of 104 challenges, the platform solved 91, a success rate of 87.5%, ahead of the top human pentester at 85%, with a 100% result on access-control flaws, SQL injection, SSRF, XXE, GraphQL and business-logic abuse. Those are the categories where proof matters most, and where detection-only tools generate the most noise. A full engagement runs in 24 hours for a fixed EUR 1,995, with every result kept in the EU (GCP Brussels).

The promise is not "fewer false positives". It is no false positives by construction, because a finding only exists once it has been exploited. Everything you receive already sits in one of the two states an auditor recognises: fix it, or formally accept it. The third state, the "we don't know" pile where noise lives, never gets created.

The pentest vendors warning you about noise are right that most security tooling drowns teams in maybes. They have simply mistaken the cure for the disease. The answer to noise was never a smarter way to sort detections. It was to stop shipping doubt and start shipping proof.

Want proof instead of maybes? Start a pentest or book a demo, and we will run it live against your application. No slides.

Sources: CVE Program / NVD (annual CVE volume); Cyentia Institute and Kenna Security, Prioritization to Prediction series (exploitation rates, CVSS as a predictor, remediation capacity); CISA Known Exploited Vulnerabilities catalogue; FIRST.org Exploit Prediction Scoring System (EPSS). SQUR benchmark figures from our independent pentest benchmark report.