How SQUR actually works
Traditional penetration testing takes three weeks, costs €20,000, and ships a PDF where half the findings are false positives. SQUR runs an autonomous pentest in 24 hours for €1,995. This page explains how.
Independent benchmark: SQUR solved 91 of 104 challenges (87.5%), exceeding the best reported human score of 85% on the same benchmark. Read the full benchmark write-up →
The seven-stage pipeline
Every SQUR pentest walks the same seven stages. The interesting work is in stages 3 and 5 — the planner and the validator. The rest is structured execution.
Target intake
A target URL, optional credentials for each user role, an objective, and the in/out-of-scope path set. Three-minute web form. The scoping artefact is signed and pinned to the scan record — every test has a non-repudiable scope.
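A signed, pinned scope record can be sketched as below. This is an illustration of the idea, not SQUR's implementation; the key handling and the `scope`/`sig` field names are assumptions:

```python
import hashlib
import hmac
import json

def sign_scope(scope: dict, key: bytes) -> dict:
    """Serialize the scope deterministically, then attach an HMAC so any
    later change to the in/out-of-scope set is detectable."""
    canonical = json.dumps(scope, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(key, canonical.encode(), hashlib.sha256).hexdigest()
    return {"scope": scope, "sig": signature}

def verify_scope(record: dict, key: bytes) -> bool:
    """Recompute the HMAC over the stored scope and compare in constant time."""
    canonical = json.dumps(record["scope"], sort_keys=True, separators=(",", ":"))
    expected = hmac.new(key, canonical.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["sig"])
```

Pinning the signature to the scan record is what makes the scope non-repudiable: any edit to the in/out-of-scope set after intake fails verification.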
Surface mapping
Subdomain enumeration via certificate-transparency logs, passive DNS, courteous port scanning, TLS inventory, HTTP fingerprinting, headless-Chrome crawling for JS-rendered routes, and API discovery via OpenAPI/GraphQL where exposed. Tooling includes nmap, subfinder, httpx, katana, amass, gobuster, ffuf, waybackurls, and nikto.
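Output from several discovery tools has to be folded into one surface map. A minimal sketch of the normalisation and dedup step; the function name and normalisation rules are illustrative, not SQUR's actual pipeline:

```python
from urllib.parse import urlparse

def merge_surface(*tool_outputs: list[str]) -> list[str]:
    """Normalise host names reported by different discovery tools
    (case, trailing dots, scheme prefixes) and deduplicate."""
    seen = set()
    for output in tool_outputs:
        for entry in output:
            host = entry.strip().lower().rstrip(".")
            if "://" in host:
                host = urlparse(host).netloc  # strip scheme and path
            if host:
                seen.add(host)
    return sorted(seen)
```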
Exploitation planner
An LLM-driven planner reads the surface map and emits an ordered attack plan — structured test cases per route, parameter, and auth boundary, scored by expected information gain. The planner doesn't run exploits directly; it produces a reviewable plan. Every scan's attack-plan.json is available to the customer post-scan.
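What "scored by expected information gain" might look like as a plan-ordering step, sketched below. The `TestCase` fields and the `info_gain` score are assumptions for illustration, not the actual attack-plan.json schema:

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    route: str
    parameter: str
    vuln_class: str
    info_gain: float  # planner's expected-information-gain score

def order_plan(cases: list[TestCase]) -> list[TestCase]:
    """Emit the plan highest-expected-gain first, so the execution
    budget is spent on the most promising routes."""
    return sorted(cases, key=lambda c: c.info_gain, reverse=True)
```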
Structured execution
The execution engine consumes the plan and runs test cases with full HTTP instrumentation, rate-awareness, and bounded concurrency. Tooling layer includes sqlmap, nuclei, XSStrike, Metasploit, OWASP ZAP, wfuzz, hydra, and Interactsh for out-of-band confirmation.
AI-validated exploitation
This stage is where SQUR diverges from traditional vulnerability scanning. For every candidate finding, the validator constructs a follow-up exploit and verifies the impact — extracted SQL data, confirmed BOLA access to another user's resource, observed SSRF callback, or parsed XXE entity. If exploitation can't be demonstrated, the finding doesn't ship. The result is verified-exploitable findings with concrete proof attached.
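The confirm-or-drop rule reduces to a simple contract: no evidence, no finding. A minimal sketch, where the exploit callback and the evidence shape are assumptions:

```python
def ship_finding(candidate: dict, attempt_exploit):
    """A candidate finding ships only if a follow-up exploit yields
    concrete evidence; otherwise it is dropped, not downgraded."""
    evidence = attempt_exploit(candidate)  # e.g. extracted rows, OOB callback
    if evidence is None:
        return None  # no proof -> no finding
    return {**candidate, "verified": True, "evidence": evidence}
```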
Finding classification
Each confirmed finding gets an OWASP class + CWE tag, a CVSS v4.0 base score, the exploitability proof (request/response pair plus validator narration), framework-aware reproduction steps, fix guidance, and a compliance mapping to DORA / NIS2 / ISO 27001 Annex A / GDPR.
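The classification bundle attached to each finding might be modelled as below. Field names and example values are illustrative, not SQUR's report schema:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    title: str
    owasp_class: str        # e.g. "A03:2021 Injection"
    cwe: str                # e.g. "CWE-89"
    cvss_v4_base: float     # CVSS v4.0 base score
    proof: dict             # request/response pair + validator narration
    reproduction: list[str] # framework-aware reproduction steps
    fix_guidance: str
    compliance: dict = field(default_factory=dict)  # DORA / NIS2 / ISO 27001 / GDPR mapping
```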
Report & re-verify
The report ships as interactive HTML plus printable PDF. It contains the attack plan, the verified findings with proofs, fix guidance, compliance evidence, and a scope-exhaustion appendix. After fixes deploy, you re-run the scan and findings either close or stay open. Continuous-coverage tiers run scheduled scans automatically.
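The close-or-stay-open semantics of a re-scan amount to a set diff over finding identifiers. A minimal sketch, assuming findings carry stable IDs across scans:

```python
def diff_scans(previous: list[str], current: list[str]) -> dict:
    """After fixes deploy, re-run the scan: previous findings either
    close (no longer reproduced) or stay open (reproduced again)."""
    prev, curr = set(previous), set(current)
    return {
        "closed": sorted(prev - curr),
        "still_open": sorted(prev & curr),
        "new": sorted(curr - prev),
    }
```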
The validator in depth
A vulnerability scanner says “Possible SQL injection at /api/users.” The SQUR validator either says “Confirmed SQL injection at /api/users — extracted MySQL 8.0.32 via UNION select” with the request/response evidence attached, or it doesn't include the finding at all.
Design constraints
- The validator cannot fabricate findings — only confirm or reject what the executor proposed.
- Confirmation requires concrete evidence (request + response + observed behaviour) reproducible by a human.
- The validator is sandboxed: outbound HTTP only to the test target plus SQUR-controlled out-of-band hosts.
- Validator prompts + tool definitions are versioned. We replay findings against prior versions to confirm regression-free behaviour.
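The first constraint, confirm-or-reject only, can be enforced mechanically by filtering the validator's output against the executor's proposals. A sketch under assumed record shapes:

```python
def accept_verdicts(proposed_ids: set[str], validator_output: list[dict]) -> list[dict]:
    """Keep only verdicts that (a) refer to a finding the executor
    actually proposed and (b) are confirmations backed by evidence."""
    confirmed = []
    for verdict in validator_output:
        if verdict["id"] not in proposed_ids:
            continue  # discard anything the executor never proposed
        if verdict["status"] == "confirmed" and verdict.get("evidence"):
            confirmed.append(verdict)
    return confirmed
```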
Where the validator can be wrong
- False negatives: the validator may miss findings that require multi-step chained exploitation. Mitigation: per-class chained-payload templates, expanded each release.
- Context misjudgment: the validator may misclassify a finding's severity (it confirms exploitability but mis-scopes blast radius). Mitigation: the report exposes the validator's narration, so engineering can re-score with their context.
The benchmark: 91 of 104
On an independent pentest benchmark of 104 challenges — the same set used to publish the reported human-pentester results — SQUR solved 91 (87.5%), exceeding the best reported human score of 85%.
The benchmark is a useful indicator of exploitation capability. It's not a substitute for production pentesting — the latter requires broader vulnerability-class coverage, strict guardrails, and remediation-ready proof-of-concept for every finding. We treat the benchmark result as a signal of capability, not a guarantee.
Full write-up: SQUR beats humans on the independent pentest benchmark →
Real findings
Two case studies show what an actual SQUR finding looks like — from initial signal through validator confirmation to remediation.
Mass assignment with self-verification
How SQUR found a mass-assignment vulnerability allowing privilege escalation — and how the validator self-confirmed the exploit before reporting it. Read the walkthrough →
False-positive verification
How the validator suppresses noise that traditional scanners would have shipped — with a worked example of the validator rejecting an apparent finding after attempted exploitation failed. Read the walkthrough →
Data residency & compliance
- Where scans run: GCP europe-west1 (Brussels) and europe-west3 (Frankfurt).
- Where reports live: Firestore in europe-west3, encrypted at rest.
- PII: scan target URLs and auth tokens encrypted in Secret Manager with customer-rotatable keys.
- Retention: scan artefacts expire 90 days after delivery unless extended by customer.
- We do not train models on customer scan data. We do not share customer data with research partners without explicit opt-in.
For DORA, NIS2, ISO 27001 Annex A, GDPR, and EU AI Act mapping details, see the Trust Center.
Where we're less good (yet)
SQUR does web and API pentesting well, with OWASP Top-10 verified-exploit coverage, compliance evidence, and continuous re-scanning. SQUR does less well today on:
- Multi-step business-logic exploits requiring 3+ chained requests with stateful tokens — coverage is improving but not yet at human-expert level for novel logic flaws.
- Bespoke target types needing pre-engagement context (industrial-control systems, hardware-adjacent stacks, custom binary protocols).
- Adversarial red-team objectives (objective-driven simulation rather than coverage-driven testing).
- Targets behind authenticated proxies / VPNs / customer-internal gateways — these need scoping conversations rather than off-the-shelf intake.
If your target is one of these, we'll tell you on the demo call. We have referral arrangements with specialist firms for cases where SQUR isn't the right fit.
Try it
A free attack-surface scan runs in 60 seconds and produces a same-day report. No signup, no credit card. A full SQUR pentest runs in 24 hours for €1,995. EU data residency.