EU AI Act red-teaming requirements 2026 — what Article 15 + 55 actually require

The EU AI Act introduced "red-teaming" as a binding obligation for high-risk AI systems and general-purpose AI models with systemic risk. The Commission's implementing regulations are still in consultation; the operational reality for EU organisations placing AI into Annex III categories is already shaping up. This guide covers scope, methodology, documentation, and how it intersects with existing NIS2 + DORA obligations.

Who's in scope

Two overlapping populations:

  • High-risk AI systems (Annex III, Articles 6–7 + Annex III): biometric ID, critical infrastructure, education/vocational training, employment + worker management, essential private + public services (incl. credit scoring + insurance pricing), law enforcement, migration/asylum/border, administration of justice + democratic processes.
  • General-purpose AI models with systemic risk (Article 51 + 55): models meeting the compute threshold (≥10^25 FLOPs of training compute) or Commission-designated. As of 2026, this list is small but growing.

The vast majority of EU SMEs placing AI into one of the Annex III high-risk categories fall under the first population. The systemic-risk-GPAI obligation is mostly relevant to model providers (OpenAI, Anthropic, Mistral, Aleph Alpha, etc.) not deployers.

Article 15 — accuracy, robustness, cybersecurity

For high-risk AI systems (Annex III), Article 15 requires:

  • Accuracy — system performs at declared accuracy levels across the lifecycle. Declared in technical documentation (Article 11 + Annex IV).
  • Robustness — system is resilient to errors, faults, inconsistencies. Includes fail-safe mechanisms + back-up plans.
  • Cybersecurity — protection against unauthorised third parties altering use, outputs, or performance. Includes:
    • Resilience to attempts to alter outputs via adversarial inputs
    • Protection against data poisoning, model poisoning, model evasion
    • Confidentiality of model weights + training data where required
    • Penetration testing relevant to the AI-specific attack surface

The implementing technical standards are still in draft; the Commission released a consultation draft Q2 2026. Operational guidance: read Article 15 as a "you must red-team the AI-specific attack surface" obligation in addition to standard application-layer pentesting.

Article 55 — GPAI with systemic risk

For general-purpose AI models with systemic risk (Article 51 designation), Article 55 requires:

  • Adversarial testing ("model evaluation") — including red-teaming — to identify + mitigate systemic risks. Documented + reported to the AI Office.
  • Cybersecurity measures — adequate cybersecurity for the model + physical infrastructure.
  • Serious incident reporting — without undue delay to the AI Office + relevant national authorities.

Article 55 applies to model providers, not deployers. If your organisation is deploying GPT-5, Claude Sonnet 4.6, or similar — the model provider is the Article 55 entity. Your obligation is Article 15 (if your deployment is high-risk under Annex III) plus the provider-deployer agreement (Article 25).

What "red-teaming" actually means here

The EU AI Act uses "red-teaming" in a specific sense, different from offensive-security red-teaming:

  • Adversarial inputs — crafted prompts (LLMs), edge-case features (classifiers), poisoning attempts (training-time attacks).
  • Model-specific attacks — prompt injection, jailbreaks, data extraction, membership inference, model inversion, evasion via gradient-based perturbations.
  • Systemic-risk testing (Art. 55 only) — testing the model's ability to enable CBRN harm, large-scale disinformation, election interference, etc. Specialist work — typically requires dedicated AI-safety teams.

Standard application-layer penetration testing (web app + API + auth + business logic) covers the surrounding infrastructure but not the AI-specific attack surface. The two are complementary; neither replaces the other.

Documentation requirements

For Annex III high-risk systems, the technical documentation file (Article 11 + Annex IV) must include:

  • Red-teaming methodology + test scope (per evaluation cycle)
  • Adversarial inputs attempted (categories, not necessarily exhaustive payloads)
  • Mitigation evidence (input validation, output filtering, monitoring, model retraining)
  • Residual risk acceptance + sign-off
  • Continuous monitoring plan for model drift + emerging adversarial-input categories

Stored for 10 years post-market-placement (Article 18). Made available to national authorities on request.

Intersection with NIS2, DORA, ISO 42001

  • NIS2: if your organisation also falls under NIS2 essential/important entity scope, Article 21(2)(e) (vulnerability handling) covers the surrounding infrastructure for the AI system. The AI Act adds AI-specific obligations on top.
  • DORA: financial entities using AI for credit scoring, insurance pricing, fraud detection (all Annex III high-risk) face DORA Article 24 obligations for the surrounding ICT infrastructure + AI Act Article 15 for the model itself.
  • ISO/IEC 42001 (AI Management System): the certifiable management-system standard for responsible AI. Not legally binding under the AI Act but increasingly cited as evidence of "appropriate measures" — the management-system equivalent of ISO 27001 for AI.
  • NIST AI Risk Management Framework: US framework; not legally binding in EU but the test-methodology language is similar and the technical mappings translate.

Consultation + enforcement timeline

  • 2024-08-01: AI Act entered into force (Regulation (EU) 2024/1689).
  • 2025-02-02: prohibitions on unacceptable-risk AI systems applicable (Article 5).
  • 2025-08-02: obligations on GPAI providers applicable (Article 53, 55).
  • 2026-08-02: obligations on high-risk AI systems applicable to most existing systems (Article 6 + Annex III) — date triggers Article 15 enforcement.
  • 2027-08-02: obligations on high-risk AI systems in regulated products (Article 6(1) + Annex I).
  • Q3 2026: Commission's Article 15 implementing regulation expected (currently in consultation as of Q2 2026).

The window between today and August 2026 is the operational ramp for most EU organisations placing AI into Annex III categories. Documentation + red-teaming methodology should be in place before placement, not retro-fitted post-enforcement.

How SQUR fits — and where it doesn't

Where SQUR helps: the application-layer attack surface surrounding the AI system. Web app + API + authentication + business logic for the deployment infrastructure. Direct DORA Article 24 + NIS2 Article 21(2)(e) coverage for the non-AI surface. Pre-mapped audit-ready report.

Where SQUR is out of scope: AI-model-specific red-teaming (adversarial inputs, prompt injection, model evasion). For these, work with AI-safety specialists. The two are complementary; the SQUR pentest covers the application + infrastructure, the AI-safety engagement covers the model.

For EU organisations deploying high-risk AI under Annex III: run both. Start with the SQUR free attack-surface scan to characterise the deployment infrastructure, then layer in AI-specific red-teaming via a specialist firm.

Free attack-surface scan → NIS2 Article 21 evidence guide