Incident Response Integration

Informative Appendix (non-normative)

This appendix maps APTS requirements to incident response phases without introducing new requirements. For normative requirements, see domain-specific READMEs. See Cross-Domain Integration Matrix for how events trigger requirements across domains.

Incident response capabilities are distributed across multiple APTS domains because different aspects of incident handling fall under different governance concerns. This appendix unifies those requirements into a single workflow for identifying, responding to, and recovering from incidents during autonomous penetration testing.


Incident Response Requirement Map

The following maps IR phases to the specific requirements that govern each phase, with substantive descriptions of what each requirement demands:

Detection
  APTS-SC-010 (SC): Platform health monitoring with continuous collection of heartbeat data (timestamp, process ID, memory/CPU, active test count, queue depth), resource utilization (CPU, memory, disk, network), process state validation, and behavioral baseline deviation. Anomaly detection triggers alerts on statistically significant deviations from baseline (probe rates, target-switching frequency, approval escalation frequency, decision latency, action success rates, rollback frequency) with confidence scoring and documented escalation policies.
  APTS-HO-015 (HO): Maintain a real-time activity feed of ALL testing actions accessible to operators, with multi-channel notifications: dashboard alerts with action summaries, email with decision links and approval windows, SMS with delivery confirmation (for high-priority items), and automated calls for CRITICAL alerts. The feed must be accessible in real time without requiring API polling.

Escalation: Unexpected Findings
  APTS-HO-011 (HO): Detect and immediately escalate: (1) indicators of external breach or attacker activity, (2) illegal content or evidence of crimes, (3) critical zero-day vulnerabilities with active exploit code, (4) system access outside authorized scope, (5) violations of security policies or compliance frameworks, (6) system stability threats (service crashes, data corruption). Escalations must be human-reviewed before autonomous continuation.

Escalation: Impact Breaches
  APTS-HO-012 (HO): Continuously monitor actual testing impact on target systems (availability, resource consumption, data integrity, security events triggered) against documented thresholds. When impact exceeds thresholds, automatically escalate and pause pending a human decision. Impact measurement must reflect real system behavior, not predictions.

Escalation: Scope Uncertainty
  APTS-HO-013 (HO): When confidence in scope boundary determination or target legitimacy falls below 75%, automatically escalate and pause. The confidence model must be documented. Examples: ambiguous domain ownership, unclear network boundaries, questionable asset classification. Prevents autonomous continuation when scope is uncertain.

Escalation: Legal/Compliance
  APTS-HO-014 (HO): Identify and immediately escalate potential violations: (1) GDPR, CCPA, HIPAA, or PCI-DSS personal data access, (2) government or classified information exposure, (3) third-party systems or restricted access, (4) regulatory breach notification obligations triggered, (5) IP address ranges owned by unaffiliated organizations.

Containment: Kill Switch
  APTS-SC-009 (SC): Provide multiple independent kill switch mechanisms (operator-initiated local halt, remote halt by authorized personnel, automatic failsafe on control plane loss) with a two-phase halt. Phase 1 (within 5 seconds) ceases initiating new requests, exploits, and actions while in-flight operations continue. Phase 2 (within 60 seconds total) completes in-flight operations gracefully, preserves state, closes connections, flushes logs, and terminates ALL child processes, network connections, and cloud/distributed agents spawned during testing (not just the primary process), revoking temporary credentials and cancelling queued actions. The platform must track every spawned process, connection, and external agent so that none survives the halt.

Containment: State Preservation
  APTS-HO-008 (HO): Execute a state dump capturing system memory, running processes, open connections, pending operations, and platform state (decision queue, active escalations, approval status) before any containment procedures. State must be preserved on independent storage for forensic investigation within the Phase 2 window (within 60 seconds of kill switch activation).

Notification: Operator
  APTS-HO-015 (HO): Multi-channel operator notification via dashboard, email (with decision links), SMS (confirmation required), and automated calls (CRITICAL only). Notifications must reach the operator within the platform's documented response SLA and include an incident summary, systems affected, severity, recommended action, and decision deadline.

Notification: Stakeholders
  APTS-HO-017 (HO): Define and execute stakeholder notification workflows per documented SLAs: (1) engagement status updates (frequency per engagement), (2) finding summaries and severity distribution, (3) critical notifications (CRITICAL findings or incidents within the timeframe defined in the IR plan), (4) escalation notifications (major thresholds crossed), (5) completion and final report delivery (by documented deadline), (6) post-engagement follow-up. All SLAs and workflows are documented before the engagement begins.

Notification: External Alerts
  APTS-SC-017 (SC): An external watchdog (on independent infrastructure) monitors platform health and anomalies. On a confirmed incident, the watchdog notifies customers within the documented SLA via out-of-band channels (phone, SMS, email) independent of the compromised platform. Notification includes: what happened, systems affected, automated containment actions taken, next steps, and a point of contact. Alternative escalation applies if the customer does not acknowledge within the defined window.

Recovery: Rollback
  APTS-SC-014 (SC): Track state for reversible actions (account creation, file modification, config changes, process starts) with: action name, timestamp, target resource ID, pre-action state, rollback procedure, and verification method. Persist state after each action. Maintain explicit rollback procedures as executable scripts (no manual steps). Execute rollback and complete within the documented maximum rollback time. Trigger alerts if verification fails.

Recovery: Evidence Preservation
  APTS-SC-016 (SC): BEFORE rollback begins, capture evidence: screenshots, logs, error messages, modified file contents, database queries, privilege escalation proofs. Store in write-once, tamper-evident storage with read-only access. Persist per the engagement's data retention policy. Rollback operations MUST NOT access or modify evidence storage.

Recovery: Cleanup
  APTS-SC-016 (SC): Automated cleanup removes all test artifacts (temporary files, created accounts, installed tools, backdoors, test data, activity logs). Procedures MUST be idempotent (safe to run multiple times), atomic (complete or fail, with no partial state), and verifiable. Cleanup completes within the documented timeframe. Failed cleanup is logged and escalated for manual remediation.

Investigation: Root Cause
  APTS-AL-026 (AL): Conduct a structured investigation: (1) root cause analysis of how the incident occurred, (2) impact assessment (systems, data, duration of exposure), (3) review of whether the autonomy level was appropriate for the incident that occurred, (4) identification of control improvements, (5) determination of whether the autonomy level should be downgraded. Document findings and the recommendation.

Investigation: Audit Trail
  APTS-AR-001 through APTS-AR-012 (AR): Retrieve and analyze complete audit logs from incident start to end. The audit trail must be immutable, cryptographically signed, and append-only, with sufficient detail to reconstruct all decisions, escalations, approvals, and platform actions. Use audit data to validate the platform-reported timeline and confirm investigation findings.

Investigation: Evidence Chain
  APTS-RP-001 (RP): Extract and validate evidence from findings discovered during the incident. Evidence must include raw technical artifacts (packets, logs, command output), cryptographic hashes, and timestamps. Validate the evidence chain to confirm findings accurately reflect what the platform discovered and how decisions were made.

External Response: Provider
  APTS-TP-005 (TP): If the incident involves provider compromise: (1) assess data exposure (what data was on provider systems), (2) notify the provider of the compromise, (3) assess engagement continuation feasibility, (4) notify the customer per the documented incident response plan. Procedures must enable rapid containment of exposure and assessment of whether testing can continue.

External Response: Tenant Breach
  APTS-TP-018 (TP): If multi-tenant isolation is breached (engagement data leaked to another engagement): (1) detect via monitoring of cross-engagement access attempts and audit log analysis, (2) immediately isolate affected systems, (3) assess scope (engagements, data categories, record quantities), (4) notify all affected customers promptly per the incident response plan, including what data leaked, the time window, and remediation taken.

Reporting: Incident Report
  APTS-AL-018 and APTS-RP-011 (AL, RP): Generate an incident report for the customer including: incident timeline (discovery, occurrence, detection, containment, recovery), root cause, affected systems and data, impact assessment, containment actions taken, recovery procedures executed, lessons learned, preventive measures implemented, and evidence of re-verified safety controls. Deliver within the timeframe defined in the IR plan. The platform must require customer acknowledgment and approval before testing resumes.

Notification Types:

Organizations SHOULD send preliminary notifications promptly rather than delaying to gather complete information.


Unified Incident Severity Classification

The following is a recommended incident severity model that maps across domains. Severity determines response urgency and notification channels:

SEV-1 (Critical)

Triggers:

Required Actions:

Notifications:

Impact on Testing:


SEV-2 (High)

Triggers:

Required Actions:

Notifications:

Impact on Testing:


SEV-3 (Medium)

Triggers:

Required Actions:

Notifications:

Impact on Testing:


SEV-4 (Low)

Triggers:

Required Actions:

Notifications:

Impact on Testing:
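The four-level model above can be sketched as a severity-to-channel mapping. The per-severity channel assignments below are illustrative, not mandated by the standard; only the channel set itself (dashboard, email, SMS, automated calls, with calls reserved for CRITICAL) follows APTS-HO-015:

```python
from enum import Enum


class Severity(Enum):
    SEV1 = "critical"
    SEV2 = "high"
    SEV3 = "medium"
    SEV4 = "low"


# Hypothetical mapping: channels per APTS-HO-015, assignments assumed.
NOTIFICATION_CHANNELS = {
    Severity.SEV1: ["dashboard", "email", "sms", "automated_call"],
    Severity.SEV2: ["dashboard", "email", "sms"],
    Severity.SEV3: ["dashboard", "email"],
    Severity.SEV4: ["dashboard"],
}


def channels_for(severity: Severity) -> list[str]:
    """Return the notification channels used for a given severity."""
    return NOTIFICATION_CHANNELS[severity]
```

An implementation would also attach the per-severity response deadlines and approval requirements to the same structure.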


Incident Response Workflow

This section walks through the complete sequence of incident response and the APTS requirements that govern each phase:

Phase 1: Detection (0-5 minutes)

What triggers detection:

Requirements that apply:

Operator actions:
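Detection under APTS-SC-010 rests on statistically significant deviation from a behavioral baseline. A minimal sketch, assuming a z-score test over a window of one metric; the 3-sigma threshold is an assumption, since the standard requires a documented model with confidence scoring rather than any particular statistic:

```python
import statistics


def deviation_alert(baseline: list[float], observed: float,
                    z_threshold: float = 3.0) -> tuple[bool, float]:
    """Flag a statistically significant deviation from a behavioral
    baseline (in the spirit of APTS-SC-010). Returns (alert, z_score)."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        # Degenerate baseline: any change from the constant value is anomalous.
        return (observed != mean, float("inf") if observed != mean else 0.0)
    z = abs(observed - mean) / stdev
    return (z >= z_threshold, z)
```

A platform would run a check like this per monitored metric (probe rate, target-switching frequency, decision latency, and so on) and feed the scores into its documented escalation policy.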


Phase 2: Escalation & Decision (0-30 minutes depending on severity)

Escalation triggers (APTS-HO-011 through APTS-HO-014):

  1. Unexpected findings escalation (APTS-HO-011): Detects external breach indicators, illegal content, critical zero-days, out-of-scope access, compliance violations, system stability threats. Halts autonomous operations for human review.
  2. Impact breach escalation (APTS-HO-012): Continuously monitors impact and escalates when thresholds exceeded. Pause triggers automatic operator notification and escalation.
  3. Scope uncertainty escalation (APTS-HO-013): triggers when confidence in scope boundary determination or target legitimacy falls below 75%; testing pauses and human authorization is required to resume.
  4. Legal/compliance escalation (APTS-HO-014): Detects GDPR/CCPA/HIPAA data access, government info, third-party systems, breach notification obligations, suspicious IP ranges. Escalates for legal review.

Requirements that apply:

Operator decision:
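The 75% scope-confidence gate of APTS-HO-013 can be sketched as a pause-and-escalate check. The escalation queue and `resume` signature here are hypothetical; only the threshold and the pause-pending-human-authorization behavior come from the requirement:

```python
from dataclasses import dataclass, field

SCOPE_CONFIDENCE_THRESHOLD = 0.75  # mandated by APTS-HO-013


@dataclass
class ScopeGate:
    """Sketch of the APTS-HO-013 scope-uncertainty gate."""
    escalations: list = field(default_factory=list)
    paused: bool = False

    def evaluate(self, target: str, confidence: float) -> bool:
        """Return True if testing of `target` may proceed autonomously."""
        if confidence < SCOPE_CONFIDENCE_THRESHOLD:
            self.paused = True
            self.escalations.append({
                "target": target,
                "confidence": confidence,
                "reason": "scope boundary confidence below threshold",
            })
            return False
        return True

    def resume(self, operator_id: str) -> None:
        """Human authorization is required to resume (APTS-HO-013)."""
        self.paused = False
        self.escalations.clear()
```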


Phase 3: Containment (within 5 seconds for halt Phase 1; within 60 seconds total for halt Phase 2)

Kill switch requirements (APTS-SC-009):

Phase 1 (within 5 seconds) - Safety-Critical Cessation:

Phase 2 (within 60 seconds total) - Graceful Shutdown:

State preservation (APTS-HO-008):

Requirements that apply:
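The two-phase halt of APTS-SC-009 can be sketched as follows. Resource tracking is reduced to in-memory sets and the actual termination calls (SIGTERM, connection close, agent teardown) are stubbed as comments, so this is a structural sketch rather than an implementation:

```python
class KillSwitch:
    """Sketch of the APTS-SC-009 two-phase halt."""

    PHASE1_DEADLINE_S = 5    # cease initiating new actions
    PHASE2_DEADLINE_S = 60   # graceful shutdown complete

    def __init__(self):
        self.accepting_new_actions = True
        self.in_flight: set = set()           # operation IDs
        self.spawned_processes: set = set()   # ALL child/agent PIDs, not just primary
        self.open_connections: set = set()
        self.halt_log: list = []

    def phase1_halt(self) -> None:
        """Within 5 s: stop initiating new requests/exploits/actions;
        in-flight operations continue."""
        self.accepting_new_actions = False
        self.halt_log.append("phase1: new actions ceased")

    def phase2_halt(self) -> None:
        """Within 60 s total: complete in-flight work, terminate every
        tracked process, close connections, flush logs."""
        for op in list(self.in_flight):
            self.in_flight.discard(op)            # complete or abandon gracefully
        for pid in list(self.spawned_processes):
            self.spawned_processes.discard(pid)   # os.kill(pid, SIGTERM) in practice
        for conn in list(self.open_connections):
            self.open_connections.discard(conn)   # close socket / revoke credential
        self.halt_log.append("phase2: graceful shutdown complete")
```

The key property to preserve from the requirement is that Phase 1 only flips the "no new actions" gate, while Phase 2 drains every tracked resource, so nothing spawned during testing survives the halt.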


Phase 4: Notification (0-1 hour depending on severity)

Operator notification (APTS-HO-015):

Stakeholder/Customer notification (APTS-HO-017, APTS-SC-017):

External watchdog notification (APTS-SC-017):

Regulatory notification (APTS-TP-A01 if breach, Advisory):

Requirements that apply:


Phase 5: Recovery (1-24 hours depending on severity)

Evidence preservation (APTS-SC-016, APTS-RP-001):

Rollback (APTS-SC-014):

Automated cleanup (APTS-SC-016):

Requirements that apply:
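The ordering constraint of this phase (evidence capture strictly before rollback, per APTS-SC-016) can be sketched as below. The dataclass fields mirror the state APTS-SC-014 requires per reversible action; the callables stand in for the executable rollback scripts and verification methods:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReversibleAction:
    """State tracked per APTS-SC-014 for each reversible action."""
    action_name: str
    timestamp: float
    target_resource_id: str
    pre_action_state: dict
    rollback_procedure: Callable[[], bool]  # executable script, no manual steps
    verification: Callable[[], bool]        # confirms the rollback took effect


def recover(actions: List[ReversibleAction],
            preserve_evidence: Callable[[], None]) -> List[str]:
    """Capture evidence BEFORE any rollback begins (APTS-SC-016), then
    roll actions back in reverse order, verifying each step. Actions
    whose verification fails are returned so alerts can be raised."""
    preserve_evidence()  # write-once storage; never touched by rollback
    failed = []
    for action in reversed(actions):
        if not (action.rollback_procedure() and action.verification()):
            failed.append(action.action_name)
    return failed
```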


Phase 6: Investigation (4-48 hours)

Root cause analysis (APTS-AL-026):

Audit trail analysis (APTS-AR-001 through APTS-AR-012):

Evidence validation (APTS-RP-001 through APTS-RP-004):

Requirements that apply:
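The tamper-evidence property the investigation relies on can be illustrated with a hash chain, where editing any past entry invalidates every later hash. APTS-AR additionally requires cryptographic signing, which this brief sketch omits:

```python
import hashlib
import json


def _entry_hash(prev_hash: str, entry: dict) -> str:
    payload = prev_hash + json.dumps(entry, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class AuditChain:
    """Append-only, tamper-evident log sketch in the spirit of
    APTS-AR-001 through APTS-AR-012 (unsigned, for brevity)."""
    GENESIS = "0" * 64

    def __init__(self):
        self.entries: list = []
        self.hashes: list = []

    def append(self, entry: dict) -> str:
        prev = self.hashes[-1] if self.hashes else self.GENESIS
        h = _entry_hash(prev, entry)
        self.entries.append(entry)
        self.hashes.append(h)
        return h

    def verify(self) -> bool:
        """Recompute the chain; any in-place edit breaks a later hash."""
        prev = self.GENESIS
        for entry, h in zip(self.entries, self.hashes):
            if _entry_hash(prev, entry) != h:
                return False
            prev = h
        return True
```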


Phase 7: Post-Incident (1-5 business days)

Safety control re-verification (APTS-SC-018 for platform incidents):

Post-incident report (APTS-AL-018, APTS-RP-011):

Customer notification and approval:

Requirements that apply:
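The resumption precondition of this phase reduces to a conjunction of gates. The three boolean inputs below are an illustrative decomposition, not a normative checklist:

```python
def may_resume_testing(controls_reverified: bool,
                       report_delivered: bool,
                       customer_approved: bool) -> bool:
    """Post-incident resume gate: safety controls re-verified
    (APTS-SC-018, for platform incidents), incident report delivered
    (APTS-AL-018/APTS-RP-011), and customer acknowledgment and
    approval obtained before testing resumes."""
    return controls_reverified and report_delivered and customer_approved
```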


Common Incident Scenarios

These scenarios illustrate how APTS requirements activate in realistic situations:

Scenario 1: Platform Causes Service Outage

Incident: Platform's aggressive testing causes target application response time to exceed 500% of baseline; service becomes slow and some users experience timeouts.

Detection (5 mins):

Escalation (5-10 mins):

Operator Decision (10-15 mins):

Recovery (15-30 mins):

Investigation (4 hours):

Post-Incident (24 hours):


Scenario 2: Cross-Tenant Data Leakage Detected

Incident: Operator notices that Engagement A's shell access credentials are appearing in Engagement B's logs (isolation breach).

Detection (5 mins):

Escalation & Containment (3-30 mins):

Notification (1 hour):

Investigation (4-24 hours):

Recovery (4-48 hours):

Compliance (24-72 hours):

Post-Incident (5 business days):


Scenario 3: AI Model Drift Detected Mid-Engagement

Incident: Anomaly detected in platform's decision-making pattern: escalation frequency drops 40% despite constant input profiles. Investigation reveals the AI model version was auto-updated by the provider, changing decision behavior.

Detection (15 mins):

Escalation (15-30 mins):

Investigation (1-2 hours):

Recovery & Remediation (1-4 hours):

Customer Notification (1-2 hours):

Post-Incident (4-8 hours):


Scenario 4: Platform Discovers Illegal Content on Target

Incident: During web application testing, platform's file enumeration discovers child sexual abuse material (CSAM) on a web server.

Detection (5 mins):

Immediate Response (5-10 mins):

Escalation & Decision (10-30 mins):

Evidence Preservation (immediately):

Regulatory Notification (within 1-4 hours per jurisdiction):

Investigation & Post-Incident (4-24 hours):


Scenario 5: Operator Credentials Compromised

Incident: Operator's API key used to approve unauthorized testing action on unrelated customer's system. External watchdog detects and flags divergence between operator's typical approval patterns and this action.

Detection (10 mins):

Immediate Response (10-30 mins):

Containment (30 mins - 2 hours):

Investigation (2-24 hours):

Recovery & Prevention (4-24 hours):

Customer Notification (1-4 hours):


Cross-Domain Consistency

When implementing incident response, the escalation paths, containment mechanisms, and notification workflows must operate as a coherent sequence rather than as independent processes. This section describes the integration points:

Escalation-to-Containment Integration

Notification-to-Escalation Integration

Investigation-to-Recovery Integration

Recovery-to-Resumption Integration


Reference to Cross-Domain Integration

For how requirements interact across all domains (not just incident response), see Cross-Domain Integration Matrix. That document maps requirements to their dependencies and integration points across Safety Controls, Human Oversight, Graduated Autonomy, Auditability, Supply Chain Trust, and Reporting domains.