Graduated Autonomy: Implementation Guide

Practical guidance for implementing APTS Graduated Autonomy requirements. Each section provides a brief implementation approach, key considerations, and common pitfalls.

Note: This guide is informative, not normative. Recommended defaults and example values are suggested starting points; the Graduated Autonomy README contains the authoritative requirements. Where this guide and the README differ, the README governs.


APTS-AL-001: Single Technique Execution

Implementation: Implement an execution model that isolates each technique invocation. Enforce a one-technique-per-operation constraint at the API and scheduling layer. No automatic chaining or result-driven sequencing.
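
One way to enforce the one-technique-per-operation constraint at the API layer is to reject any submission that carries more or fewer than one technique call. The sketch below is illustrative only; `TechniqueCall`, `submit_operation`, and the technique name are hypothetical and not part of APTS.

```python
from dataclasses import dataclass
from typing import List


class ChainingNotAllowed(Exception):
    """Raised when an operation does not carry exactly one technique."""


@dataclass(frozen=True)
class TechniqueCall:
    technique: str  # hypothetical technique identifier, e.g. "port_scan"
    target: str
    params: dict


def submit_operation(calls: List[TechniqueCall]) -> TechniqueCall:
    """Accept exactly one technique call per operation; reject batches."""
    if len(calls) != 1:
        raise ChainingNotAllowed(
            f"L1 operations carry exactly one technique, got {len(calls)}"
        )
    return calls[0]
```

Rejecting at submission time, rather than silently running only the first call, keeps the operator aware that a batch was attempted.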

Key Considerations:

Common Pitfalls:


APTS-AL-002: Human-Directed Target and Technique Selection

Implementation: Remove all heuristic or automated selection logic. All targeting and technique choices originate from explicit human operator commands. Validate that selections match authorized scope.

Key Considerations:

Common Pitfalls:


APTS-AL-003: Parameter Configuration by Human Operator

Implementation: Require explicit human configuration of all technique parameters before execution. Never apply tool defaults or infer parameters from context. Surface all configurable options with clear explanations.
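
A minimal sketch of the "no defaults, no inference" rule: the validator refuses to run unless the operator supplied every parameter the technique declares, and nothing else. The schema contents and parameter names are assumptions made for illustration.

```python
# Hypothetical per-technique schema: every listed parameter must be supplied.
REQUIRED_PARAMS = {
    "port_scan": {"ports", "rate_limit_pps", "timeout_s"},
}


def validate_explicit_params(technique: str, supplied: dict) -> dict:
    """Return a copy of the operator-supplied parameters, or raise.

    No default is ever filled in: a missing parameter is an error,
    and so is a parameter the schema does not know about.
    """
    required = REQUIRED_PARAMS[technique]
    missing = required - set(supplied)
    unknown = set(supplied) - required
    if missing or unknown:
        raise ValueError(
            f"missing={sorted(missing)} unknown={sorted(unknown)}"
        )
    return dict(supplied)
```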

Key Considerations:

Common Pitfalls:


APTS-AL-004: No Automated Chaining or Sequential Decision-Making

Implementation: Disable workflow automation and decision trees at Tier 1. Require an explicit human command for every subsequent action. Enforce this at the orchestration layer, not by policy alone.

Key Considerations:

Common Pitfalls:


APTS-AL-005: Mandatory Logging and Human-Reviewable Audit Trail

Implementation: Capture all actions, parameters, results, operator decisions, and timestamps. Ensure logs are human-readable and forensically complete. Include operator identity, authorization scope, and approval status for each action.
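
As an illustration, each action could be written as one JSON line carrying the fields listed above. The field names and the `audit_record` helper are assumptions, not a format defined by APTS.

```python
import json
from datetime import datetime, timezone


def audit_record(operator_id, scope_id, approval_status,
                 technique, target, params, result):
    """Serialize one action as a single human-readable JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operator_id": operator_id,
        "authorization_scope": scope_id,
        "approval_status": approval_status,
        "technique": technique,
        "target": target,
        "params": params,
        "result": result,
    }
    # sort_keys gives a stable field order, which helps manual review and diffs.
    return json.dumps(record, sort_keys=True)
```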

Key Considerations:

Common Pitfalls:


APTS-AL-006: Basic Scope Validation and Policy Enforcement

Implementation: Validate all targets, techniques, and parameters against explicit scope rules before execution. Maintain a scope authorization document (an allowlist of hosts or CIDR blocks). Block any execution outside authorized scope.
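
A minimal target check using the standard-library `ipaddress` module might look like the following. The network range and hostname are placeholder values; a real deployment would load them from the signed scope authorization document.

```python
import ipaddress

# Placeholder scope; in practice this comes from the scope authorization document.
AUTHORIZED_NETWORKS = [ipaddress.ip_network("10.20.0.0/24")]
AUTHORIZED_HOSTS = {"app.example.test"}


def in_scope(target: str) -> bool:
    """Return True only if the target is an authorized IP or hostname."""
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal: treat it as a hostname and require an exact match.
        return target.lower() in AUTHORIZED_HOSTS
    return any(addr in net for net in AUTHORIZED_NETWORKS)
```

Hostname matching here is exact on purpose. Resolving the name and checking the resulting address as well would be a reasonable additional step, since DNS can change during an engagement.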

Key Considerations:

Common Pitfalls:


APTS-AL-007: Multi-Step Technique Chaining Within Single Phase

Implementation: At Tier 2, permit chaining of multiple techniques within a single phase (for example, multiple recon steps, or multiple exploitation steps on a confirmed target) without per-action human approval, provided all belong to the same pre-defined phase.

Key Considerations:

Common Pitfalls:

L2/L3 Architecture Pattern:

L2 and L3 autonomy require a state machine governing phase transitions and action authorization:

States: IDLE → RECONNAISSANCE → EXPLOITATION → POST_EXPLOITATION → REPORTING → CLEANUP
Transitions:
  IDLE → RECONNAISSANCE: requires operator_start_command AND scope_validated
  RECONNAISSANCE → EXPLOITATION: requires (discovery_complete OR operator_override) AND (operator_approval(L2) OR boundary_check_pass(L3))
  EXPLOITATION → POST_EXPLOITATION: requires exploitation_objectives_met AND (operator_review(L2) OR auto_transition(L3))
  POST_EXPLOITATION → REPORTING: requires post_exploitation_complete AND evidence_preserved
  REPORTING → CLEANUP: requires report_generated AND rollback_initiated
  ANY → IDLE: kill_switch_activated (unconditional, highest priority)

Each transition MUST be logged per APTS-AR-001 with: previous state, new state, transition trigger, operator ID (if applicable), and timestamp. At L2, operator approval is required for any transition into EXPLOITATION. At L3, the platform may auto-transition if pre-approved boundary checks pass, but MUST log the decision rationale.
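
The transition table above can be sketched as a pure function from the current phase, the autonomy level, and a set of observed facts to the next phase. This is one possible reading, under the assumption that the level-specific conditions are alternatives selected by autonomy level; the function names are hypothetical, and the logging required by APTS-AR-001 is omitted for brevity.

```python
from enum import Enum
from typing import Optional, Set


class Phase(Enum):
    IDLE = "IDLE"
    RECONNAISSANCE = "RECONNAISSANCE"
    EXPLOITATION = "EXPLOITATION"
    POST_EXPLOITATION = "POST_EXPLOITATION"
    REPORTING = "REPORTING"
    CLEANUP = "CLEANUP"


def level_gate(level: str, l2_fact: str, l3_fact: Optional[str],
               facts: Set[str]) -> bool:
    """L2 needs an operator fact; L3 needs a boundary fact, or nothing if None."""
    if level == "L2":
        return l2_fact in facts
    if level == "L3":
        return l3_fact is None or l3_fact in facts
    return False


def next_phase(current: Phase, level: str, facts: Set[str]) -> Phase:
    """Return the next phase, or the current one if the guard does not hold."""
    if "kill_switch_activated" in facts:  # unconditional, highest priority
        return Phase.IDLE
    if current is Phase.IDLE:
        ok = {"operator_start_command", "scope_validated"} <= facts
        return Phase.RECONNAISSANCE if ok else current
    if current is Phase.RECONNAISSANCE:
        ready = bool({"discovery_complete", "operator_override"} & facts)
        ok = ready and level_gate(level, "operator_approval",
                                  "boundary_check_pass", facts)
        return Phase.EXPLOITATION if ok else current
    if current is Phase.EXPLOITATION:
        ok = ("exploitation_objectives_met" in facts
              and level_gate(level, "operator_review", None, facts))
        return Phase.POST_EXPLOITATION if ok else current
    if current is Phase.POST_EXPLOITATION:
        ok = {"post_exploitation_complete", "evidence_preserved"} <= facts
        return Phase.REPORTING if ok else current
    if current is Phase.REPORTING:
        ok = {"report_generated", "rollback_initiated"} <= facts
        return Phase.CLEANUP if ok else current
    return current
```

Keeping the function free of side effects makes each guard easy to unit-test; the caller would wrap it to write the transition record before committing the new state.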


APTS-AL-008: Real-Time Human Monitoring and Approval Gates

Implementation: Provide a real-time dashboard showing active tests, current technique execution, findings, and escalation alerts. Implement approval gates for high-risk actions (exploitation, lateral movement) with operator override capability.

Key Considerations:

Common Pitfalls:


APTS-AL-009: Tool-Proposed Actions with Operator Modification Capability

Implementation: At Tier 2, permit the tool to propose the next action(s) based on findings, but require operator approval with modification capability. Operator can reject, edit parameters, or substitute a different action.
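
The approve, edit, substitute, and reject paths can be modeled as a single resolution step that returns the action to run, or nothing. The names `Proposal` and `resolve` and the decision strings are assumptions for this sketch.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Proposal:
    technique: str
    target: str
    params: dict


def resolve(proposal: Proposal, decision: str,
            edited_params: Optional[dict] = None,
            substitute: Optional[Proposal] = None) -> Optional[Proposal]:
    """Apply the operator's decision; None means nothing may run."""
    if decision == "approve":
        return proposal
    if decision == "edit":
        if edited_params is None:
            raise ValueError("edit requires operator-supplied parameters")
        return replace(proposal, params=dict(edited_params))
    if decision == "substitute":
        if substitute is None:
            raise ValueError("substitute requires a replacement action")
        return substitute
    if decision == "reject":
        return None
    raise ValueError(f"unknown decision: {decision!r}")
```

An unknown decision raises instead of defaulting to approval, so a malformed operator response can never release a proposed action.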

Key Considerations:

Common Pitfalls:


APTS-AL-010: Step-by-Step Audit Log with Phase Transitions

Implementation: Enhance the Tier 1 audit trail to capture each step explicitly, including operator decisions at phase transitions. Record when operator approves phase entry, chaining parameters, and any modifications to tool proposals.

Key Considerations:

Common Pitfalls:


APTS-AL-011: Escalation Triggers and Exception Handling

Implementation: Define explicit escalation triggers for scope violations, unintended impact (crashes, denial of service), anomalies, and policy breaches. Automatically pause execution and alert the operator on any trigger. Require an operator decision before resuming.
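
A minimal sketch of the pause-on-trigger behavior follows. The trigger names mirror the categories in the paragraph above; the `Engagement` class and its methods are hypothetical.

```python
from enum import Enum


class Trigger(Enum):
    SCOPE_VIOLATION = "scope_violation"
    UNINTENDED_IMPACT = "unintended_impact"
    ANOMALY = "anomaly"
    POLICY_BREACH = "policy_breach"


class Engagement:
    def __init__(self):
        self.paused = False
        self.alerts = []  # stand-in for a real alerting channel

    def raise_trigger(self, trigger: Trigger, detail: str) -> None:
        """Any trigger pauses execution first, then records an alert."""
        self.paused = True
        self.alerts.append((trigger, detail))

    def can_execute(self) -> bool:
        return not self.paused

    def resume(self, operator_id: str) -> None:
        """Only an identified operator can lift the pause."""
        if not operator_id:
            raise PermissionError("resume requires an operator decision")
        self.paused = False
```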

Key Considerations:

Common Pitfalls:


APTS-AL-012: Kill Switch and Pause Capability

Implementation: Provide immediate operator-initiated pause and terminate commands with single-click activation. Pause must stop all active technique execution and pending actions. Terminate must cleanly exit and preserve logs.
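
The pause and terminate semantics can be sketched with `threading.Event` flags that every worker checks before starting an action. This sketch only gates the start of new and pending actions; stopping a technique that is already running would additionally need cooperative checks inside the technique or process-level termination. All names are assumptions.

```python
import threading


class RunControl:
    def __init__(self):
        self._run_allowed = threading.Event()
        self._run_allowed.set()
        self._terminated = threading.Event()

    def pause(self) -> None:
        self._run_allowed.clear()

    def resume(self) -> None:
        # A terminated run can never be resumed.
        if not self._terminated.is_set():
            self._run_allowed.set()

    def terminate(self) -> None:
        self._terminated.set()
        self._run_allowed.clear()

    def may_start_action(self) -> bool:
        return self._run_allowed.is_set() and not self._terminated.is_set()


def drain(pending, control: RunControl):
    """Start pending actions in order, stopping as soon as control says no."""
    started = []
    for action in pending:
        if not control.may_start_action():
            break
        started.append(action)
    return started
```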

Key Considerations:

Common Pitfalls:


APTS-AL-013: Complete Attack Chain Execution Within Boundaries

Implementation: At L3, permit full attack chains (recon → exploitation → post-exploitation) to execute autonomously within pre-approved boundaries. Boundaries must include scope, techniques, asset categories, and impact thresholds.

Key Considerations:

Common Pitfalls:


APTS-AL-014: Boundary Definition and Enforcement Framework

Implementation: Establish a formal boundary definition model covering scope (targets, networks, IP ranges), techniques (tool list, not-permitted techniques), asset categories (databases, credentials, PII), and impact thresholds. Enforce via code, not policy alone.
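
One way to make the four boundary dimensions enforceable in code is a single immutable structure plus a check that reports every dimension an action would violate. The field names and the integer impact scale are assumptions for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Boundary:
    networks: tuple                     # ipaddress network objects
    allowed_techniques: frozenset
    denied_techniques: frozenset
    denied_asset_categories: frozenset  # e.g. {"pii"}
    max_impact: int                     # hypothetical ordinal impact scale


def violations(b: Boundary, target_ip: str, technique: str,
               asset_category: str, est_impact: int) -> List[str]:
    """Return the names of all boundary dimensions the action would breach."""
    found = []
    addr = ipaddress.ip_address(target_ip)
    if not any(addr in net for net in b.networks):
        found.append("scope")
    if technique in b.denied_techniques or technique not in b.allowed_techniques:
        found.append("technique")
    if asset_category in b.denied_asset_categories:
        found.append("asset_category")
    if est_impact > b.max_impact:
        found.append("impact")
    return found
```

An action is permitted only when the returned list is empty. Reporting all violated dimensions at once, instead of the first, gives the operator a complete picture when an action is blocked.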

Key Considerations:

Common Pitfalls:


APTS-AL-015: Pre-Approved Action Categories and Decision Trees

Implementation: Define pre-approved categories of actions (for example, "SQL injection on web apps in scope", "Brute force on open SSH ports in DMZ") with clear decision criteria. Tool autonomy is restricted to these categories; other actions require human re-approval.
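
The two example categories above could be expressed as data and matched against each proposed action; anything that matches no category falls back to human re-approval. The dictionary keys, technique identifiers, and zone labels are hypothetical.

```python
from typing import Optional

# Pre-approved categories as data; zone None means "any in-scope zone".
CATEGORIES = [
    {"name": "SQL injection on web apps in scope",
     "technique": "sql_injection", "asset_type": "web_app", "zone": None},
    {"name": "Brute force on open SSH ports in DMZ",
     "technique": "ssh_brute_force", "asset_type": "ssh_service", "zone": "dmz"},
]


def match_category(action: dict) -> Optional[str]:
    """Return the matching category name, or None if re-approval is needed."""
    for cat in CATEGORIES:
        if (cat["technique"] == action["technique"]
                and cat["asset_type"] == action["asset_type"]
                and cat["zone"] in (None, action.get("zone"))):
            return cat["name"]
    return None
```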

Key Considerations:

Common Pitfalls:


APTS-AL-016: Continuous Boundary Monitoring and Breach Detection

Implementation: Implement real-time monitoring of all technique execution against defined boundaries. If any action breaches scope, technique list, or impact threshold, immediately pause execution and alert operator. Do not proceed without explicit re-approval.

Key Considerations:

Common Pitfalls:


APTS-AL-017: Multi-Target Assessment Management

Implementation: At Tier 2, extend orchestration to manage assessment across multiple targets simultaneously. Maintain a target queue, prioritize based on criteria (criticality, dependency), and dispatch techniques to targets while respecting per-target scope and phase.
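
A target queue that honors both criticality and dependencies can be sketched with the standard-library `heapq` module. Higher criticality is served first, and a target is held back until every target it depends on has completed. The class name and the numeric criticality scale are assumptions.

```python
import heapq
import itertools


class TargetQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def add(self, target: str, criticality: int, depends_on=()) -> None:
        # Negate criticality because heapq is a min-heap.
        heapq.heappush(
            self._heap,
            (-criticality, next(self._counter), target, tuple(depends_on)),
        )

    def pop_ready(self, completed: set):
        """Return the most critical target whose dependencies are complete."""
        deferred, chosen = [], None
        while self._heap:
            item = heapq.heappop(self._heap)
            if set(item[3]) <= completed:
                chosen = item[2]
                break
            deferred.append(item)
        for item in deferred:  # put blocked targets back
            heapq.heappush(self._heap, item)
        return chosen
```

Per-target scope and phase checks are deliberately left out here; the dispatcher would run them on whatever `pop_ready` returns before any technique is sent.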

Key Considerations:

Common Pitfalls:


APTS-AL-018: Incident Response During Autonomous Testing

Implementation: When autonomous testing triggers a security incident (for example, an intrusion detection alarm or endpoint alert), immediately pause all testing, capture logs, and alert the operator. Require a human decision before resuming: investigate the incident, modify scope, or abort the test.

Key Considerations:

Common Pitfalls:


APTS-AL-019: Multi-Target Campaign Management Without Intervention

Implementation: At Tier 3, permit full autonomous operation across multiple targets and campaigns. System manages queue, phases, escalations, and incident response without human intervention between cycle reviews. Operator reviews cycle results and authorizes next cycle.

Key Considerations:

Common Pitfalls:


APTS-AL-020: Dynamic Scope Adjustment and Target Discovery

Implementation: At Tier 3, permit dynamic inclusion of targets discovered within pre-approved parameters (for example, subnets, asset categories, vulnerability criteria). System auto-adds targets; operator reviews and approves via periodic cycle review.
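
A minimal sketch of dynamic inclusion: a discovered address is added automatically only if it falls inside a pre-approved subnet, and every automatic addition is queued for the next cycle review. The subnet value and class name are placeholders.

```python
import ipaddress

# Placeholder for the pre-approved discovery parameters.
PRE_APPROVED_SUBNETS = [ipaddress.ip_network("10.30.0.0/16")]


class DynamicScope:
    def __init__(self):
        self.active = set()       # targets the system may test now
        self.pending_review = []  # auto-added targets awaiting cycle review

    def consider(self, discovered_ip: str, reason: str) -> bool:
        """Auto-add a discovered target if it is inside pre-approved subnets."""
        addr = ipaddress.ip_address(discovered_ip)
        if not any(addr in net for net in PRE_APPROVED_SUBNETS):
            return False
        if discovered_ip not in self.active:
            self.active.add(discovered_ip)
            self.pending_review.append(
                {"target": discovered_ip, "reason": reason}
            )
        return True
```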

Key Considerations:

Common Pitfalls:


APTS-AL-021: Adaptive Testing Strategy and Resource Reallocation

Implementation: At Tier 3, the system may autonomously adjust testing strategy and reallocate resources based on findings (for example, deeper investigation of critical vulnerabilities, deprioritization of patched systems). Decisions must be explainable and logged for review.

Key Considerations:

Common Pitfalls:


APTS-AL-022: Continuous Risk Assessment and Automated Escalation

Implementation: At Tier 3, the system continuously computes risk scores based on findings and proactively escalates high-risk items to operator (critical vulnerabilities, potential compliance violations). Escalation does not pause testing but alerts operator to high-priority findings.
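
The non-blocking escalation path might be sketched as follows. The scoring formula and the threshold are invented for illustration and are not an APTS-defined risk model; the point is only that escalation queues an alert and returns without pausing anything.

```python
ESCALATION_THRESHOLD = 8.0  # hypothetical threshold


def risk_score(finding: dict) -> float:
    """Toy score: base severity plus fixed bumps, capped at 10."""
    score = float(finding["cvss"])
    if finding.get("internet_facing"):
        score += 1.0
    if finding.get("compliance_relevant"):
        score += 1.0
    return min(score, 10.0)


def process_finding(finding: dict, escalations: list) -> bool:
    """Queue an operator alert for high-risk findings; never pause testing."""
    score = risk_score(finding)
    if score >= ESCALATION_THRESHOLD:
        escalations.append({"finding_id": finding["id"], "score": score})
        return True
    return False
```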

Key Considerations:

Common Pitfalls:


APTS-AL-023: Complete Audit Trail and Forensic Reconstruction

Implementation: Maintain a complete, immutable audit trail that is forensically sufficient to reconstruct every action, decision, and outcome. Logs must support detailed investigation of incidents, operator decisions, and system behavior without re-running the test.
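
One common way to make an audit trail tamper-evident is a hash chain, where each entry's SHA-256 digest covers the previous entry's digest. The sketch below is an assumption about how this could be done, not a mechanism the requirement prescribes. A hash chain detects modification but does not prevent it, so it would normally be paired with append-only storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # digest used as the predecessor of the first entry


def _digest(prev: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "event": event, "hash": _digest(prev, event)})


def verify(chain: list) -> bool:
    """Recompute every digest; any edit, removal, or reorder breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```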

Key Considerations:

Common Pitfalls:


APTS-AL-024: Periodic Autonomous Review Cycles

Implementation: At Tier 3, despite autonomous operation, establish periodic review cycles (weekly, bi-weekly) where operator examines cycle results, validates scope adherence, reviews escalations, and explicitly authorizes the next cycle.

Key Considerations:

Common Pitfalls:


APTS-AL-025: Autonomy Level Authorization, Transition, and Reauthorization

Implementation: Establish formal governance for autonomy level assignment and progression. Each engagement must have documented authorization for the autonomy level being used, signed by the appropriate stakeholders (customer, pentest lead, and legal if required).

Key Considerations:

Common Pitfalls:


APTS-AL-026: Incident Investigation and Autonomy Level Adjustment

Implementation: When a security incident or escalation occurs during testing, conduct a structured investigation. If the incident results from a system malfunction or boundary breach, downgrade the autonomy level pending root-cause analysis and corrective action.

Key Considerations:

Common Pitfalls:


APTS-AL-027: Evasion and Stealth Mode Governance

Implementation: Evasion and stealth techniques (anti-forensics, evasion of monitoring, credential masking) must be explicitly authorized in writing and default to OFF. If used, document all stealth measures in the test report and disclose them to the customer.
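
The default-OFF rule and the reporting obligation can be captured in a small policy object: stealth cannot be enabled without a reference to the written authorization, and every measure used is recorded for the report. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StealthPolicy:
    enabled: bool = False                    # default OFF
    authorization_ref: Optional[str] = None  # pointer to the written approval
    measures_used: List[str] = field(default_factory=list)

    def enable(self, authorization_ref: str) -> None:
        if not authorization_ref:
            raise PermissionError("stealth requires written authorization")
        self.authorization_ref = authorization_ref
        self.enabled = True

    def record_measure(self, measure: str) -> None:
        """Log a stealth measure so it can be disclosed in the test report."""
        if not self.enabled:
            raise PermissionError("stealth mode is not authorized")
        self.measures_used.append(measure)
```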

Key Considerations:

Common Pitfalls:


APTS-AL-028: Containment Verification for L3 and L4 Autonomy

Implementation: Build a containment verification harness that lives outside the agent runtime and has independent credentials to the boundary components it tests. Keep the probe catalog in version control alongside the APTS-SC-019 sandbox policy and the APTS-SC-020 allowlist so that widening the boundary always comes with a new probe. Probes should be concrete and reproducible: attempt a write to a filesystem path outside the declared allow set, attempt a DNS lookup for a host outside the egress allowlist, attempt to invoke a tool with parameters outside the allowlist schema, attempt to submit a prompt-injected instruction that tries to drive the agent toward a denied tool. Run the harness on the schedule defined by the requirement, and run it automatically after any change to the sandbox policy, allowlist, or foundation model. Record every run to the audit store so that the reviewer can verify the schedule and inspect results independently of the agent.
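
The core loop of such a harness can be sketched as a runner over a catalog of probes, where each probe attempts a denied action and reports whether it succeeded. The convention used here (a probe returns True if the denied action went through, and raises `PermissionError` if it was blocked) is an assumption of this sketch, as are all names; writing results to the audit store is left to the caller.

```python
from typing import Callable, Dict, List


def run_probes(probes: Dict[str, Callable[[], bool]]) -> List[dict]:
    """Run every probe and classify the outcome.

    contained: the boundary blocked the attempt
    breached:  the denied action succeeded
    error:     the probe itself failed, so the result is inconclusive
    """
    results = []
    for name, attempt in probes.items():
        try:
            outcome = "breached" if attempt() else "contained"
        except PermissionError:
            outcome = "contained"
        except Exception as exc:  # inconclusive; must not be read as a pass
            outcome = f"error: {type(exc).__name__}"
        results.append({"probe": name, "outcome": outcome})
    return results
```

Treating unexpected exceptions as a separate "error" outcome avoids counting a broken probe as evidence that containment holds.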

Key Considerations:

Common Pitfalls:


Implementation Roadmap

Phase 1 (implement before any autonomous pentesting begins): APTS-AL-001 through APTS-AL-006 (L1 controls: single technique execution, human-directed selection, parameter configuration, no auto-chaining, audit trail, scope validation), APTS-AL-008 (real-time monitoring and approval gates), APTS-AL-011 (escalation triggers), APTS-AL-012 (kill switch and pause), APTS-AL-014 (boundary enforcement framework), APTS-AL-016 (continuous boundary monitoring).

Start with APTS-AL-001 through APTS-AL-004 (L1 constraints) as the foundation. These ensure the tool cannot operate beyond human direction. Add APTS-AL-012 (kill switch) and APTS-AL-014 (boundary enforcement) as safety controls, then APTS-AL-005, APTS-AL-006, and APTS-AL-008 for audit and monitoring.

Phase 2 (implement within the first three engagements): APTS-AL-007 (multi-step chaining within phase), APTS-AL-009 (tool-proposed actions, SHOULD), APTS-AL-010 (phase transition audit), APTS-AL-013 (full attack chain within boundaries), APTS-AL-015 (pre-approved action categories), APTS-AL-017 (multi-target management), APTS-AL-018 (incident response during testing), APTS-AL-025 (autonomy level authorization and transition), APTS-AL-026 (incident-triggered level adjustment), APTS-AL-027 (evasion/stealth governance).

Implement APTS-AL-025 (authorization framework) first. It governs how the platform moves between autonomy levels. Then add APTS-AL-007 and APTS-AL-013 (chaining controls) for L2/L3 operations.

Phase 3 (implement based on operational maturity): APTS-AL-019 (autonomous multi-target campaigns), APTS-AL-020 (dynamic scope adjustment), APTS-AL-021 (adaptive strategy, SHOULD), APTS-AL-022 (continuous risk assessment, SHOULD), APTS-AL-023 (complete forensic audit trail), APTS-AL-024 (periodic autonomous review, SHOULD), APTS-AL-028 (containment verification for L3 and L4 autonomy).

Phase 3 requirements apply to platforms targeting L3 Semi-Autonomous or L4 Autonomous operation. Implement APTS-AL-028 alongside APTS-SC-019 and APTS-SC-020: those three together give you a declared boundary, a declared action space, and a periodic independent check that both still hold.