Graduated Autonomy: Implementation Guide
Practical guidance for implementing APTS Graduated Autonomy requirements. Each section provides a brief implementation approach, key considerations, and common pitfalls.
Note: This guide is informative, not normative. Recommended defaults and example values are suggested starting points; the Graduated Autonomy README contains the authoritative requirements. Where this guide and the README differ, the README governs.
APTS-AL-001: Single Technique Execution
Implementation: Implement an execution model that isolates each technique invocation. Enforce a one-technique-per-operation constraint at the API and scheduling layer. No automatic chaining or result-driven sequencing.
Key Considerations:
- Require explicit human command for each technique execution
- Provide clear feedback on completion status before accepting new commands
- Isolate execution contexts to prevent state leakage between operations
Common Pitfalls:
- Tools that auto-escalate credentials or chain exploits without human intervention
- Workflow engines that trigger successive techniques based on prior results
APTS-AL-002: Human-Directed Target and Technique Selection
Implementation: Remove all heuristic or automated selection logic. All targeting and technique choices originate from explicit human operator commands. Validate that selections match authorized scope.
Key Considerations:
- Build command interfaces that require unambiguous target and technique specification
- Log all selection decisions with operator identity and timestamp
- Prevent default or inferred selections from applying
Common Pitfalls:
- Fuzzy matching or "best guess" target resolution
- Tool defaults that apply if parameters are incomplete
- Implicit authorization assumptions
APTS-AL-003: Parameter Configuration by Human Operator
Implementation: Require explicit human configuration of all technique parameters before execution. Never apply tool defaults or infer parameters from context. Surface all configurable options with clear explanations.
Key Considerations:
- Prompt operators for every tunable parameter with safe defaults offered for review
- Reject execution if critical parameters lack explicit operator confirmation
- Document which defaults were proposed and which were operator-chosen
Common Pitfalls:
- Silent parameter substitution if operator omits optional fields
- Undocumented tool assumptions that override operator intent
- Parameter inheritance from previous operations
APTS-AL-004: No Automated Chaining or Sequential Decision-Making
Implementation: Disable workflow automation and decision trees at Tier 1. Require explicit human command for every subsequent action. Build enforcement at the orchestration layer, not just by guideline.
Key Considerations:
- Remove or disable conditional logic that branches based on technique results
- Implement hard blocking if chaining is attempted without new human approval
- Clearly communicate wait state and expected next command
Common Pitfalls:
- Workflow templates that auto-progress based on success/failure conditions
- Background tasks that trigger escalation without alerting the operator
- Conditional parameter injection from prior results
APTS-AL-005: Mandatory Logging and Human-Reviewable Audit Trail
Implementation: Capture all actions, parameters, results, operator decisions, and timestamps. Ensure logs are human-readable and forensically complete. Include operator identity, authorization scope, and approval status for each action.
Key Considerations:
- Log must capture intent (what was requested), execution (what was done), and outcome
- Timestamp all events with atomic clock or trusted time source
- Include sufficient context for later reconstruction without re-execution
Common Pitfalls:
- Binary or compressed logs that require tools to decode
- Missing operator identity or authorization source
- Sparse logging that omits intermediate steps or failed attempts
APTS-AL-006: Basic Scope Validation and Policy Enforcement
Implementation: Validate all targets, techniques, and parameters against explicit scope rules before execution. Maintain a scope authorization document (whitelist or CIDR blocks). Block any execution outside authorized scope.
Key Considerations:
- Implement validation as a non-bypassable pre-execution gate
- Scope rules must be maintained by authorized personnel, not operators
- Log all scope validation decisions (pass and fail)
Common Pitfalls:
- Scope files that are editable by operators
- Validation logic that runs after execution begins
- Scope interpreted as "guidelines" rather than hard limits
APTS-AL-007: Multi-Step Technique Chaining Within Single Phase
Implementation: At Tier 2, permit chaining of multiple techniques within a single phase (for example, multiple recon steps, or multiple exploitation steps on a confirmed target) without per-action human approval, provided all belong to the same pre-defined phase.
Key Considerations:
- Define phase boundaries explicitly in policy (for example, Reconnaissance, Exploitation, Persistence)
- Operator approves the phase and entry conditions; system chains within phase
- Log each chained step and provide operator visibility into chain progression
Common Pitfalls:
- Vague phase definitions that blur boundaries
- Chaining across phases without human re-approval
- Loss of visibility into which steps were automatic and which were operator-triggered
L2/L3 Architecture Pattern:
L2 and L3 autonomy require a state machine governing phase transitions and action authorization:
States: IDLE → RECONNAISSANCE → EXPLOITATION → POST_EXPLOITATION → REPORTING → CLEANUP
Transitions:
IDLE → RECONNAISSANCE: requires operator_start_command AND scope_validated
RECONNAISSANCE → EXPLOITATION: requires (discovery_complete OR operator_override) AND operator_approval(L2) OR boundary_check_pass(L3)
EXPLOITATION → POST_EXPLOITATION: requires exploitation_objectives_met AND operator_review(L2) OR auto_transition(L3)
POST_EXPLOITATION → REPORTING: requires post_exploitation_complete AND evidence_preserved
REPORTING → CLEANUP: requires report_generated AND rollback_initiated
ANY → IDLE: kill_switch_activated (unconditional, highest priority)
Each transition MUST be logged per APTS-AR-001 with: previous state, new state, transition trigger, operator ID (if applicable), and timestamp. At L2, operator approval is required for any transition into EXPLOITATION. At L3, the platform may auto-transition if pre-approved boundary checks pass, but MUST log the decision rationale.
APTS-AL-008: Real-Time Human Monitoring and Approval Gates
Implementation: Provide a real-time dashboard showing active tests, current technique execution, findings, and escalation alerts. Implement approval gates for high-risk actions (exploitation, lateral movement) with operator override capability.
Key Considerations:
- Dashboard must update without requiring operator polling
- Approval gates must block execution until operator acknowledges and approves
- Provide clear escalation path and alert thresholds
- Support intervention (pause, terminate) at any point
Common Pitfalls:
- Approval gates that can be silently overridden
- Delays in alert delivery or dashboard updates
- Insufficient context to make approval decision
Implementation: At Tier 2, permit the tool to propose the next action(s) based on findings, but require operator approval with modification capability. Operator can reject, edit parameters, or substitute a different action.
Key Considerations:
- Display reasoning for tool proposal so operator can evaluate soundness
- Allow modification of proposed parameters without re-authorizing entire action
- Log operator modifications and rationale
- Reject execution if proposal logic becomes opaque or unexplainable
Common Pitfalls:
- Proposals that are hard to understand or justify
- Modification interfaces that are cumbersome or error-prone
- Assumption that proposal approval constitutes informed consent
APTS-AL-010: Step-by-Step Audit Log with Phase Transitions
Implementation: Enhance the Tier 1 audit trail to capture each step explicitly, including operator decisions at phase transitions. Record when operator approves phase entry, chaining parameters, and any modifications to tool proposals.
Key Considerations:
- Phase transition decisions must be logged with explicit operator sign-off
- Include tool reasoning and operator rationale for modifications
- Make logs queryable by phase, operator, and decision type
- Preserve enough context to understand branching decisions
Common Pitfalls:
- Phase transitions logged implicitly rather than as explicit decisions
- Inability to correlate operator decisions with log entries
- Logs that omit rationale for modifications or rejections
APTS-AL-011: Escalation Triggers and Exception Handling
Implementation: Define explicit escalation triggers for scope violations, unplanned exploitation (crashes, denial of service), anomalies, and policy breaches. Automatically pause execution and alert on any trigger. Require operator decision before resuming.
Key Considerations:
- Escalation triggers must be pre-defined and documented in policy
- System must support custom triggers per engagement
- Alert delivery must be reliable and timely (under 2 minutes)
- Logging must capture what triggered escalation and operator response
Common Pitfalls:
- Triggers so loose that they never fire, or so tight they spam alerts
- Pauses that automatically resume after timeout
- Escalations that proceed in background while operator is unaware
APTS-AL-012: Kill Switch and Pause Capability
Implementation: Provide immediate operator-initiated pause and terminate commands with single-click activation. Pause must stop all active technique execution and pending actions. Terminate must cleanly exit and preserve logs.
Key Considerations:
- Kill switch must be available at all times and take effect in under 10 seconds
- Distinguish pause (resumable) from terminate (final) operations
- Preserve state on pause to allow clean resumption
- Log all pause/terminate events with operator identity and rationale
Common Pitfalls:
- Kill switches that require multi-step confirmation
- Pause that doesn't actually stop background tasks
- Resumption that loses context or replays unintended actions
APTS-AL-013: Complete Attack Chain Execution Within Boundaries
Implementation: At Tier 2, permit full attack chains (recon → exploitation → post-exploitation) to execute autonomously within pre-approved boundaries. Boundaries must include scope, techniques, asset categories, and impact thresholds.
Key Considerations:
- Boundaries must be formally defined and technically enforced
- Pre-approval must be documented with authorization date and signoff
- System must continuously verify that chain stays within boundaries
- Any boundary breach must trigger immediate pause and escalation
Common Pitfalls:
- Vague boundaries that require interpretation during execution
- Boundaries defined as policy documents rather than enforced in code
- Lack of visibility into what the chain is actually doing
APTS-AL-014: Boundary Definition and Enforcement Framework
Implementation: Establish a formal boundary definition model covering scope (targets, networks, IP ranges), techniques (tool list, not-permitted techniques), asset categories (databases, credentials, PII), and impact thresholds. Enforce via code, not policy alone.
Key Considerations:
- Boundaries must be version-controlled and change-tracked
- Enforcement must occur pre-execution and continuously during execution
- Support multiple boundary sets per engagement
- Integrate with scope authorization document (APTS-AL-006)
Common Pitfalls:
- Boundaries described in prose rather than machine-readable format
- Enforcement that relies on operator judgment
- Outdated boundaries that diverge from current authorization
APTS-AL-015: Pre-Approved Action Categories and Decision Trees
Implementation: Define pre-approved categories of actions (for example, "SQL injection on web apps in scope", "Brute force on open SSH ports in DMZ") with clear decision criteria. Tool autonomy is restricted to these categories; other actions require human re-approval.
Key Considerations:
- Decision trees must have clear, measurable entry criteria
- Each category must map to pre-approved techniques and parameters
- Log which category each autonomous decision fell under
- Review and update categories quarterly or per engagement
Common Pitfalls:
- Categories so broad they encompass unexpected actions
- Decision trees with subjective or unmeasurable criteria
- No mechanism to challenge or override categorization
APTS-AL-016: Continuous Boundary Monitoring and Breach Detection
Implementation: Implement real-time monitoring of all technique execution against defined boundaries. If any action breaches scope, technique list, or impact threshold, immediately pause execution and alert operator. Do not proceed without explicit re-approval.
Key Considerations:
- Monitoring must sample actively, not depend on post-execution review
- Detection logic must be fast enough to prevent harm (sub-second ideal)
- Alert must clearly articulate what boundary was crossed and why
- Provide mechanism for operator to acknowledge, override with caution, or correct and retry
Common Pitfalls:
- Batch monitoring that only checks after technique completes
- Boundaries defined but not monitored
- Alerts that are too verbose or missing key context
APTS-AL-017: Multi-Target Assessment Management
Implementation: At Tier 2, extend orchestration to manage assessment across multiple targets simultaneously. Maintain a target queue, prioritize based on criteria (criticality, dependency), and dispatch techniques to targets while respecting per-target scope and phase.
Key Considerations:
- Target queue must be human-managed (no auto-discovery at Tier 2)
- Prioritization rules must be pre-defined and configurable
- Each target must have independent scope and phase state
- System must prevent resource exhaustion or unintended collateral impact
Common Pitfalls:
- Target queue that auto-expands to related systems
- Prioritization that operator cannot override
- Cross-target resource conflicts or interference
APTS-AL-018: Incident Response During Autonomous Testing
Implementation: When autonomous testing triggers a security incident (for example, intrusion detection alarm, endpoint alert), immediately pause all testing, capture logs, and alert operator. Require human decision before resuming: investigate incident, modify scope, or abort test.
Key Considerations:
- Integrate with customer's SIEM or monitoring tools to detect real incidents
- Pause must be non-negotiable and immediate
- Provide incident context (time, system, alert) to operator for decision-making
- Log decision rationale in test report
Common Pitfalls:
- Testing continues in background while operator is unaware of incident
- Incident detection that only works for known alert signatures
- Operator pressure to resume before incident is investigated
APTS-AL-019: Multi-Target Campaign Management Without Intervention
Implementation: At Tier 3, permit full autonomous operation across multiple targets and campaigns. System manages queue, phases, escalations, and incident response without human intervention between cycle reviews. Operator reviews cycle results and authorizes next cycle.
Key Considerations:
- Campaigns must be pre-authorized with explicit scope, targets, and success criteria
- System must operate independently for a defined cycle (for example, 1 week)
- Periodic review (for example, weekly) provides human check-in and re-authorization
- All incident responses, escalations, and scope decisions must be logged for review
Common Pitfalls:
- Cycles so long that incidents accumulate unnoticed
- Lack of visibility into campaign progress between reviews
- Re-authorization that is perfunctory rather than substantive
APTS-AL-020: Dynamic Scope Adjustment and Target Discovery
Implementation: At Tier 3, permit dynamic inclusion of targets discovered within pre-approved parameters (for example, subnets, asset categories, vulnerability criteria). System auto-adds targets; operator reviews and approves via periodic cycle review.
Key Considerations:
- Discovery must be restricted to pre-defined parameters (no open-ended scoping)
- Discovered targets must be logged and presented to operator at cycle review
- Operator can exclude discovered targets or adjust parameters
- Technical enforcement of scope boundaries remains in place
Common Pitfalls:
- Discovery parameters interpreted too broadly
- Discovered targets added without logging or review
- Inability to retroactively exclude targets from completed cycles
APTS-AL-021: Adaptive Testing Strategy and Resource Reallocation
Implementation: At Tier 3, the system may autonomously adjust testing strategy and reallocate resources based on findings (for example, deeper investigation of critical vulnerabilities, deprioritization of patched systems). Decisions must be explainable and logged for review.
Key Considerations:
- Strategy adaptation must remain within pre-approved technique and boundary constraints
- Adaptation logic must be rule-based and transparent, not black-box ML
- Log all adaptation decisions with reasoning
- Operator can override or reverse adaptations at cycle review
Common Pitfalls:
- Adaptation logic that is opaque or uses unexplainable heuristics
- Adaptations that deviate from pre-approved techniques
- Resource reallocation that causes starvation or unfair targeting
APTS-AL-022: Continuous Risk Assessment and Automated Escalation
Implementation: At Tier 3, the system continuously computes risk scores based on findings and proactively escalates high-risk items to operator (critical vulnerabilities, potential compliance violations). Escalation does not pause testing but alerts operator to high-priority findings.
Key Considerations:
- Risk scoring must be documented and configurable per engagement
- Escalations must be timely (near-real-time) and not defer to cycle review
- High-risk escalations may prompt operator intervention mid-cycle
- Maintain separation between informational escalation and pause-triggering escalation
Common Pitfalls:
- Risk scores that are not explainable or not configurable
- Escalations that are too frequent (alert fatigue) or too rare (missed risks)
- Escalations that automatically pause testing instead of alerting
APTS-AL-023: Complete Audit Trail and Forensic Reconstruction
Implementation: Maintain a complete, immutable audit trail that is forensically sufficient to reconstruct every action, decision, and outcome. Logs must support detailed investigation of incidents, operator decisions, and system behavior without re-running the test.
Key Considerations:
- Logs must capture all inputs, execution context, results, and decisions
- Logs must be tamper-evident (digital signatures or append-only storage)
- Log retention must meet regulatory and contractual requirements
- Logs must support efficient querying and export for forensic analysis
Common Pitfalls:
- Logs that require external tools or databases to interpret
- Log formats that change mid-engagement
- Insufficient metadata to correlate logs across systems
APTS-AL-024: Periodic Autonomous Review Cycles
Implementation: At Tier 3, despite autonomous operation, establish periodic review cycles (weekly, bi-weekly) where operator examines cycle results, validates scope adherence, reviews escalations, and explicitly authorizes the next cycle.
Key Considerations:
- Review must be substantive, not perfunctory (operator inspects logs and findings)
- Review checklist should cover scope, escalations, incidents, and adaptations
- Authorization for next cycle must be documented
- Issues discovered in review should be addressed before cycle restart
Common Pitfalls:
- Review cycles that are too long (issues accumulate)
- Reviews that are delegated to junior staff without authority to pause
- Lack of escalation path if review raises concerns
APTS-AL-025: Autonomy Level Authorization, Transition, and Reauthorization
Implementation: Establish formal governance for autonomy level assignment and progression. Each engagement must have documented authorization for the autonomy level being used, signed by appropriate stakeholders (customer, pentester lead, legal if required).
Key Considerations:
- Authorization must occur before testing begins and be documented in engagement agreement
- Progression from L1 to L2 to L3 to L4 must be explicit, not automatic
- Reauthorization may be required if scope, customer, or risk profile changes
- Maintain audit trail of all authorization decisions
Common Pitfalls:
- Assuming authorization from previous engagements carries over
- Progression decided by tool, not by explicit stakeholder approval
- Lack of mechanism to downgrade if trust is violated
APTS-AL-026: Incident Investigation and Autonomy Level Adjustment
Implementation: When a security incident or escalation triggers during testing, conduct structured investigation. If incident results from system malfunction or boundary breach, downgrade autonomy level pending root-cause analysis and corrective action.
Key Considerations:
- Incident investigation must be documented and shared with stakeholders
- Downgrade must occur promptly; testing may resume only at lower autonomy level
- Root-cause analysis must address system, process, or training failures
- Corrective actions must be verified before autonomy level is restored
Common Pitfalls:
- Incidents treated as isolated events rather than system failures
- Downgrades that are reversed without addressing root cause
- Investigation that lacks independent review
APTS-AL-027: Evasion and Stealth Mode Governance
Implementation: Evasion and stealth techniques (anti-forensics, evasion of monitoring, credential masking) must be explicitly authorized in writing and default to OFF. If used, document all stealth measures in the test report and disclose to customer.
Key Considerations:
- Evasion authorization must be separate from general test authorization
- Stealth techniques MUST be listed explicitly, not inferred
- Report MUST document which techniques were used and how
- Stealth measures MUST NOT obscure the audit trail or incident response data
Common Pitfalls:
- Evasion enabled by default or assumed to be implicit in authorization
- Stealth that prevents detection of incidents or compliance violations
- Insufficient disclosure in report about evasion methods
- Using stealth to avoid accountability rather than to test detection
APTS-AL-028: Containment Verification for L3 and L4 Autonomy
Implementation: Build a containment verification harness that lives outside the agent runtime and has independent credentials to the boundary components it tests. Keep the probe catalog in version control alongside the APTS-SC-019 sandbox policy and the APTS-SC-020 allowlist so that widening the boundary always comes with a new probe. Probes should be concrete and reproducible: attempt a write to a filesystem path outside the declared allow set, attempt a DNS lookup for a host outside the egress allowlist, attempt to invoke a tool with parameters outside the allowlist schema, attempt to submit a prompt-injected instruction that tries to drive the agent toward a denied tool. Run the harness on the schedule defined by the requirement, and run it automatically after any change to the sandbox policy, allowlist, or foundation model. Record every run to the audit store so that the reviewer can verify the schedule and inspect results independently of the agent.
Key Considerations:
- The harness is part of the platform's safety control surface; treat it with the same code review and change management as the controls it tests
- A successful probe (boundary held) is a positive signal but not proof the boundary holds against all adversaries; a failed probe is a definitive signal something is wrong
- Probes drawn from recent threat intelligence about the foundation model in use age quickly; revisit the catalog at least quarterly
Common Pitfalls:
- Using the agent runtime itself to run the probes, which lets a compromised or drifted agent report "all green" regardless of reality
- Running the harness on a schedule but not regenerating it after widening the allowlist, so new capabilities ship untested
- Silencing failed probes because "the sandbox is probably fine" instead of treating them as containment incidents under APTS-SC-018
Implementation Roadmap
Phase 1 (implement before any autonomous pentesting begins):
APTS-AL-001 through APTS-AL-006 (L1 controls: single technique execution, human-directed selection, parameter configuration, no auto-chaining, audit trail, scope validation), APTS-AL-008 (real-time monitoring and approval gates), APTS-AL-011 (escalation triggers), APTS-AL-012 (kill switch and pause), APTS-AL-014 (boundary enforcement framework), APTS-AL-016 (continuous boundary monitoring).
Start with APTS-AL-001 through APTS-AL-004 (L1 constraints) as the foundation. These ensure the tool cannot operate beyond human direction. Add APTS-AL-012 (kill switch) and APTS-AL-014 (boundary enforcement) as safety controls, then APTS-AL-005, APTS-AL-006, APTS-AL-008 for audit and monitoring.
Phase 2 (implement within first 3 engagements):
APTS-AL-007 (multi-step chaining within phase), APTS-AL-009 (tool-proposed actions, SHOULD), APTS-AL-010 (phase transition audit), APTS-AL-013 (full attack chain within boundaries), APTS-AL-015 (pre-approved action categories), APTS-AL-017 (multi-target management), APTS-AL-018 (incident response during testing), APTS-AL-025 (autonomy level authorization and transition), APTS-AL-026 (incident-triggered level adjustment), APTS-AL-027 (evasion/stealth governance).
Implement APTS-AL-025 (authorization framework) first. It governs how the platform moves between autonomy levels. Then add APTS-AL-007 and APTS-AL-013 (chaining controls) for L2/L3 operations.
Phase 3 (implement based on operational maturity):
APTS-AL-019 (autonomous multi-target campaigns), APTS-AL-020 (dynamic scope adjustment), APTS-AL-021 (adaptive strategy, SHOULD), APTS-AL-022 (continuous risk assessment, SHOULD), APTS-AL-023 (complete forensic audit trail), APTS-AL-024 (periodic autonomous review, SHOULD), APTS-AL-028 (containment verification for L3 and L4 autonomy).
Phase 3 requirements apply to platforms targeting L3 Semi-Autonomous or L4 Autonomous operation. Implement APTS-AL-028 alongside APTS-SC-019 and APTS-SC-020: those three together give you a declared boundary, a declared action space, and a periodic independent check that both still hold.