Safety Controls: Implementation Guide

Practical guidance for implementing APTS Safety Controls requirements. Each section provides a brief implementation approach, key considerations, and common pitfalls.

Note: This guide is informative, not normative. Recommended defaults and example values are suggested starting points; the Safety Controls README contains the authoritative requirements. Where this guide and the README differ, the README governs.


APTS-SC-001: Impact Classification and CIA Scoring

Implementation: Define four to five impact tiers (for example, Critical/High/Medium/Low/Informational) for all pentesting actions. Classify each technique before execution based on its potential for system disruption, data exposure, or service degradation.
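
As a minimal sketch, tier assignment can be a pure function over the three CIA sub-scores. The 0-10 scale and the cut-off values below are illustrative assumptions, not values from the requirements; each engagement should set and document its own.

```python
TIERS = ["Informational", "Low", "Medium", "High", "Critical"]

def classify(confidentiality: int, integrity: int, availability: int) -> str:
    """Map 0-10 CIA sub-scores to an impact tier using the worst sub-score.

    The cut-offs are hypothetical starting points, not normative values.
    """
    worst = max(confidentiality, integrity, availability)
    if worst >= 9:
        return "Critical"
    if worst >= 7:
        return "High"
    if worst >= 4:
        return "Medium"
    if worst >= 1:
        return "Low"
    return "Informational"
```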

Key Considerations:

Common Pitfalls:


APTS-SC-002: Industry-Specific Impact Considerations

Implementation: Overlay regulatory requirements (HIPAA, PCI-DSS, GDPR, and other applicable frameworks) on impact classifications. Adjust tiers where compliance violations carry heightened penalties. Document regulatory mapping.
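
One way to encode the overlay is a regulatory "floor" per framework that can only raise, never lower, the base tier. The floors below are illustrative assumptions; real values come from the engagement's compliance review.

```python
TIER_ORDER = ["Informational", "Low", "Medium", "High", "Critical"]

# Illustrative floors only; actual mappings are an engagement-level decision.
REGULATORY_FLOOR = {"HIPAA": "High", "PCI-DSS": "High", "GDPR": "Medium"}

def overlay(base_tier: str, frameworks: list) -> str:
    """Raise the impact tier to the highest regulatory floor that applies."""
    tier = base_tier
    for fw in frameworks:
        floor = REGULATORY_FLOOR.get(fw)
        if floor and TIER_ORDER.index(floor) > TIER_ORDER.index(tier):
            tier = floor
    return tier
```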

Key Considerations:

Common Pitfalls:


APTS-SC-003: Real-World Impact Classification Examples

Implementation: Document at least 10 detailed scenarios with actual impact outcomes (for example, "SQL injection on payment DB caused 15-minute query slowdown; CIA 2/9/7"). Include timestamp, technique, and measured impact.
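
A scenario record might be structured as follows; the field names are illustrative, and the example values mirror the SQL-injection example above.

```python
from dataclasses import dataclass
import datetime

@dataclass
class ImpactScenario:
    """One documented real-world impact example (field names are illustrative)."""
    timestamp: datetime.datetime
    technique: str
    measured_impact: str
    cia: tuple  # (confidentiality, integrity, availability)

example = ImpactScenario(
    timestamp=datetime.datetime(2024, 3, 1, 14, 30, tzinfo=datetime.timezone.utc),
    technique="SQL injection on payment DB",
    measured_impact="15-minute query slowdown",
    cia=(2, 9, 7),
)
```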

Key Considerations:

Common Pitfalls:


APTS-SC-004: Rate Limiting, Bandwidth, and Payload Constraints

Implementation: Implement per-host connection caps and request-per-second limits based on impact tier. Enforce at the network layer (firewall rules) or application layer (API rate limiting). Recommended starting defaults: 50 concurrent connections and 100 req/s for Low-impact actions, scaling down to 5 connections and 10 req/s for Critical-impact actions. Enforce maximum payload sizes per impact tier: Informational/Low 100KB, Medium 10MB, High 100MB, Critical 500MB, with an engagement-wide bandwidth cap of 5 Gbps. Reject any action exceeding its payload limit before transmission. Implement cascading rate limits at subnet, datacenter, and engagement-wide levels so that aggregate traffic approaching subnet limits proactively reduces per-host rates. Adjust all values based on target capacity and document chosen values with rationale.
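
A pre-transmission payload check against per-tier limits can be sketched as below. The Low and Critical connection/rate values come from this guide's defaults; the Medium and High rows are illustrative interpolations that each engagement should set explicitly.

```python
# (concurrent connections, requests/s, max payload bytes) per tier.
# Low and Critical follow the guide's starting defaults; Medium and High
# are interpolated here for illustration only.
LIMITS = {
    "Informational": (50, 100, 100 * 1024),
    "Low":           (50, 100, 100 * 1024),
    "Medium":        (20,  50,  10 * 1024**2),
    "High":          (10,  25, 100 * 1024**2),
    "Critical":      ( 5,  10, 500 * 1024**2),
}

def payload_allowed(tier: str, payload: bytes) -> bool:
    """Reject any action exceeding its tier's payload limit before transmission."""
    _, _, max_bytes = LIMITS[tier]
    return len(payload) <= max_bytes
```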

Key Considerations:

Common Pitfalls:


APTS-SC-005: Cascading Failure Prevention in Interconnected Systems

Implementation: Map interdependencies between systems (shared databases, API gateways, load balancers). Pre-identify failure cascades. Implement kill switches at dependency nodes (for example, stop all DB queries if query queue depth > threshold).
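
Pre-identifying cascades from a dependency map reduces to a reachability query: every system transitively dependent on a failing node is in its blast radius. A minimal sketch (system names below are hypothetical):

```python
from collections import deque

def blast_radius(dependents: dict, failed: str) -> set:
    """Return every system transitively dependent on `failed`.

    `dependents` maps a system to the systems that depend on it directly.
    """
    seen = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen
```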

Key Considerations:

Common Pitfalls:


APTS-SC-006: Threshold Escalation Workflow (Automated → Approval → Prohibited)

Implementation: Implement graduated escalation by impact level: Informational/Low actions execute automatically, Medium actions require standard approval within a defined window, High actions require elevated approval, and Critical actions require senior approval plus live operator confirmation. Recommended approval SLA defaults: Medium 15 minutes, High 30 minutes, Critical requires synchronous confirmation. Document the decision tree for each level.
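
The decision tree above is small enough to express as a lookup table. The SLA values follow this guide's recommended defaults; `None` marks levels with no timed window (automatic execution, or synchronous confirmation for Critical).

```python
# (approval path, SLA in minutes or None) per impact level.
ESCALATION = {
    "Informational": ("automatic", None),
    "Low":           ("automatic", None),
    "Medium":        ("standard approval", 15),
    "High":          ("elevated approval", 30),
    "Critical":      ("senior approval + live operator confirmation", None),
}

def required_approval(tier: str):
    """Return the approval path and SLA window for an impact level."""
    return ESCALATION[tier]
```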

Key Considerations:

Common Pitfalls:


APTS-SC-007: Cumulative Risk Scoring with Time-Based Decay

Implementation: Track cumulative impact over rolling windows (for example, 5-minute buckets). Apply decay function so old impact counts less (for example, 50% weight for 5+ min old). Reset counters at end of each engagement phase.

Key Considerations:

Common Pitfalls:

Risk Decay Formula:

A commonly used decay model is exponential time-based decay:

current_risk = base_risk * e^(-lambda * t)

Where: base_risk is the impact score assigned when the action executed, lambda is the configurable decay constant, and t is the time elapsed since the action.

Reset windows: after a configurable period of zero new risk accumulation (recommended default: 4 hours), the cumulative score resets to zero. Document the chosen decay constant and reset window in the engagement configuration.
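
The decay formula translates directly to code. The constant below is a hypothetical default chosen so risk halves every 5 minutes (lambda = ln(2)/5), consistent with the 50%-weight-at-5-minutes example above; the actual constant is configurable.

```python
import math

HALF_LIFE_MIN = 5.0                      # hypothetical default half-life
LAMBDA = math.log(2) / HALF_LIFE_MIN     # decay constant lambda

def decayed_risk(base_risk: float, minutes_elapsed: float, lam: float = LAMBDA) -> float:
    """current_risk = base_risk * e^(-lambda * t), with t in minutes."""
    return base_risk * math.exp(-lam * minutes_elapsed)
```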


APTS-SC-008: Threshold Configuration with Schema Validation

Implementation: Store all thresholds (rate limits, impact tiers, CIA scores, risk scores) in a structured config file (YAML/JSON) with JSON Schema validation. Enable hot-reload without restart.
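
As a sketch of fail-closed loading, the function below performs a minimal structural check; a real deployment would validate against a full JSON Schema (for example with the `jsonschema` package) and add a file watcher for hot-reload. The required keys are illustrative.

```python
import json

REQUIRED_KEYS = {"rate_limits", "impact_tiers", "risk_thresholds"}  # illustrative

def load_config(text: str) -> dict:
    """Parse a threshold config and fail closed on missing sections."""
    cfg = json.loads(text)
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"config missing keys: {sorted(missing)}")
    return cfg
```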

Key Considerations:

Common Pitfalls:


APTS-SC-009: Kill Switch

Implementation: Implement a two-phase kill switch: Phase 1 (5s) ceases all new actions and allows in-flight requests to complete; Phase 2 (60s) force-terminates all connections and cleans up resources. The operator can trigger it manually, or the system can trigger it automatically.
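
A condensed sketch of the two phases, using the guide's default windows; real in-flight draining and timer handling are elided here.

```python
class KillSwitch:
    """Two-phase kill switch sketch (5 s drain, 60 s force per this guide)."""

    def __init__(self, drain_seconds=5.0, force_seconds=60.0):
        self.drain_seconds = drain_seconds
        self.force_seconds = force_seconds
        self.accepting = True
        self.events = []

    def trigger(self, source="operator"):
        # Phase 1: cease all new actions immediately; in-flight requests get
        # up to drain_seconds to complete (timers elided in this sketch).
        self.accepting = False
        self.events.append(f"phase1:halt-new-actions:{source}")
        # Phase 2: after force_seconds, force-terminate connections and clean up.
        self.events.append("phase2:force-terminate")
```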

Key Considerations:

Common Pitfalls:

Cross-Domain Integration:

When the kill switch activates, multiple domains execute in parallel:

  1. SC domain: SC-009 halts new actions (Phase 1), SC-016 preserves evidence, SC-014 initiates rollback
  2. AR domain: AR-001 logs the kill switch event with timestamp, trigger source, and operator ID
  3. HO domain: HO-008 captures state dump, HO-015 sends notifications to all configured channels
  4. SE domain: SE-006 ceases all scope validation (no new actions to validate)

Implementation MUST ensure these cross-domain actions execute independently. A failure in AR logging MUST NOT delay SC halt. A failure in HO notification MUST NOT prevent evidence preservation.
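
The failure-isolation requirement can be sketched as below: each cross-domain handler is invoked behind its own exception boundary, so a failure in one never blocks the others. A real system would dispatch these in parallel (threads or asyncio); this sequential sketch shows only the isolation property. Handler names are illustrative.

```python
def run_independently(handlers: dict) -> dict:
    """Invoke each cross-domain handler; one failure never blocks the others."""
    results = {}
    for name, fn in handlers.items():
        try:
            fn()
            results[name] = "ok"
        except Exception as exc:  # isolate every failure per the MUST above
            results[name] = f"failed: {exc}"
    return results
```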


APTS-SC-010: Health Check Monitoring, Threshold Adjustment, and Automatic Halt

Implementation: Execute health checks (HTTP GET, ICMP ping, database query) at regular intervals on all monitored targets. Recommended starting default: every 30 seconds for Critical assets, every 60 seconds for Production, every 5 minutes for Non-Production. If check fails or response time exceeds 200% of baseline, trigger escalation workflow (see APTS-SC-006). Document chosen intervals with rationale.
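
The interval defaults and the escalation predicate can be sketched as follows; tier keys are illustrative.

```python
# Check intervals in seconds, from the guide's starting defaults.
CHECK_INTERVALS = {"critical": 30, "production": 60, "non-production": 300}

def needs_escalation(response_ms: float, baseline_ms: float, check_ok: bool) -> bool:
    """Escalate when the check fails or latency exceeds 200% of baseline."""
    return (not check_ok) or response_ms > 2.0 * baseline_ms
```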

Key Considerations:

Common Pitfalls:

Cloud-Native Health Monitoring:

For platforms deployed on Kubernetes or container orchestration:


APTS-SC-011: Condition-Based Automated Termination

Implementation: Define explicit termination conditions and encode them in configuration. Recommended starting conditions: terminate if target service is unavailable for 5 consecutive health checks; error rate exceeds 10% sustained for 60 seconds; memory exhaustion detected on target; privilege escalation occurs outside scope; or cumulative risk score exceeds halt threshold (see APTS-SC-007). Log every trigger with context. Document chosen conditions with rationale.
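
The recommended conditions can be evaluated against a periodic state snapshot; the snapshot field names below are illustrative assumptions.

```python
def should_terminate(state: dict) -> list:
    """Return the list of triggered termination conditions (empty if none)."""
    reasons = []
    if state.get("consecutive_failed_checks", 0) >= 5:
        reasons.append("target unavailable for 5 consecutive health checks")
    if state.get("error_rate", 0.0) > 0.10 and state.get("error_rate_sustained_s", 0) >= 60:
        reasons.append("error rate above 10% sustained for 60 seconds")
    if state.get("memory_exhaustion", False):
        reasons.append("memory exhaustion on target")
    if state.get("out_of_scope_priv_esc", False):
        reasons.append("privilege escalation outside scope")
    if state.get("risk_score", 0.0) >= state.get("halt_threshold", float("inf")):
        reasons.append("cumulative risk score exceeded halt threshold")
    return reasons
```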

Key Considerations:

Common Pitfalls:


APTS-SC-012: Network-Level Circuit Breaker

Implementation: At the network layer, suspend traffic to degraded targets and enforce a configurable cooldown period. Before resuming, probe with a low-intensity health check. Resume only if the probe succeeds.
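
A minimal breaker sketch with an injectable clock for testability; the probe callable stands in for the low-intensity health check.

```python
import time

class CircuitBreaker:
    """Suspend traffic after degradation, probe before resuming (sketch)."""

    def __init__(self, cooldown_s: float, probe, now=time.monotonic):
        self.cooldown_s = cooldown_s
        self.probe = probe        # low-intensity health check, returns bool
        self.now = now
        self.opened_at = None

    def record_degraded(self):
        self.opened_at = self.now()       # open: suspend traffic to the target

    def allow_traffic(self) -> bool:
        if self.opened_at is None:
            return True
        if self.now() - self.opened_at < self.cooldown_s:
            return False                  # still cooling down
        if self.probe():                  # probe before resuming
            self.opened_at = None
            return True
        self.opened_at = self.now()       # probe failed: restart cooldown
        return False
```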

Key Considerations:

Common Pitfalls:


APTS-SC-013: Time-Based Automatic Termination with Operator Override

Implementation: Enforce operator-specified maximum engagement duration with configurable time limits. Provide advance warning before time-based termination at a configured threshold. Extensions should require manager approval and be logged with justification for audit.
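
The duration check reduces to a small state function; the name and the three states below are illustrative.

```python
def engagement_status(elapsed_s: float, max_s: float, warn_before_s: float) -> str:
    """Return 'running', 'warn' (inside the warning window), or 'terminate'."""
    if elapsed_s >= max_s:
        return "terminate"
    if elapsed_s >= max_s - warn_before_s:
        return "warn"
    return "running"
```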

Key Considerations:

Common Pitfalls:


APTS-SC-014: Reversible Action Tracking and Rollback

Implementation: Log all reversible actions (firewall rule adds, account creations, privilege escalations) with before/after state. Maintain rollback procedures (scripts or manual steps) for each action. Test rollback paths in staging.
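
A sketch of the tracking structure, with rollback in reverse (LIFO) order of execution; in a real platform the entries would persist outside the agent runtime.

```python
class ActionLog:
    """Track reversible actions with before/after state and LIFO rollback."""

    def __init__(self):
        self._entries = []

    def record(self, action: str, before, after, undo):
        """Log one reversible action with its undo procedure."""
        self._entries.append(
            {"action": action, "before": before, "after": after, "undo": undo}
        )

    def rollback_all(self) -> list:
        """Undo every recorded action in reverse order; return undone names."""
        undone = []
        while self._entries:
            entry = self._entries.pop()   # reverse order of execution
            entry["undo"]()
            undone.append(entry["action"])
        return undone
```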

Key Considerations:

Common Pitfalls:


APTS-SC-015: Post-Test System Integrity Validation

Implementation: After engagement, run automated integrity checks: file hashing, configuration comparison against baseline, user account audit, security group audit. Generate report with discrepancies. Require sign-off before concluding engagement.
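
The file-hashing portion of the check can be sketched as a comparison of current hashes against a pre-engagement baseline:

```python
import hashlib

def hash_bytes(data: bytes) -> str:
    """SHA-256 hex digest used for baseline and post-test file hashing."""
    return hashlib.sha256(data).hexdigest()

def diff_baseline(baseline: dict, current: dict) -> dict:
    """Compare current path->hash mappings against the pre-engagement baseline."""
    return {
        "modified": sorted(p for p in baseline if p in current and baseline[p] != current[p]),
        "missing":  sorted(p for p in baseline if p not in current),
        "added":    sorted(p for p in current if p not in baseline),
    }
```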

Key Considerations:

Common Pitfalls:


APTS-SC-016: Evidence Preservation and Automated Cleanup

Implementation: Write penetration test evidence (logs, screenshots, captured traffic) to write-once storage (immutable S3 bucket, WORM drive). Schedule automated cleanup after retention period (for example, 90 days). Log all cleanup actions.
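
The retention check for the cleanup scheduler can be sketched as follows, using the 90-day example above as the default:

```python
import datetime

def due_for_cleanup(created, retention_days=90, now=None) -> bool:
    """True once evidence has outlived its retention period."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now - created >= datetime.timedelta(days=retention_days)
```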

Key Considerations:

Common Pitfalls:


APTS-SC-017: External Watchdog and Operator Notification

Implementation: Deploy an independent external watchdog (on separate infrastructure) that periodically contacts the penetration test platform and receives health status. If there is no contact for 5 minutes, send an out-of-band alert (SMS, PagerDuty) to the incident commander and the customer.
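
The watchdog's alert condition is a staleness check on the last successful contact; the 300-second default matches the 5-minute window above.

```python
import time

class Watchdog:
    """External watchdog sketch: alert when the platform goes silent."""

    def __init__(self, timeout_s=300.0, now=time.monotonic):
        self.timeout_s = timeout_s
        self.now = now
        self.last_contact = now()

    def heartbeat(self):
        """Record a successful contact from the platform."""
        self.last_contact = self.now()

    def should_alert(self) -> bool:
        """True when the platform has been silent past the timeout."""
        return self.now() - self.last_contact >= self.timeout_s
```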

Key Considerations:

Common Pitfalls:


APTS-SC-018: Incident Containment and Recovery

Implementation: On detection of unintended impact, automatically isolate the affected system (network segmentation), halt all tests, and initiate recovery (rollback plus validation). Require re-validation of platform health before resuming tests on any system.
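
The containment sequence is strictly ordered; a sketch that drives injected step handlers (the step names here are illustrative):

```python
SEQUENCE = ["isolate", "halt_tests", "rollback", "validate", "revalidate_health"]

def contain_and_recover(incident, actions: dict) -> list:
    """Run each containment step in order; return the completed steps."""
    done = []
    for step in SEQUENCE:
        actions[step](incident)
        done.append(step)
    return done
```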

Key Considerations:

Common Pitfalls:


APTS-SC-019: Execution Sandbox and Containment Boundary Integrity

Implementation: Run the agent runtime inside a kernel-enforced sandbox (Linux namespaces plus seccomp profile, or a hardened container runtime such as gVisor or Kata Containers), or inside a lightweight VM with a minimal hypervisor attack surface. Declare the allowed filesystem paths, outbound network destinations, and process capabilities in a policy file committed alongside the agent's other configuration. Enforce network egress through an external egress proxy or L4 network policy that the agent runtime has no credentials to bypass. Log every blocked action as a containment event and route it to the platform's incident queue.
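
Real enforcement lives in the kernel and network layers (seccomp, egress proxy, L4 policy), not in application code; the sketch below only shows the shape of a declared policy and a path check an orchestrator might run before dispatch. Paths and CIDR are hypothetical.

```python
import posixpath

# Illustrative policy; the real artifact is the committed policy file.
POLICY = {
    "allowed_paths": ["/workspace", "/tmp/agent"],
    "allowed_egress": ["10.0.0.0/8"],
}

def path_allowed(path: str) -> bool:
    """Check a filesystem path against the declared policy (normalizes '..')."""
    norm = posixpath.normpath(path)
    return any(norm == p or norm.startswith(p + "/")
               for p in POLICY["allowed_paths"])
```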

Key Considerations:

Common Pitfalls:


APTS-SC-020: Action Allowlist Enforcement External to the Model

Implementation: Maintain the agent's tool and action allowlist as a version-controlled artifact (for example, a YAML or JSON file in the platform repository). At runtime, the orchestration layer consults the allowlist before dispatching any tool call produced by the model. Allowlist entries specify the tool identifier, parameter schemas or bounds, and the risk classification used by the threshold escalation workflow in SC-006. The model's system prompt references the allowlist for clarity but does not define it. Tool additions, removals, and parameter widening go through the same code review and approval process as any other platform change.
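
The pre-dispatch check can be sketched as below. The allowlist entries, tool names, and parameter bounds are illustrative stand-ins for the version-controlled artifact.

```python
# Illustrative in-memory allowlist; the real one is loaded from the
# version-controlled YAML/JSON artifact.
ALLOWLIST = {
    "port_scan": {"max_ports": 1024, "risk": "Low"},
    "http_probe": {"max_requests": 100, "risk": "Informational"},
}

def authorize(tool: str, params: dict):
    """Orchestration-layer check run before dispatching any model tool call.

    Returns (True, risk classification) or (False, rejection reason).
    """
    entry = ALLOWLIST.get(tool)
    if entry is None:
        return False, f"tool '{tool}' not on allowlist"
    for key, limit in entry.items():
        if key == "risk":
            continue
        param = key.removeprefix("max_")
        if params.get(param, 0) > limit:
            return False, f"{param}={params[param]} exceeds bound {limit}"
    return True, entry["risk"]
```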

Key Considerations:

Common Pitfalls:


Implementation Roadmap

Phase 1 (implement before any autonomous pentesting begins): SC-001 (impact classification and CIA scoring), SC-003 (real-world classification examples), SC-004 (rate limiting, bandwidth, and payload constraints), SC-009 (kill switch), SC-010 (health check monitoring), SC-015 (post-test integrity validation), SC-020 (action allowlist enforcement external to the model).

Start with the kill switch (SC-009) and the external action allowlist (SC-020). These are the two architectural properties every responsible autonomous pentest platform must have before first engagement. Then implement impact classification (SC-001, SC-003) and rate limiting (SC-004). Layer health monitoring (SC-010) and post-test validation (SC-015) before first engagement.

Phase 2 (implement within first 3 engagements): SC-002 (industry-specific impacts), SC-005 (cascading failure prevention), SC-006 (escalation workflow), SC-007 (cumulative risk scoring with time-based decay), SC-011 and SC-012 (automated termination, circuit breaker), SC-014 (rollback tracking), SC-016 (evidence preservation/cleanup), SC-017 and SC-018 (watchdog, incident containment), SC-019 (execution sandbox and containment boundary integrity).

Prioritize SC-011 and SC-012 (automated termination, circuit breaker) first. They prevent runaway testing. Then add SC-006 (escalation workflow) and SC-014 (rollback) to handle controlled degradation, and SC-019 (execution sandbox) to contain the agent runtime within a boundary the model itself cannot reach.

Phase 3 (implement for maximum assurance): SC-008 (threshold schema validation, SHOULD), SC-013 (time-based termination, SHOULD).