Informative Appendix (non-normative)
This checklist gives customers, CISOs, procurement teams, and security reviewers a short path for screening autonomous pentesting platform operators before a deeper APTS review. It complements the Vendor Evaluation Guide, Evidence Request Checklist, Checklists, and Customer Acceptance Testing. It does not replace the full APTS requirements or create a certification process.
Use it when you need to decide how much additional review an operator's claims warrant.
Collect these basics from the operator:
| Item | What to Ask For | Why It Matters |
|---|---|---|
| Claimed APTS tier | Tier 1, Tier 2, Tier 3, or no claim | Sets the review depth and expected evidence |
| APTS version and claim date | Version of APTS used and date the claim was last reviewed | Helps identify stale or generic claims |
| Assessment method | Self-assessment, independent internal review, or third-party assessment | Clarifies who reviewed the claim and what assurance it provides |
| Platform version | Product, service, or worker version reviewed | Prevents evidence for one version from being applied to another |
| Deployment model | SaaS, managed service, on-premises, hybrid, customer-hosted workers | Determines scope, tenant isolation, and customer responsibility boundaries |
| Supported autonomy levels | L1 Assisted through L4 Autonomous | Determines human oversight and safety expectations |
| Intended targets | Non-production, production, critical systems, APIs, cloud, client-side agents | Determines safety, scope, and evidence expectations |
| Evidence availability | Completed checklist, conformance claim, sample logs, demos, reports | Determines whether claims can be verified |
Use this screen to decide whether the operator is ready for deeper review.
| Question | Acceptable Signal | Red Flag |
|---|---|---|
| Which APTS tier do you claim, if any? | Clear tier statement or clear statement that no APTS conformance is claimed | Vague "APTS-aligned" statement with no tier, scope, or evidence |
| Can you provide a completed APTS checklist? | Completed Checklists for the claimed tier, or a mapped internal assessment for first-pass screening only | No per-requirement mapping |
| How do you ingest and enforce Rules of Engagement? | Machine-readable RoE, validation, pre-action checks, audit trail | Scope handled manually or only by operator policy |
| Can you demonstrate a kill switch? | Recorded or live demo showing stop behavior and audit record | No demo, no timing expectation, or unclear authority |
| How are findings validated before reporting? | Reproduction, confidence scoring, and human review for critical findings | Findings reported directly from model output without validation |
| What evidence is available for one sample finding? | Evidence package with hashes, provenance, redaction log, and report export link | Screenshots or summaries only, no raw artifacts or provenance |
| How are customer credentials and discovered secrets handled? | Lifecycle, rotation/revocation, retention, and redaction policy | Long-lived credentials, unclear ownership, or no disposal evidence |
| Which foundation models and providers are used? | Exact model identifiers, provider trust review, change tracking | "Latest model" with no versioning or change process |
| What happens if testing causes unintended impact? | Thresholds, escalation, rollback, incident response, customer notification | No impact thresholds or incident path |
| Are agents deployed in customer infrastructure? | Install/remove process, permissions, update path, and RoE coverage | Persistent agents without clear removal or boundary controls |
Interpret the screening results as follows:

| Result | Suggested Next Step |
|---|---|
| Multiple red flags | Pause procurement or request remediation before deeper review |
| Some incomplete answers | Continue only with targeted evidence requests |
| Clear answers with evidence | Move to the 2-hour review or full review based on risk |
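The scope-enforcement signal above ("machine-readable RoE, validation, pre-action checks") can be made concrete with a small sketch. Everything here is illustrative: the RoE fields (`allowed_cidrs`, `blocked_hosts`, `window`) and the check logic are assumptions, not an APTS-defined schema.

```python
# Hypothetical sketch of a machine-readable Rules of Engagement record and a
# pre-action scope check. Field names are illustrative, not an APTS schema.
import ipaddress
from datetime import datetime, timezone

ROE = {
    "allowed_cidrs": ["10.20.0.0/16"],
    "blocked_hosts": ["10.20.5.10"],  # e.g. a fragile production host
    "window": ("2025-01-06T00:00:00+00:00", "2025-01-10T00:00:00+00:00"),
}

def in_scope(target_ip: str, now: datetime) -> bool:
    """Return True only if the target and time satisfy every RoE constraint."""
    if target_ip in ROE["blocked_hosts"]:
        return False
    start, end = (datetime.fromisoformat(t) for t in ROE["window"])
    if not (start <= now <= end):
        return False
    ip = ipaddress.ip_address(target_ip)
    return any(ip in ipaddress.ip_network(c) for c in ROE["allowed_cidrs"])

now = datetime(2025, 1, 7, tzinfo=timezone.utc)
print(in_scope("10.20.9.4", now))   # allowed CIDR, inside window -> True
print(in_scope("10.20.5.10", now))  # explicitly blocked -> False
print(in_scope("192.0.2.1", now))   # outside allowed CIDRs -> False
```

The key property to look for in a demo is that this check runs before every action and that each denial is written to the audit trail, rather than scope being a document the operator consults manually.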
Use this review when the 30-minute screen passes and the engagement has moderate risk. Treat the result as triage or conditional procurement input; it can stand on its own only when none of the full-review triggers applies.
Ask for a small evidence pack before scheduling detailed demos. This pack is a prioritized subset of the broader Evidence Request Checklist:
| Evidence | Related APTS Areas | Review Focus |
|---|---|---|
| Completed checklist for claimed tier | All domains | Does every claimed requirement have status and evidence? |
| Conformance Claim Template or equivalent statement | Introduction, conformance model | Is claim scope, assessment method, APTS version, platform version, and claim date clear? |
| Sample Rules of Engagement record | Scope Enforcement | Is scope machine-readable and enforced before actions? |
| Kill switch test evidence | Safety Controls, Human Oversight, Auditability | Is stop behavior demonstrated and logged? |
| Sample audit log excerpt | Auditability | Can actions, decisions, actors, timestamps, and outcomes be traced? |
| Sample evidence package for one finding | Reporting, Auditability | Are raw artifacts, hashes, provenance, review, and redaction linked? |
| Human review record for a critical finding | Human Oversight, Reporting | Was the reviewer qualified and was the decision recorded? |
| Model/provider disclosure | Supply Chain Trust, Auditability | Are model identifiers, provider review, and change controls documented? |
| Data retention and deletion summary | Supply Chain Trust, Scope Enforcement | Are customer data and credentials retained and deleted according to policy? |
| Incident response and notification process | Safety Controls, Supply Chain Trust | Are customer notification triggers and timelines documented? |
For each artifact, verify that it is current, representative of the reviewed deployment mode and autonomy level, and cross-checkable against another artifact such as an audit log, checklist row, evidence manifest, or demonstration recording. Sanitized demos and marketing summaries are useful for orientation, but they are not substitutes for reviewable evidence when the deployment is high risk.
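The hash cross-check described above can be sketched as follows. The manifest layout (an `artifacts` list with `path` and `sha256` entries) is an assumption for illustration; real evidence packages will define their own format.

```python
# Hypothetical sketch: cross-checking an evidence package against its manifest
# by recomputing SHA-256 hashes. The manifest format is assumed, not APTS-defined.
import hashlib
import json
import os
import tempfile

def verify_manifest(manifest: dict, root: str) -> list[str]:
    """Return the paths of artifacts whose on-disk hash mismatches the manifest."""
    mismatches = []
    for entry in manifest["artifacts"]:
        with open(os.path.join(root, entry["path"]), "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest != entry["sha256"]:
            mismatches.append(entry["path"])
    return mismatches

# Build a tiny sample package to demonstrate the check.
root = tempfile.mkdtemp()
with open(os.path.join(root, "http_capture.txt"), "wb") as f:
    f.write(b"GET /admin HTTP/1.1")
manifest = {"artifacts": [{
    "path": "http_capture.txt",
    "sha256": hashlib.sha256(b"GET /admin HTTP/1.1").hexdigest(),
}]}
print(verify_manifest(manifest, root))  # [] -> every artifact verifies
```

A reviewer who can rerun a check like this against a delivered evidence package has materially stronger assurance than one relying on screenshots or summaries.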
Confirm exclusions and shared-responsibility boundaries explicitly, especially for SaaS, on-premises, hybrid, and customer-hosted worker deployments. Customer responsibilities can materially affect the risk decision.
A suggested agenda for the 2-hour review:

| Time | Activity |
|---|---|
| 0-15 minutes | Confirm claimed tier, deployment model, autonomy level, and excluded modes |
| 15-35 minutes | Review Rules of Engagement handling and pre-action scope enforcement |
| 35-55 minutes | Review kill switch, thresholds, rollback, and escalation evidence |
| 55-75 minutes | Trace one sample finding from discovery to evidence package to report export |
| 75-95 minutes | Review human approval, reviewer qualification, and critical-finding validation |
| 95-110 minutes | Review model/provider disclosure, data retention, and tenant isolation |
| 110-120 minutes | Record decision, open questions, and required follow-up evidence |
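The kill-switch portion of the agenda (35-55 minutes) should show two things together: stop behavior and the audit record it produces. A minimal sketch, with invented event fields (`actor`, `action`, `timestamp`), of what a convincing demo records:

```python
# Hypothetical sketch of kill-switch stop behavior plus its audit record.
# All field names and the simulated trigger are illustrative assumptions.
import json
import threading
from datetime import datetime, timezone

audit_log: list[dict] = []
stop_event = threading.Event()

def kill_switch(actor: str) -> None:
    """Signal all workers to stop and record who issued the stop, and when."""
    stop_event.set()
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": "kill_switch_activated",
    })

def worker(task_queue: list[str]) -> list[str]:
    """Process tasks, checking the stop flag before every action."""
    done = []
    for task in task_queue:
        if stop_event.is_set():  # pre-action check: no new action after stop
            audit_log.append({"action": "task_skipped", "task": task})
            break
        done.append(task)
        if task == "scan-2":
            kill_switch("reviewer@example.com")  # simulate an operator stop
    return done

completed = worker(["scan-1", "scan-2", "scan-3"])
print(completed)                        # ['scan-1', 'scan-2']
print(json.dumps(audit_log, indent=2))  # stop event, then the skipped task
```

In a live demo, ask who is authorized to call the equivalent of `kill_switch`, how long the platform takes to honor it, and where the resulting audit entry lands.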
Move beyond the quick review when any full-review trigger applies. For the full review, use the Vendor Evaluation Guide, the per-tier Checklists, and optional Customer Acceptance Testing.
| Field | Notes |
|---|---|
| Operator reviewed | [Name] |
| Platform/version | [Name and version] |
| APTS version reviewed | [Version, for example v0.1.0] |
| Claim date | [YYYY-MM-DD] |
| Assessment method | [Self-assessment / independent internal review / third-party assessment] |
| Claimed APTS tier | [Tier or no claim] |
| Deployment model reviewed | [SaaS / managed service / on-premises / hybrid] |
| Autonomy levels reviewed | [L1-L4] |
| Evidence received | [List artifact IDs or links] |
| Red flags identified | [List or "None"] |
| Conditions or exceptions | [Required remediations or limitations] |
| Review decision | [Proceed / proceed with conditions / pause / reject] |
| Next review trigger | [Date, major platform change, incident, autonomy level change] |
A quick review typically produces one of four practical outcomes:
| Outcome | Meaning |
|---|---|
| Proceed | Evidence is sufficient for the current risk level |
| Proceed with conditions | The operator can proceed only after specific evidence or remediation is provided |
| Pause | Significant gaps require follow-up before procurement or deployment continues |
| Reject | The platform does not meet minimum safety, scope, or accountability expectations |
Document the decision, evidence reviewed, unresolved questions, and next review trigger. Revisit the decision when the operator changes deployment model, autonomy level, foundation model, safety controls, or incident response process.