Practical guidance for implementing APTS Reporting requirements. Each section provides a brief implementation approach, key considerations, and common pitfalls.
Note: This guide is informative, not normative. Recommended defaults and example values are suggested starting points; the Reporting README contains the authoritative requirements. Where this guide and the README differ, the README governs.
Implementation: Attach raw technical evidence (logs, screenshots, payloads, network traces) to every finding. Clearly demarcate any AI-generated summaries or analysis as distinct from raw evidence.
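As an illustration, a finding record can keep raw artifacts and AI-generated analysis in separate, explicitly labeled fields. A minimal Python sketch (field names are illustrative, not a required schema):

```python
from dataclasses import dataclass, field

@dataclass
class EvidenceArtifact:
    """Raw, unmodified technical evidence (log, screenshot, payload, trace)."""
    path: str            # on-disk location of the raw artifact
    artifact_type: str   # e.g. "http_transcript", "screenshot", "pcap"
    sha256: str          # hash taken at capture time

@dataclass
class Finding:
    finding_id: str
    title: str
    raw_evidence: list[EvidenceArtifact] = field(default_factory=list)
    # AI-generated material lives in its own clearly labeled field so it
    # can never be mistaken for raw evidence.
    ai_generated_summary: str = ""
```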
Key Considerations:
Common Pitfalls:
Implementation: Re-execute the reproduction steps for all Critical and High findings before report finalization using automated workflows. Tag each finding as "Confirmed" or "Unconfirmed" with a reproduction timestamp and the methodology used.
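A minimal sketch of the re-execution step, assuming each finding type has an automated reproduction callable (`reproduce` is a placeholder for whatever drives that workflow):

```python
import datetime

def revalidate(finding, reproduce, methodology: str):
    """Re-run a finding's reproduction steps and tag the outcome.

    `reproduce` should return True only on successful reproduction.
    """
    timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    try:
        confirmed = bool(reproduce(finding))
    except Exception:
        confirmed = False  # a crashed run counts as Unconfirmed
    finding.reproduction = {
        "status": "Confirmed" if confirmed else "Unconfirmed",
        "timestamp": timestamp,
        "methodology": methodology,
    }
    return finding
```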
Key Considerations:
Common Pitfalls:
Implementation: Assign 0-100% confidence scores based on measurable factors (evidence completeness, reproduction success, tool consistency, manual verification). Document the scoring model used in the assessment methodology.
Key Considerations:
Common Pitfalls:
Confidence Scoring Methodology:
A recommended confidence scoring approach uses weighted factors:
| Factor | Weight | Description |
|---|---|---|
| Evidence quality | 30% | Direct evidence (exploit success) scores 100%; indirect evidence (version banner match) scores 50%; inference only scores 20% |
| Independent confirmation | 25% | Confirmed by 2+ independent methods scores 100%; single method scores 50% |
| Environmental factors | 20% | Default configuration scores 100%; custom/hardened environment scores 60% |
| Historical accuracy | 15% | Platform's historical true positive rate for this vulnerability class |
| Recency | 10% | Evidence gathered within 1 hour scores 100%; decays to 50% after 24 hours |
Confidence = Sum(factor_score * weight)
Thresholds: Confirmed (>= 90%), High Confidence (70-89%), Medium Confidence (50-69%), Low Confidence (< 50%).
Findings below 50% confidence SHOULD be flagged as "Unconfirmed" and excluded from the executive summary unless the customer requests full disclosure.
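The model above translates directly to code. A minimal sketch, assuming per-finding factor scores arrive normalized to [0, 1]:

```python
# Weights mirror the factor table above.
WEIGHTS = {
    "evidence_quality": 0.30,
    "independent_confirmation": 0.25,
    "environmental_factors": 0.20,
    "historical_accuracy": 0.15,
    "recency": 0.10,
}

def confidence(scores: dict[str, float]) -> float:
    """Confidence = Sum(factor_score * weight), as a percentage."""
    return 100 * sum(WEIGHTS[f] * scores[f] for f in WEIGHTS)

def label(pct: float) -> str:
    if pct >= 90:
        return "Confirmed"
    if pct >= 70:
        return "High Confidence"
    if pct >= 50:
        return "Medium Confidence"
    return "Low Confidence"  # flagged "Unconfirmed" per the rule above

# e.g. confidence({"evidence_quality": 1.0, "independent_confirmation": 0.5,
#                  "environmental_factors": 1.0, "historical_accuracy": 0.8,
#                  "recency": 1.0}) -> 84.5 -> "High Confidence"
```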
Implementation: Maintain a cryptographically signed chain linking each finding to its discovery method, tool output, and operator actions. Use timestamped logs with digital signatures for forensic accountability.
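A minimal hash-linked log sketch using HMAC signatures (a production chain would typically use asymmetric signatures such as Ed25519 so verifiers do not need the signing key; event fields are illustrative):

```python
import hashlib
import hmac
import json
import time

def append_event(chain: list[dict], event: dict, key: bytes) -> None:
    """Append a provenance event linked to its predecessor and HMAC-signed.

    Each entry commits to the previous entry's signature, so tampering
    with any event breaks every subsequent link in the chain.
    """
    prev_sig = chain[-1]["signature"] if chain else ""
    body = {"timestamp": time.time(), "prev": prev_sig, **event}
    payload = json.dumps(body, sort_keys=True).encode()
    body["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    chain.append(body)

# Usage: one chain per finding, with discovery, tool-output, and
# operator-action events appended in order, e.g.
# append_event(chain, {"type": "discovery", "finding_id": "F-042"}, signing_key)
```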
Key Considerations:
Common Pitfalls:
Implementation Aid: See the Evidence Package Manifest appendix for an illustrative machine-readable structure that links one finding to its raw artifacts, provenance events, and review state.
Implementation: Bind all evidence to the discovery chain using SHA-256 hashing of raw artifacts. Provide hash values in the report with instructions for client-side verification against original evidence files.
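A streaming SHA-256 helper for hashing artifacts without loading large captures into memory (a sketch; the chunk size is arbitrary):

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream-hash an evidence artifact one chunk at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

Clients can then verify independently with standard tooling, for example `sha256sum <artifact>` on Linux, and compare the output to the value printed in the report.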
Key Considerations:
Common Pitfalls:
Implementation: Document and disclose the methodology for identifying and filtering false positives. Include per-severity false positive rates measured across the assessment (for example, "2 of 50 Medium findings verified as false positives").
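Per-severity rates fall out of a simple tally. A sketch, assuming each finding carries a `severity` label and a `verified_false_positive` flag (both attribute names are illustrative):

```python
from collections import Counter

def fp_rates(findings) -> dict[str, float]:
    """Per-severity false positive rate, e.g. 2 of 50 Medium -> 4.0%."""
    totals, fps = Counter(), Counter()
    for f in findings:
        totals[f.severity] += 1
        if f.verified_false_positive:
            fps[f.severity] += 1
    return {sev: 100 * fps[sev] / totals[sev] for sev in totals}
```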
Key Considerations:
Common Pitfalls:
Implementation: Designate reviewers independent of the automated tool chain to manually reproduce a representative sample of Critical findings (minimum 80% coverage).
Key Considerations:
Common Pitfalls:
Implementation: Provide a coverage matrix mapping tested vulnerability classes to CWE/OWASP categories, clearly marking each area as tested, partially tested, or excluded. Explain scope limitations and why certain vectors were excluded.
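An illustrative matrix fragment (entries and mappings are examples only; the real matrix should cover every class in the assessment methodology):

```python
# Status vocabulary kept deliberately small: tested / partial / excluded.
COVERAGE_MATRIX = [
    {"class": "SQL injection", "cwe": "CWE-89", "owasp": "A03:2021",
     "status": "tested", "notes": ""},
    {"class": "Insecure direct object reference", "cwe": "CWE-639",
     "owasp": "A01:2021", "status": "partial",
     "notes": "only externally reachable endpoints exercised"},
    {"class": "Denial of service", "cwe": "CWE-400", "owasp": None,
     "status": "excluded", "notes": "out of scope per rules of engagement"},
]
```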
Key Considerations:
Common Pitfalls:
Implementation: Document the methodology used to assess false negative risk. Publish per-vulnerability-class false negative rate estimates based on control gaps, tool detection limits, and known blind spots.
Key Considerations:
Common Pitfalls:
Implementation: Conduct quarterly benchmarks against controlled vulnerable environments (for example, DVWA, WebGoat) to validate detection accuracy. Document and trend results across tool versions and assessment methodologies.
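Scoring a benchmark run reduces to precision and recall against the lab target's documented vulnerability list. A sketch, assuming both sides use a shared finding-naming scheme:

```python
def benchmark(detected: set[str], ground_truth: set[str]) -> dict:
    """Score one run against a lab target's documented vulnerability list.

    `ground_truth` is the known-flaw inventory for the controlled
    environment (e.g. DVWA); `detected` is what the assessment reported.
    """
    true_positives = len(detected & ground_truth)
    return {
        "precision": true_positives / len(detected) if detected else 0.0,
        "recall": true_positives / len(ground_truth) if ground_truth else 0.0,
        "missed": sorted(ground_truth - detected),   # candidate blind spots
        "extra": sorted(detected - ground_truth),    # candidate false positives
    }
```

Retaining one such result per quarter, keyed by tool version, gives the trend data this practice calls for.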
Key Considerations:
Common Pitfalls:
Implementation: Provide a non-technical executive summary covering risk posture, overall findings distribution, coverage achieved, and key remediation priorities. Include context for risk-based decision making without requiring technical expertise.
Key Considerations:
Common Pitfalls:
Implementation: For each finding, provide step-by-step remediation guidance mapped to effort categories (quick-fix, short-term, long-term). Prioritize findings by risk and implementability to guide resource allocation.
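One way to make effort categories machine-readable is a small enum attached to each remediation record (a sketch; names are illustrative):

```python
from dataclasses import dataclass
from enum import Enum

class Effort(Enum):
    QUICK_FIX = "quick-fix"      # e.g. a config change or vendor patch
    SHORT_TERM = "short-term"    # sprint-sized code change
    LONG_TERM = "long-term"      # architectural or process change

@dataclass
class Remediation:
    finding_id: str
    steps: list[str]   # ordered, step-by-step guidance
    effort: Effort
    risk_rank: int     # 1 = remediate first
```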
Key Considerations:
Common Pitfalls:
Implementation: Document the engagement timeline, any interruptions or scope changes, and the percentage of planned scope tested; record areas left untested or partially tested, with reasons.
Key Considerations:
Common Pitfalls:
Implementation: For repeat clients, compare findings across engagements to identify new vulnerabilities, resolved issues, and persistent findings. Include trend metrics (closure rate, mean time to remediation, recurrence rate).
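A sketch of the comparison, assuming findings are reduced to stable fingerprints (e.g. a hash of vulnerability class plus affected asset) so the same issue matches across engagements:

```python
def trend_metrics(previous: set[str], current: set[str]) -> dict:
    """Compare stable finding fingerprints across two engagements.

    Mean time to remediation needs per-finding close dates and is
    omitted here for brevity.
    """
    resolved = previous - current
    return {
        "new": len(current - previous),
        "resolved": len(resolved),
        "persistent": len(previous & current),
        # Share of the prior engagement's findings closed since then.
        "closure_rate": len(resolved) / len(previous) if previous else 1.0,
    }
```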
Key Considerations:
Common Pitfalls:
Implementation: Maintain data fidelity through all post-assessment processes: deduplication logic, tenant isolation in multi-client environments, delivery tracking, and audit logs. Implement cryptographic signing for final deliverables.
Key Considerations:
Common Pitfalls:
This section provides implementation guidance for the advisory practice APTS-RP-A01. It is not required for conformance at any tier.
Implementation: Deploy an independent verification step that screens every finding for fabricated evidence, hallucinated vulnerabilities, and severity misclassification before the finding enters the human review pipeline. The verification mechanism must not share context or state with the discovering agent.
Architecture Pattern: Independent Finding Judge
A proven pattern is to implement the verifier as a separate "Finding Judge" that receives only the finding record, the associated evidence artifacts (PoC scripts, HTTP request/response pairs, tool output), and the target context. The judge evaluates each finding against checks for fabricated evidence, hallucinated vulnerabilities, and severity misclassification.
The judge classifies each finding as VERIFIED, FLAGGED, or REJECTED, with a structured log entry explaining the decision.
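A minimal sketch of the judge's decision logic. The three checks are simplified stand-ins for real verifiers (which would replay PoCs against the target and inspect raw transcripts); the finding and evidence schemas are illustrative:

```python
from enum import Enum

class Verdict(Enum):
    VERIFIED = "VERIFIED"
    FLAGGED = "FLAGGED"
    REJECTED = "REJECTED"

def judge(finding: dict, evidence: list[dict]) -> tuple[Verdict, str]:
    """Classify one finding using only its record and evidence artifacts."""
    failures = []
    endpoint = finding.get("endpoint")
    # Fabricated evidence: the finding must reference real raw artifacts.
    if not evidence:
        failures.append("no raw evidence attached")
    # Hallucinated vulnerability: the claimed endpoint must actually
    # appear somewhere in the captured evidence.
    elif not endpoint or not any(endpoint in a.get("content", "")
                                 for a in evidence):
        failures.append("claimed endpoint absent from evidence")
    # Severity misclassification: Critical requires a demonstrated exploit.
    if finding.get("severity") == "Critical" and not finding.get("exploit_demonstrated"):
        failures.append("Critical severity without demonstrated exploit")

    if not failures:
        return Verdict.VERIFIED, "all checks passed"
    if "no raw evidence attached" in failures:
        return Verdict.REJECTED, "; ".join(failures)
    return Verdict.FLAGGED, "; ".join(failures)
```

The returned reason string doubles as the structured log entry explaining the decision.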
Calibrated Confidence Thresholds:
For finding types where evidence quality varies, implement calibrated confidence ceilings. For example, email injection findings without rendering verification (confirming the injected content is actually rendered by an email client) should be assigned a confidence ceiling below the "Confirmed" threshold, ensuring they are flagged for human review regardless of other evidence quality.
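A sketch of a ceiling table keyed by finding type (names and values are illustrative; the point is that the cap is applied after scoring, so no combination of other factors can lift the finding past it):

```python
# Per-finding-type confidence ceilings, in percent.
CONFIDENCE_CEILINGS = {
    # No rendering verification -> cannot reach the 90% "Confirmed" bar.
    "email_injection_unrendered": 85.0,
}

def apply_ceiling(finding_type: str, raw_confidence: float) -> float:
    """Cap scored confidence so weak-evidence types stay below Confirmed."""
    return min(raw_confidence, CONFIDENCE_CEILINGS.get(finding_type, 100.0))
```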
Key Considerations:
Common Pitfalls:
Tier 1 (implement before any autonomous pentesting begins): RP-006 (false positive rate disclosure), RP-008 (vulnerability coverage disclosure), RP-011 (executive summary and risk overview).
Start with RP-011 (executive summary) and RP-008 (coverage disclosure). Customers need these in every report. Add RP-006 (false positive disclosure) to establish trust in findings.
Tier 2 (implement within first 3 engagements): RP-001 (evidence-based finding validation), RP-002 (automated reproduction of critical findings), RP-003 (confidence scoring), RP-004 (finding provenance chain), RP-005 (cryptographic evidence integrity), RP-009 (false negative rate disclosure), RP-012 (remediation guidance), RP-013 (engagement SLA compliance), RP-014 (trend analysis, SHOULD), RP-015 (downstream pipeline integrity, SHOULD).
Prioritize RP-001 and RP-002 (evidence and reproduction) first. Findings without evidence are worthless. Then add RP-003 and RP-004 (confidence scoring, provenance) for auditability, and RP-005 (cryptographic integrity) for tamper-proof evidence chains.
Tier 3 (implement based on assessment maturity): RP-007 (independent finding validation during assessment), RP-010 (detection effectiveness benchmarking, SHOULD).