Reporting: Implementation Guide

Practical guidance for implementing APTS Reporting requirements. Each section provides a brief implementation approach, key considerations, and common pitfalls.

Note: This guide is informative, not normative. Recommended defaults and example values are suggested starting points; the Reporting README contains the authoritative requirements. Where this guide and the README differ, the README governs.


APTS-RP-001: Evidence-Based Finding Validation

Implementation: Attach raw technical evidence (logs, screenshots, payloads, network traces) to every finding. Clearly demarcate any AI-generated summaries or analysis as distinct from raw evidence.

Key Considerations:

Common Pitfalls:


APTS-RP-002: Finding Verification and Human Review Pipeline

Implementation: Re-execute all Critical and High findings before report finalization using automated workflows. Tag each finding as "Confirmed" or "Unconfirmed" with reproduction timestamp and methodology.
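
A minimal sketch of the re-execution and tagging step, assuming a `replay` callable supplied by the surrounding automation; the record fields and function names are illustrative, not a prescribed interface:

```python
from datetime import datetime, timezone

def verify_finding(finding: dict, replay) -> dict:
    """Re-run a Critical/High finding's reproduction steps and tag the result."""
    if finding["severity"] not in ("Critical", "High"):
        return finding  # lower severities follow the normal pipeline
    reproduced = replay(finding)  # re-executes the recorded PoC steps
    finding["verification"] = {
        "status": "Confirmed" if reproduced else "Unconfirmed",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "methodology": "automated replay of recorded PoC",
    }
    return finding

f = verify_finding({"id": "F-001", "severity": "Critical"}, replay=lambda f: True)
print(f["verification"]["status"])  # Confirmed
```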

Key Considerations:

Common Pitfalls:


APTS-RP-003: Confidence Scoring with Auditable Methodology

Implementation: Assign 0-100% confidence scores based on measurable factors (evidence completeness, reproduction success, tool consistency, manual verification). Document the scoring model used in the assessment methodology.

Key Considerations:

Common Pitfalls:

Confidence Scoring Methodology:

A recommended confidence scoring approach uses weighted factors:

  - Evidence quality (30% weight): direct evidence (exploit success) scores 100%; indirect evidence (version banner match) scores 50%; inference only scores 20%.
  - Independent confirmation (25% weight): confirmed by 2+ independent methods scores 100%; single method scores 50%.
  - Environmental factors (20% weight): default configuration scores 100%; custom/hardened environment scores 60%.
  - Historical accuracy (15% weight): the platform's historical true positive rate for this vulnerability class.
  - Recency (10% weight): evidence gathered within 1 hour scores 100%; decays to 50% after 24 hours.

Confidence = Sum(factor_score * weight)

Thresholds: Confirmed (>= 90%), High Confidence (70-89%), Medium Confidence (50-69%), Low Confidence (< 50%).

Findings below 50% confidence SHOULD be flagged as "Unconfirmed" and excluded from the executive summary unless the customer requests full disclosure.
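
The weighted computation and thresholds above can be sketched as follows. The weights mirror the recommended factor table; the dictionary keys, function names, and example scores are illustrative, not part of the requirement:

```python
# Weights from the recommended methodology; factor keys are illustrative.
WEIGHTS = {
    "evidence_quality": 0.30,
    "independent_confirmation": 0.25,
    "environmental_factors": 0.20,
    "historical_accuracy": 0.15,
    "recency": 0.10,
}

def confidence(factor_scores: dict) -> float:
    """Each factor score is 0-100; returns overall confidence 0-100."""
    return sum(factor_scores[name] * weight for name, weight in WEIGHTS.items())

def label(score: float) -> str:
    if score >= 90: return "Confirmed"
    if score >= 70: return "High Confidence"
    if score >= 50: return "Medium Confidence"
    return "Low Confidence"  # below 50%: flag as Unconfirmed

scores = {
    "evidence_quality": 100,          # exploit succeeded
    "independent_confirmation": 50,   # single method only
    "environmental_factors": 100,     # default configuration
    "historical_accuracy": 80,        # platform TP rate for this class
    "recency": 100,                   # evidence under 1 hour old
}
c = confidence(scores)  # 30 + 12.5 + 20 + 12 + 10
print(round(c, 1), label(c))  # 84.5 High Confidence
```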


APTS-RP-004: Finding Provenance Chain

Implementation: Maintain a cryptographically signed chain linking each finding to its discovery method, tool output, and operator actions. Use timestamped logs with digital signatures for forensic accountability.

Key Considerations:

Common Pitfalls:

Implementation Aid: See the Evidence Package Manifest appendix for an illustrative machine-readable structure that links one finding to its raw artifacts, provenance events, and review state.
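
As a rough sketch (separate from the appendix manifest format), a hash-chained provenance log might look like the following. The event fields and HMAC-based signing are assumptions; a production system would likely use asymmetric signatures (e.g. Ed25519) and managed key storage:

```python
import hashlib, hmac, json, time

SIGNING_KEY = b"replace-with-managed-key-material"  # placeholder only

def append_event(chain: list, finding_id: str, action: str, detail: str) -> dict:
    """Append a signed provenance event linked to its predecessor's hash."""
    prev_hash = chain[-1]["event_hash"] if chain else "0" * 64
    body = {
        "finding_id": finding_id,
        "action": action,        # e.g. "tool_output", "operator_review"
        "detail": detail,
        "timestamp": time.time(),
        "prev_hash": prev_hash,  # chains this event to the one before it
    }
    payload = json.dumps(body, sort_keys=True).encode()
    body["event_hash"] = hashlib.sha256(payload).hexdigest()
    body["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    chain.append(body)
    return body

def verify_chain(chain: list) -> bool:
    """Recompute every link; any edit to a past event breaks verification."""
    prev = "0" * 64
    for event in chain:
        body = {k: v for k, v in event.items()
                if k not in ("event_hash", "signature")}
        payload = json.dumps(body, sort_keys=True).encode()
        if event["prev_hash"] != prev:
            return False
        if hashlib.sha256(payload).hexdigest() != event["event_hash"]:
            return False
        if not hmac.compare_digest(
                hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest(),
                event["signature"]):
            return False
        prev = event["event_hash"]
    return True
```

Hash chaining means an attacker cannot alter or remove an earlier event without invalidating every later link, which is what gives the log its forensic value.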


APTS-RP-005: Cryptographic Evidence Chain Integrity

Implementation: Bind all evidence to the discovery chain using SHA-256 hashing of raw artifacts. Provide hash values in the report with instructions for client-side verification against original evidence files.
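
A minimal sketch of the artifact hashing step; the file layout and manifest shape are illustrative:

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Hash a raw evidence artifact in 1 MiB chunks (handles large captures)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(evidence_dir: Path) -> dict:
    """Map artifact filename -> SHA-256 digest; publish this table in the report."""
    return {p.name: sha256_file(p)
            for p in sorted(evidence_dir.iterdir()) if p.is_file()}
```

Clients can then re-verify independently, for example with `sha256sum -c` against the published hash list, without needing any of the assessor's tooling.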

Key Considerations:

Common Pitfalls:


APTS-RP-006: False Positive Rate Disclosure

Implementation: Document and disclose the methodology for identifying and filtering false positives. Include per-severity false positive rates measured across the assessment (for example, "2 of 50 Medium findings verified as false positives").
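
A small sketch of the per-severity computation. The Medium counts echo the example in the text ("2 of 50"); all other figures are invented for illustration:

```python
findings = {
    # severity: (total reported, verified false positives)
    "Critical": (4, 0),
    "High": (12, 1),
    "Medium": (50, 2),   # the "2 of 50 Medium findings" example above
    "Low": (80, 9),
}

def fp_rates(counts: dict) -> dict:
    """Per-severity false positive rate as a percentage."""
    return {sev: round(fp / total * 100, 1) if total else 0.0
            for sev, (total, fp) in counts.items()}

rates = fp_rates(findings)
print(rates["Medium"])  # 4.0
```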

Key Considerations:

Common Pitfalls:


APTS-RP-007: Independent Finding Reproducibility

Implementation: Designate reviewers independent of the automated tool chain to manually reproduce a representative sample of Critical findings, covering at least 80% of them.
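
A sketch of selecting the reproduction sample. This uses a seeded simple random draw for auditability; a real program might prefer stratified sampling across vulnerability classes:

```python
import math, random

def reproduction_sample(critical_ids: list, coverage: float = 0.80, seed=None) -> list:
    """Pick ceil(coverage * n) Critical findings for independent manual reproduction."""
    n = math.ceil(len(critical_ids) * coverage)
    rng = random.Random(seed)  # seeded so the draw is repeatable and auditable
    return sorted(rng.sample(critical_ids, n))

ids = [f"C-{i:03d}" for i in range(10)]
print(len(reproduction_sample(ids, seed=7)))  # 8, i.e. 80% of 10
```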

Key Considerations:

Common Pitfalls:


APTS-RP-008: Vulnerability Coverage Disclosure

Implementation: Provide a coverage matrix mapping tested vulnerability classes to CWE/OWASP, clearly marking tested/excluded/partially-tested areas. Explain scope limitations and why certain vectors were excluded.
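
An illustrative machine-readable shape for the coverage matrix. The CWE and OWASP identifiers are real; the status values and exclusion reasons are example content, not requirements:

```python
coverage_matrix = [
    {"class": "SQL Injection", "cwe": "CWE-89", "owasp": "A03:2021",
     "status": "tested"},
    {"class": "Cross-Site Scripting", "cwe": "CWE-79", "owasp": "A03:2021",
     "status": "tested"},
    {"class": "Denial of Service", "cwe": "CWE-400", "owasp": None,
     "status": "excluded", "reason": "production target; load testing out of scope"},
    {"class": "Broken Access Control", "cwe": "CWE-284", "owasp": "A01:2021",
     "status": "partial", "reason": "admin-role credentials unavailable"},
]

def coverage_summary(matrix: list) -> dict:
    """Count tested / partial / excluded classes for the report overview."""
    counts = {}
    for row in matrix:
        counts[row["status"]] = counts.get(row["status"], 0) + 1
    return counts

print(coverage_summary(coverage_matrix))
# {'tested': 2, 'excluded': 1, 'partial': 1}
```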

Key Considerations:

Common Pitfalls:


APTS-RP-009: False Negative Rate Disclosure and Methodology

Implementation: Document the methodology used to assess false negative risk. Publish per-vulnerability-class FN rate estimates based on control gaps, tool detection limits, and known blind spots.

Key Considerations:

Common Pitfalls:


APTS-RP-010: Detection Effectiveness Benchmarking

Implementation: Conduct quarterly benchmarks against controlled vulnerable environments (for example, DVWA, WebGoat) to validate detection accuracy. Document and trend results across tool versions and assessment methodologies.
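
A sketch of the accuracy metrics such a benchmark would trend. The planted-flaw and detection sets are invented; against DVWA or WebGoat the ground truth would come from the environment's documented vulnerability list:

```python
def detection_metrics(ground_truth: set, detected: set) -> dict:
    """Precision/recall of a benchmark run against a known-vulnerable target."""
    tp = len(ground_truth & detected)   # planted flaws the tool found
    fp = len(detected - ground_truth)   # reported findings not actually planted
    fn = len(ground_truth - detected)   # planted flaws the tool missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": round(precision, 2), "recall": round(recall, 2),
            "false_negatives": fn}

# Example run: 10 planted flaws, tool reports 9 findings of which 8 are real
truth = {f"V{i}" for i in range(10)}
found = {f"V{i}" for i in range(8)} | {"bogus-finding"}
print(detection_metrics(truth, found))
# precision 0.89, recall 0.8, false_negatives 2
```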

Key Considerations:

Common Pitfalls:


APTS-RP-011: Executive Summary and Risk Overview

Implementation: Provide a non-technical executive summary covering risk posture, overall findings distribution, coverage achieved, and key remediation priorities. Include context for risk-based decision making without requiring technical expertise.

Key Considerations:

Common Pitfalls:


APTS-RP-012: Remediation Guidance and Prioritization

Implementation: For each finding, provide step-by-step remediation guidance mapped to effort categories (quick-fix, short-term, long-term). Prioritize findings by risk and implementability to guide resource allocation.
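
One illustrative prioritization heuristic, sketched below: rank by risk-per-effort so that high-risk quick fixes surface first, with severity as the tie-breaker. The numeric weights are assumptions, not a mandated formula:

```python
RISK_RANK = {"Critical": 4, "High": 3, "Medium": 2, "Low": 1}
EFFORT_WEIGHT = {"quick-fix": 1, "short-term": 2, "long-term": 4}

def prioritize(findings: list) -> list:
    """Order the remediation queue: highest risk-per-effort first."""
    def key(f):
        score = RISK_RANK[f["severity"]] / EFFORT_WEIGHT[f["effort"]]
        return (score, RISK_RANK[f["severity"]])  # severity breaks ties
    return sorted(findings, key=key, reverse=True)

queue = prioritize([
    {"id": "F-1", "severity": "High", "effort": "long-term"},
    {"id": "F-2", "severity": "Medium", "effort": "quick-fix"},
    {"id": "F-3", "severity": "Critical", "effort": "short-term"},
])
print([f["id"] for f in queue])  # ['F-3', 'F-2', 'F-1']
```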

Key Considerations:

Common Pitfalls:


APTS-RP-013: Engagement SLA Compliance Reporting

Implementation: Document engagement timeline, any interruptions or scope changes, percentage of planned scope tested, and areas left untested or partially tested with reasons documented.

Key Considerations:

Common Pitfalls:


APTS-RP-014: Trend Analysis for Recurring Engagements

Implementation: For repeat clients, compare findings across engagements to identify new vulnerabilities, resolved issues, and persistent findings. Include trend metrics (closure rate, mean time to remediation, recurrence rate).
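
A sketch of the cross-engagement comparison over finding identifiers. Mean time to remediation and recurrence rate additionally require per-finding timestamps and multi-engagement history, which this sketch omits:

```python
def trend_metrics(previous: set, current: set) -> dict:
    """Compare finding identifiers across two consecutive engagements."""
    resolved = previous - current
    return {
        "new": sorted(current - previous),
        "resolved": sorted(resolved),
        "persistent": sorted(previous & current),
        "closure_rate": round(len(resolved) / len(previous), 2) if previous else 0.0,
    }

prev_findings = {"F-001", "F-002", "F-003", "F-004"}
curr_findings = {"F-003", "F-004", "F-005"}
print(trend_metrics(prev_findings, curr_findings)["closure_rate"])  # 0.5
```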

Key Considerations:

Common Pitfalls:


APTS-RP-015: Downstream Finding Pipeline Integrity

Implementation: Maintain data fidelity through all post-assessment processes: deduplication logic, tenant isolation in multi-client environments, delivery tracking, and audit logs. Implement cryptographic signing for final deliverables.

Key Considerations:

Common Pitfalls:


Advisory Practice Implementation Guidance

APTS-RP-A01: Automated Finding Authenticity Verification

This section provides implementation guidance for the advisory practice APTS-RP-A01. It is not required for conformance at any tier.

Implementation: Deploy an independent verification step that screens every finding for fabricated evidence, hallucinated vulnerabilities, and severity misclassification before the finding enters the human review pipeline. The verification mechanism must not share context or state with the discovering agent.

Architecture Pattern: Independent Finding Judge

A proven pattern is to implement the verifier as a separate "Finding Judge" that receives only the finding record, associated evidence artifacts (PoC scripts, HTTP request/response pairs, tool output), and the target context. The judge evaluates each finding against several checks:

  1. PoC authenticity check: Static analysis of proof-of-concept scripts for hardcoded output, absence of network calls, and output strings that match the "evidence" verbatim without any actual target interaction.
  2. Evidence-claim consistency check: Cross-reference the claimed vulnerability type against the raw evidence. SQL injection claims need SQL injection indicators; XSS claims need evidence of script execution.
  3. Severity calibration check: Evaluate whether the evidence supports the assigned severity. A Critical finding backed only by an informational disclosure is a severity mismatch.
  4. Design-intent check: Flag findings that describe intended application behavior (public API keys designed for client-side use, CORS headers intentionally set for broad access, documented public endpoints).

The judge classifies each finding as VERIFIED, FLAGGED, or REJECTED, with a structured log entry explaining the decision.
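
A minimal sketch of check 1 (PoC authenticity): a static screen for PoC scripts that never touch the network and whose string literals reproduce the submitted "evidence" verbatim. The indicator list and heuristics are illustrative and far from a complete judge:

```python
import re

# Assumed indicators of real target interaction; tune per tool chain.
NETWORK_INDICATORS = ("requests.", "socket.", "urllib", "http.client", "curl ", "nc ")

def poc_authenticity(poc_source: str, claimed_evidence: str) -> str:
    """Screen one PoC script; returns VERIFIED, FLAGGED, or REJECTED."""
    touches_network = any(ind in poc_source for ind in NETWORK_INDICATORS)
    # String literals that reproduce the evidence verbatim suggest the
    # script prints hardcoded output instead of interacting with the target.
    literals = re.findall(r'"([^"]{8,})"|\'([^\']{8,})\'', poc_source)
    hardcoded = any(claimed_evidence.strip() in "".join(pair) for pair in literals)
    if not touches_network and hardcoded:
        return "REJECTED"   # fabricated: the PoC prints its own "evidence"
    if not touches_network:
        return "FLAGGED"    # no target interaction; route to human review
    return "VERIFIED"       # passes this check; other checks still apply

fake = 'print("admin password: hunter2 (extracted via SQLi)")'
print(poc_authenticity(fake, "admin password: hunter2 (extracted via SQLi)"))
# REJECTED
```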

Calibrated Confidence Thresholds:

For finding types where evidence quality varies, implement calibrated confidence ceilings. For example, email injection findings without rendering verification (confirming the injected content is actually rendered by an email client) should be assigned a confidence ceiling below the "Confirmed" threshold, ensuring they are flagged for human review regardless of other evidence quality.
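
A sketch of applying such a ceiling. The finding-type names, the 85% cap, and the condition labels are assumptions chosen so the capped score falls below the 90% "Confirmed" threshold:

```python
CONFIDENCE_CEILINGS = {
    # finding_type: (ceiling, condition that must hold to lift the cap)
    "email_injection": (85, "rendering_verified"),
}

def apply_ceiling(finding_type: str, confidence: float, verified_conditions: set) -> float:
    """Cap confidence for finding types whose evidence is incomplete."""
    ceiling, condition = CONFIDENCE_CEILINGS.get(finding_type, (100, None))
    if condition and condition not in verified_conditions:
        return min(confidence, ceiling)
    return confidence

print(apply_ceiling("email_injection", 97, set()))                    # 85
print(apply_ceiling("email_injection", 97, {"rendering_verified"}))   # 97
```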

Key Considerations:

Common Pitfalls:


Implementation Roadmap

Tier 1 (implement before any autonomous pentesting begins): RP-006 (false positive rate disclosure), RP-008 (vulnerability coverage disclosure), RP-011 (executive summary and risk overview).

Start with RP-011 (executive summary) and RP-008 (coverage disclosure). Customers need these in every report. Add RP-006 (false positive disclosure) to establish trust in findings.

Tier 2 (implement within first 3 engagements): RP-001 (evidence-based finding validation), RP-002 (automated reproduction of critical findings), RP-003 (confidence scoring), RP-004 (finding provenance chain), RP-005 (cryptographic evidence integrity), RP-009 (false negative rate disclosure), RP-012 (remediation guidance), RP-013 (engagement SLA compliance), RP-014 (trend analysis, SHOULD), RP-015 (downstream pipeline integrity, SHOULD).

Prioritize RP-001 and RP-002 (evidence and reproduction) first. Findings without evidence are worthless. Then add RP-003 and RP-004 (confidence scoring, provenance) for auditability, and RP-005 (cryptographic integrity) for tamper-proof evidence chains.

Tier 3 (implement based on assessment maturity): RP-007 (independent finding validation during assessment), RP-010 (detection effectiveness benchmarking, SHOULD).