Architecture

Scope

This document defines the v1 architecture for the Diff Verification + Injection Scan microservice.

Services

Current v1 runtime is a two-process deployment:

  • api: FastAPI app for scan ingestion, governance APIs, webhook ingress, and admin endpoints.
  • worker: polling worker that claims queued jobs, executes verification + injection analysis, writes artifacts, and updates GitHub checks.

Logical responsibilities are still separated into code modules (policy, verification, rules, github_checks, artifacts), but these modules are not deployed as independent services.

Deferred from v1 scope:

  • billing-meter and other monetization microservices.

Storage

  • Postgres:
      • tenants, users, api_keys
      • scans, scan_jobs, findings
      • policies, policy_exceptions
      • audit_events
      • webhook_deliveries (GitHub replay/dedup tracking)
      • diff_blobs (raw diff object-storage references + deletion markers)
  • Object storage:
      • raw diffs (diffs/{scan_id}.patch)
      • signed artifacts (artifacts/{scan_id}.json)
      • optional HTML artifact views (artifacts/{scan_id}.html)
  • Queue:
      • implemented in Postgres table scan_jobs
      • dead-letter state tracked as scan_jobs.status = dead_letter
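The table-backed queue can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for Postgres so that it runs anywhere. The `attempts` column, the `running` status, and the `MAX_ATTEMPTS` budget are assumptions made for illustration; only the `scan_jobs` table name and the `queued` and `dead_letter` states come from this document. On Postgres, a claim query of this kind is commonly written with `SELECT ... FOR UPDATE SKIP LOCKED` so that concurrent workers do not block each other.

```python
import sqlite3

MAX_ATTEMPTS = 3  # hypothetical retry budget before a job is dead-lettered


def make_db():
    # Autocommit mode; transactions are opened explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE scan_jobs ("
        " id INTEGER PRIMARY KEY,"
        " scan_id TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued',"
        " attempts INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def claim_next_job(conn):
    """Atomically move the oldest queued job to 'running' and return it."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, scan_id FROM scan_jobs"
            " WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE scan_jobs SET status = 'running',"
                " attempts = attempts + 1 WHERE id = ?",
                (row[0],),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


def record_failure(conn, job_id):
    """Requeue a failed job, or dead-letter it once the budget is spent."""
    (attempts,) = conn.execute(
        "SELECT attempts FROM scan_jobs WHERE id = ?", (job_id,)
    ).fetchone()
    status = "dead_letter" if attempts >= MAX_ATTEMPTS else "queued"
    conn.execute("UPDATE scan_jobs SET status = ? WHERE id = ?", (status, job_id))
    return status
```

A polling worker would call `claim_next_job` in a loop, sleep when it returns `None`, and call `record_failure` when processing raises.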

Request Lifecycle

  1. POST /v1/scans arrives at the api process.
  2. API validates token and tenant access.
  3. API persists scan row (queued) and enqueues scan_jobs.
  4. Worker claims job and runs deterministic verification + injection rules.
  5. Worker evaluates findings against active policy profile and applies suppression/baseline controls.
  6. Worker stores signed result artifact and marks scan completed/error.
  7. Worker and API update GitHub check state when the scan originates from GitHub.
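The lifecycle above can be sketched end to end with in-memory stand-ins. This is a simplified illustration, not the service's code: `submit_scan`, `run_worker_once`, the token map, and the rule callables are hypothetical names, and policy evaluation (step 5), artifact signing, and GitHub check updates (step 7) are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Scan:
    scan_id: str
    tenant_id: str
    diff: str
    status: str = "queued"  # step 3: the scan row is persisted as queued
    findings: list = field(default_factory=list)


def submit_scan(db, queue, tokens, token, tenant_id, scan_id, diff):
    """Steps 1-3: validate the token, persist the scan, enqueue a job."""
    # Step 2: the token must map to the tenant the caller is acting for.
    if tokens.get(token) != tenant_id:
        raise PermissionError("token does not grant access to this tenant")
    scan = Scan(scan_id, tenant_id, diff)
    db[scan_id] = scan      # stand-in for the scans table
    queue.append(scan_id)   # stand-in for the scan_jobs table
    return scan


def run_worker_once(db, queue, rules):
    """Steps 4 and 6: claim one job, run the rules, mark the outcome."""
    if not queue:
        return None
    scan = db[queue.pop(0)]
    try:
        # Each rule is a hypothetical callable: diff text -> bool (matched).
        scan.findings = [name for name, rule in rules.items() if rule(scan.diff)]
        scan.status = "completed"
    except Exception:
        scan.status = "error"
    return scan
```

Because the rules are plain functions of the diff text, re-running the worker on the same diff yields the same findings, which is the property the deterministic verification step relies on.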

Decision Model

  • pass: no blocking findings under current policy.
  • fail: one or more blocking findings.
  • error: internal failure; configurable fail-open/fail-closed at tenant level.
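A minimal sketch of the three outcomes follows. The finding shape (a dict with a `blocking` flag), the `fail_open` parameter, and the `blocks_merge` field are assumptions for illustration; the document only specifies the pass/fail/error outcomes and that error handling is configurable per tenant.

```python
def decide(findings, internal_error=False, fail_open=False):
    """Map scan results to a decision.

    findings: list of dicts, each with a boolean "blocking" key (assumed shape).
    internal_error: True when the scan itself failed to run.
    fail_open: tenant-level setting; when True, an error does not block.
    """
    if internal_error:
        # error: the tenant chooses between fail-open and fail-closed.
        return {"decision": "error", "blocks_merge": not fail_open}
    blocking = [f for f in findings if f.get("blocking")]
    if blocking:
        return {"decision": "fail", "blocks_merge": True}
    return {"decision": "pass", "blocks_merge": False}
```

For example, `decide([{"rule": "r1", "blocking": False}])` yields a pass because non-blocking findings never fail a scan under the current policy, while `decide([], internal_error=True)` blocks unless the tenant has opted into fail-open.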

Isolation and Security

  • Tenant-scoped data access via tenant id in every table and query.
  • Configurable API auth with tenant and admin roles.
  • Workers run in ephemeral sandboxed containers with read-only filesystem.
  • No default outbound network from workers.
  • Artifacts are signed using a configurable signer mode (deterministic, kms-hmac, or aws-kms).
  • Production mode blocks deterministic signing operations.
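The signing controls can be sketched with the standard-library `hmac` module. This is an illustration under stated assumptions: it treats `deterministic` and `kms-hmac` as HMAC-SHA256 over a canonical JSON encoding with a caller-supplied key, and it leaves `aws-kms` unimplemented because that path needs a real KMS client. The function names and the `production` flag are hypothetical.

```python
import hashlib
import hmac
import json

SIGNER_MODES = {"deterministic", "kms-hmac", "aws-kms"}


def canonical_bytes(artifact):
    # Stable encoding so the same artifact always yields the same signature.
    return json.dumps(artifact, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_artifact(artifact, key, signer_mode, production=False):
    if signer_mode not in SIGNER_MODES:
        raise ValueError(f"unknown signer mode: {signer_mode}")
    if production and signer_mode == "deterministic":
        # Mirrors the rule that production mode blocks deterministic signing.
        raise PermissionError("deterministic signing is blocked in production")
    if signer_mode == "aws-kms":
        raise NotImplementedError("aws-kms requires a KMS client; not sketched here")
    # Assumption: both remaining modes reduce to HMAC-SHA256 with the given key.
    return hmac.new(key, canonical_bytes(artifact), hashlib.sha256).hexdigest()


def verify_artifact(artifact, key, signature):
    expected = hmac.new(key, canonical_bytes(artifact), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

Verification recomputes the signature over the same canonical encoding, so any change to the artifact body after signing causes `verify_artifact` to return `False`.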

Idempotency

POST /v1/scans supports Idempotency-Key header.

  • Key collision with same payload returns original scan record.
  • Key collision with different payload returns 409.
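The two collision rules can be sketched with an in-memory store. The class and method names are hypothetical, as is the choice to scope keys per tenant and to compare payloads by SHA-256 digest; the document only specifies the observable behavior (original record on a match, 409 on a mismatch).

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """Raised when a key is reused with a different payload (maps to HTTP 409)."""

    status_code = 409


class ScanStore:
    def __init__(self):
        self._by_key = {}  # (tenant_id, idempotency_key) -> (payload digest, scan)
        self._next_id = 1

    def create_scan(self, tenant_id, idempotency_key, payload):
        # Hash a canonical encoding so key order in the JSON body is irrelevant.
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode("utf-8")
        ).hexdigest()
        slot = (tenant_id, idempotency_key)
        existing = self._by_key.get(slot)
        if existing is not None:
            stored_digest, scan = existing
            if stored_digest != digest:
                raise IdempotencyConflict("Idempotency-Key reused with a different payload")
            return scan  # same key, same payload: return the original record
        scan = {"scan_id": f"scan-{self._next_id}", "tenant_id": tenant_id, "status": "queued"}
        self._next_id += 1
        self._by_key[slot] = (digest, scan)
        return scan
```

In a real deployment the key and digest would need to be stored with a uniqueness constraint in the database so that two concurrent requests with the same key cannot both create a scan.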

Initial Tech Choices

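  • API: FastAPI (Python) or Node (NestJS) with OpenAPI-first contract; the current v1 api process is FastAPI (see Services).
  • Queue: Redis Streams or SQS were the initial candidates; the current v1 queue is the Postgres table scan_jobs (see Storage).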
  • DB: Postgres 15+.
  • Object store: S3-compatible.

Next Implementation Tasks

  • Validate cloud object storage bindings in staging/prod with backup/restore drills.
  • Validate aws-kms signing/verification path in staging smoke and production dry run.
  • Wire infra/k8s/prometheus-rules.diffver.yaml to cluster Prometheus and alert routing.
  • Add SLO dashboards (P50/P95 scan latency, queue depth, publish failures, retention outcomes).