clawfort/docs/quality-and-monitoring.md
Santhosh Janardhanan 679561bcdb
First deployment
2026-02-13 09:14:04 -05:00

Quality and Monitoring Baseline

CI Quality Gates

Pipeline file: .github/workflows/quality-gates.yml

Stages:

  • lint-and-test: Ruff + pytest (coverage threshold enforced).
  • security-scan: pip-audit dependency vulnerability scan.

Failure policy:

  • Any failed stage blocks merge.
  • Coverage below the enforced threshold blocks merge.
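
The failure policy above can be sketched as a small decision function. This is only an illustration, not the logic in quality-gates.yml: `StageResult`, `merge_blockers`, and the 80% floor used in the usage note are hypothetical, since this document does not state the actual coverage threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageResult:
    name: str     # e.g. "lint-and-test" or "security-scan"
    passed: bool  # True when the stage finished without errors


def merge_blockers(stages, coverage_pct, coverage_floor):
    """Return human-readable reasons the merge is blocked (empty list = mergeable).

    coverage_floor is an assumed parameter; the real value lives in the pipeline config.
    """
    # Rule 1: any failed stage blocks merge.
    reasons = [f"stage failed: {s.name}" for s in stages if not s.passed]
    # Rule 2: coverage below the floor blocks merge.
    if coverage_pct < coverage_floor:
        reasons.append(
            f"coverage {coverage_pct:.1f}% is below floor {coverage_floor:.1f}%"
        )
    return reasons
```

For example, calling `merge_blockers` with a failed security-scan stage and 90% coverage against a hypothetical 80% floor yields a single reason naming the failed stage.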

Coverage and Test Scope

Current baseline suites:

  • API contracts: tests/test_api_contracts.py
  • DB lifecycle workflows: tests/test_db_workflows.py
  • Accessibility contracts: tests/test_accessibility_contract.py
  • Security/performance smoke checks: tests/test_security_and_performance.py

UX Validation Checklist

Run manually on desktop + mobile viewport:

  1. Hero loads with image and CTA visible.
  2. Feed cards render with source and TL;DR CTA.
  3. Modal opens/closes with Escape and backdrop click.
  4. Share controls are visible in light and dark themes.
  5. Floating back-to-top appears after scrolling and returns to top.

Production Metrics and Alert Thresholds

| Metric | Target | Alert Threshold |
| --- | --- | --- |
| API p95 latency (/api/news) | < 350 ms | > 750 ms for 10 min |
| API error rate (5xx) | < 1% | > 3% for 5 min |
| Scheduler success rate | 100% hourly runs | 2 consecutive failures |
| Feed freshness lag | < 75 min | > 120 min |
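
Two of the alert rules above have a time or count component ("for 10 min", "2 consecutive failures"). A minimal sketch of how such rules might be evaluated follows. The function names and the input shapes (timestamped samples, a list of run outcomes) are assumptions for illustration and do not describe the monitoring stack actually in use.

```python
from datetime import datetime, timedelta


def sustained_breach(samples, threshold, window):
    """Return True if the latest unbroken run of samples above `threshold`
    has lasted at least `window`.

    `samples` is a list of (timestamp, value) pairs sorted oldest first.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of the current breach run
        else:
            breach_start = None    # any healthy sample resets the run
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= window


def consecutive_failures(run_results, limit=2):
    """Return True if the last `limit` scheduler runs all failed.

    `run_results` is a list of booleans, oldest first, where False means a failed run.
    """
    if len(run_results) < limit:
        return False
    return not any(run_results[-limit:])
```

With p95 latency samples taken once per minute, `sustained_breach(samples, 750.0, timedelta(minutes=10))` corresponds to the latency row; a single healthy sample inside the window resets the timer, which avoids alerting on brief spikes.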

Alert Runbook

Incident: Elevated API latency

  1. Check DB file I/O and host CPU for saturation.
  2. Inspect recent release diff for expensive queries.
  3. Roll back latest deploy if regression is confirmed.

Incident: Scheduler failures

  1. Check API key and upstream provider status.
  2. Run python -m backend.cli force-fetch to reproduce the failure.
  3. Review logs for provider fallback exhaustion.

Incident: Error-rate spike

  1. Check /api/health response and DB availability.
  2. Identify top failing routes and common status codes.
  3. Mitigate with rollback or feature flag disablement.
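
Step 2 of the error-rate runbook (top failing routes and common status codes) can be approximated with a short tally over access-log records. This is a sketch under assumptions: `summarize_failures` is a hypothetical helper, and the (route, status_code) input shape is illustrative; the real log format is not described here.

```python
from collections import Counter


def summarize_failures(records, top_n=3):
    """Count 5xx responses per route and per status code.

    `records` is an iterable of (route, status_code) pairs.
    Returns (top_routes, top_statuses), each a list of (key, count) pairs.
    """
    routes = Counter()
    statuses = Counter()
    for route, status in records:
        if 500 <= status <= 599:  # only server errors count toward the 5xx rate
            routes[route] += 1
            statuses[status] += 1
    return routes.most_common(top_n), statuses.most_common(top_n)
```

A route that dominates the first list is a natural candidate for the rollback or feature-flag mitigation in step 3.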

Review/Remediation Log Template

Use this structure for each cycle:

severity=<high|medium|low> owner=<name> area=<frontend|backend|infra> finding=<summary> status=<open|fixed>
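
To keep entries consistent with the template, a small helper could validate the enumerated fields before writing a line and parse lines back into fields. The sketch below is hypothetical (`format_entry` and `parse_entry` are not part of the repository) and assumes the owner and finding values never contain the literal text " area=" or " status=".

```python
import re

FIELDS = ("severity", "owner", "area", "finding", "status")
ALLOWED = {
    "severity": {"high", "medium", "low"},
    "area": {"frontend", "backend", "infra"},
    "status": {"open", "fixed"},
}

# Free-text fields (owner, finding) may contain spaces; the enumerated ones may not.
_ENTRY = re.compile(
    r"^severity=(?P<severity>\S+) owner=(?P<owner>.+?) area=(?P<area>\S+) "
    r"finding=(?P<finding>.+) status=(?P<status>\S+)$"
)


def format_entry(**values):
    """Build one log line in template order, rejecting unknown enum values."""
    for key in FIELDS:
        if key not in values:
            raise ValueError(f"missing field: {key}")
    for key, allowed in ALLOWED.items():
        if values[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    return " ".join(f"{key}={values[key]}" for key in FIELDS)


def parse_entry(line):
    """Split a log line back into a dict of its five fields."""
    match = _ENTRY.match(line.strip())
    if match is None:
        raise ValueError("line does not follow the log template")
    return match.groupdict()
```

Validating on write keeps later filtering (for example, all open high-severity backend findings) a matter of simple string comparison.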