# Quality and Monitoring Baseline

## CI Quality Gates

Pipeline file: `.github/workflows/quality-gates.yml`

Stages:
- `lint-and-test`: Ruff + pytest (coverage threshold enforced).
- `security-scan`: `pip-audit` dependency vulnerability scan.

Failure policy:
- Any failed stage blocks merge.
- Coverage below the enforced threshold blocks merge.
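
The failure policy above can be sketched as a small decision function. This is only an illustration of the rule, not the contents of the workflow file: `StageResult`, `merge_blockers`, and the 80% floor used in the usage note are hypothetical names and values, since the source does not state the actual threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageResult:
    """Outcome of one pipeline stage (hypothetical structure)."""
    name: str
    passed: bool


def merge_blockers(stages, coverage_pct, coverage_floor):
    """Return the reasons a merge is blocked; an empty list means mergeable."""
    # Rule 1: any failed stage blocks merge.
    blockers = [f"stage failed: {s.name}" for s in stages if not s.passed]
    # Rule 2: coverage below the floor blocks merge.
    if coverage_pct < coverage_floor:
        blockers.append(
            f"coverage {coverage_pct:.1f}% is below floor {coverage_floor:.1f}%"
        )
    return blockers
```

For example, with an assumed floor of 80%, `merge_blockers([StageResult("lint-and-test", True), StageResult("security-scan", False)], 91.0, 80.0)` reports only the failed `security-scan` stage.
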

## Coverage and Test Scope

Current baseline suites:
- API contracts: `tests/test_api_contracts.py`
- DB lifecycle workflows: `tests/test_db_workflows.py`
- Accessibility contracts: `tests/test_accessibility_contract.py`
- Security/performance smoke checks: `tests/test_security_and_performance.py`

## UX Validation Checklist

Run manually on both desktop and mobile viewports:
1. Hero loads with image and CTA visible.
2. Feed cards render with source and TL;DR CTA.
3. Modal opens, and closes with Escape or a backdrop click.
4. Share controls are visible in light and dark themes.
5. Floating back-to-top control appears after scrolling and returns the page to the top.

## Production Metrics and Alert Thresholds

| Metric | Target | Alert Threshold |
|---|---|---|
| API p95 latency (`/api/news`) | < 350 ms | > 750 ms for 10 min |
| API error rate (`5xx`) | < 1% | > 3% for 5 min |
| Scheduler success rate | 100% hourly runs | 2 consecutive failures |
| Feed freshness lag | < 75 min | > 120 min |
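
As a rough sketch of how the thresholds in this table could be evaluated, the helpers below compute a nearest-rank p95, detect a breach sustained over a window, and count trailing scheduler failures. The function names and the assumption of one aggregated sample per minute are illustrative; the source does not say how metrics are collected or which alerting system is used.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def sustained_breach(per_minute_values, threshold, window_minutes):
    """True if each of the last `window_minutes` values exceeds `threshold`.

    Assumes one aggregated value per minute (an assumption, not from the source).
    """
    if len(per_minute_values) < window_minutes:
        return False
    return all(v > threshold for v in per_minute_values[-window_minutes:])


def trailing_failures(run_outcomes):
    """Count consecutive failures at the end of a list of booleans (True = success)."""
    count = 0
    for succeeded in reversed(run_outcomes):
        if succeeded:
            break
        count += 1
    return count
```

Under these assumptions, the latency alert corresponds to `sustained_breach(p95_by_minute, 750, 10)`, the error-rate alert to `sustained_breach(error_pct_by_minute, 3.0, 5)`, and the scheduler alert to `trailing_failures(outcomes) >= 2`.
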

## Alert Runbook

### Incident: Elevated API latency
1. Check DB file I/O and host CPU for saturation.
2. Inspect recent release diff for expensive queries.
3. Roll back the latest deploy if a regression is confirmed.

### Incident: Scheduler failures
1. Check API key and upstream provider status.
2. Run `python -m backend.cli force-fetch` to reproduce the failure.
3. Review logs for provider fallback exhaustion.

### Incident: Error-rate spike
1. Check `/api/health` response and DB availability.
2. Identify top failing routes and common status codes.
3. Mitigate by rolling back or disabling the relevant feature flag.
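
Step 2 of the error-rate runbook can be approximated with a short tally over request records. This is a minimal sketch that assumes records are already available as `(route, status)` tuples; the record shape and the name `top_failures` are hypothetical, and real access logs would need parsing first.

```python
from collections import Counter


def top_failures(records, limit=3):
    """Tally 5xx responses by route and by status code.

    `records` is an iterable of (route, status) tuples (assumed shape).
    Returns two lists of (key, count) pairs, most frequent first.
    """
    failing = [(route, status) for route, status in records if status >= 500]
    by_route = Counter(route for route, _ in failing)
    by_status = Counter(status for _, status in failing)
    return by_route.most_common(limit), by_status.most_common(limit)
```

The first list points at the routes to inspect, and the second shows whether failures cluster on one status code, which helps decide between a rollback and a feature-flag change.
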

## Review/Remediation Log Template

Use this structure for each cycle:

```text
severity=<high|medium|low> owner=<name> area=<frontend|backend|infra> finding=<summary> status=<open|fixed>
```
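
Entries in this format can be checked mechanically. The sketch below is one possible validator, not part of the project: `ENTRY_RE` and `parse_entry` are hypothetical names, and it assumes the owner is a single token without spaces while the finding summary may contain spaces.

```python
import re

# Field order and allowed values follow the template; owner is assumed
# to be a single whitespace-free token (an assumption, not from the source).
ENTRY_RE = re.compile(
    r"severity=(?P<severity>high|medium|low) "
    r"owner=(?P<owner>\S+) "
    r"area=(?P<area>frontend|backend|infra) "
    r"finding=(?P<finding>.+) "
    r"status=(?P<status>open|fixed)"
)


def parse_entry(line):
    """Parse one log entry into a dict, or raise ValueError if malformed."""
    match = ENTRY_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"malformed log entry: {line!r}")
    return match.groupdict()
```

For instance, `parse_entry("severity=high owner=dana area=backend finding=slow query on /api/news status=open")` yields a dict whose `finding` is `"slow query on /api/news"`, while an entry with an unknown severity raises `ValueError`.
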