First deployment
26
docs/monitoring-dashboard-config.md
Normal file
@@ -0,0 +1,26 @@
# Monitoring Dashboard Configuration

## Objective

Define baseline dashboards and alert thresholds for reliability and freshness checks.

## Dashboard Panels

1. API p95 latency for `/api/news` and `/api/news/latest`
2. API error rate (`5xx`) by route
3. Scheduler success/failure count per hour
4. Feed freshness lag (minutes since latest published item)
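
As a rough illustration of how panels 1 and 4 could be derived from raw data, the sketch below computes a nearest-rank p95 and a freshness lag. The function names and input shapes are assumptions made for this example; the actual dashboard backend may use a different percentile method.

```python
from datetime import datetime, timezone


def p95(samples_ms):
    """Nearest-rank 95th percentile of latency samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n), which avoids float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def freshness_lag_minutes(latest_published_at, now=None):
    """Minutes elapsed since the most recently published item."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - latest_published_at).total_seconds() / 60.0
```

Both timestamps passed to `freshness_lag_minutes` are assumed to be timezone-aware; mixing naive and aware datetimes raises a `TypeError`.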

## Alert Thresholds

- API latency alert: p95 > 750 ms for 10 minutes
- API error-rate alert: `5xx` > 3% for 5 minutes
- Scheduler alert: 2 consecutive failed fetch cycles
- Freshness alert: latest item older than 120 minutes
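
The "for N minutes" and "consecutive failures" conditions above can be sketched as follows. This assumes one metric sample per minute and a chronological list of run outcomes; both helpers are hypothetical and not part of the project's code.

```python
def sustained_breach(per_minute_values, threshold, minutes):
    """True if the most recent `minutes` samples all exceed `threshold`."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    if len(per_minute_values) < minutes:
        return False  # not enough history to decide yet
    return all(v > threshold for v in per_minute_values[-minutes:])


def consecutive_failures(run_results):
    """Count trailing failed runs; `run_results` holds True for success."""
    count = 0
    for ok in reversed(run_results):
        if ok:
            break
        count += 1
    return count
```

With these helpers, the latency rule would read `sustained_breach(p95_series, 750, 10)` and the scheduler rule `consecutive_failures(runs) >= 2`.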

## Test Trigger Plan

- Latency trigger: run a stress test against `/api/news` with 50 concurrent requests in staging.
- Error-rate trigger: simulate an upstream timeout and confirm the 5xx alert path.
- Scheduler trigger: disable the upstream API key in staging and verify the consecutive-failure alert.
- Freshness trigger: pause the scheduler for >120 minutes in staging and confirm the lag alert.
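
A minimal sketch of the latency trigger, using only the standard library. The `run_burst` and `http_get` names are assumptions made for this example, and the staging URL must be supplied by the caller; a dedicated load-testing tool may be a better fit in practice.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get(url):
    """Issue one GET and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses are raised as HTTPError


def run_burst(url, concurrency=50, fetch=http_get):
    """Fire `concurrency` parallel requests; return (status, elapsed_ms) pairs."""
    def timed(_):
        start = time.perf_counter()
        try:
            status = fetch(url)
        except Exception:
            status = None  # connection error or timeout
        return status, (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, range(concurrency)))
```

A single burst is unlikely to hold p95 above 750 ms for the full 10-minute window, so the burst would probably need to be repeated in a loop for at least that long before the alert can fire.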
23
docs/p15-code-review-findings.md
Normal file
@@ -0,0 +1,23 @@
# P15 Code Review Findings

Date: 2026-02-13

## High

- owner=backend area=translations finding=Machine translation output is accepted without strict language validation in runtime flow, allowing occasional script mismatch/gibberish.

## Medium

- owner=frontend area=policy-disclosures finding=Terms and Attribution links previously required route navigation, reducing continuity and causing context loss.
- owner=backend area=admin-cli finding=Image refetch previously lacked permalink-targeted repair mode, forcing broad batch operations.

## Low

- owner=frontend area=sharing finding=Text-based icon actions in compact surfaces reduced visual consistency on small screens.

## Remediation Status

- translations-quality-gate: fixed-in-progress
- policy-modal-surface: fixed-in-progress
- permalink-targeted-refetch: fixed-in-progress
- icon-consistency: fixed-in-progress
64
docs/quality-and-monitoring.md
Normal file
@@ -0,0 +1,64 @@
# Quality and Monitoring Baseline

## CI Quality Gates

Pipeline file: `.github/workflows/quality-gates.yml`

Stages:
- `lint-and-test`: Ruff + pytest (coverage threshold enforced).
- `security-scan`: `pip-audit` dependency vulnerability scan.

Failure policy:
- Any failed stage blocks merge.
- Coverage below the configured threshold blocks merge.

## Coverage and Test Scope

Current baseline suites:
- API contracts: `tests/test_api_contracts.py`
- DB lifecycle workflows: `tests/test_db_workflows.py`
- Accessibility contracts: `tests/test_accessibility_contract.py`
- Security/performance smoke checks: `tests/test_security_and_performance.py`

## UX Validation Checklist

Run manually on desktop and mobile viewports:
1. Hero loads with image and CTA visible.
2. Feed cards render with source and TL;DR CTA.
3. Modal opens, and closes with both Escape and backdrop click.
4. Share controls are visible in light and dark themes.
5. Floating back-to-top button appears after scrolling and returns to top.

## Production Metrics and Alert Thresholds

| Metric | Target | Alert Threshold |
|---|---|---|
| API p95 latency (`/api/news`) | < 350 ms | > 750 ms for 10 min |
| API error rate (`5xx`) | < 1% | > 3% for 5 min |
| Scheduler success rate | 100% hourly runs | 2 consecutive failures |
| Feed freshness lag | < 75 min | > 120 min |
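
The table leaves a band between each target and its alert threshold (for example, p95 between 350 ms and 750 ms). One way to make that band explicit is sketched below. The metric keys, the `Band` type, and the `"warn"` label are assumptions made for this example, and the duration conditions (10 min, 5 min) and the count-based scheduler rule are deliberately not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    target_max: float  # values below this meet the target
    alert_min: float   # values above this breach the alert threshold


# Hypothetical metric keys; numbers are taken from the table above.
BANDS = {
    "api_p95_latency_ms": Band(target_max=350, alert_min=750),
    "api_5xx_rate_pct": Band(target_max=1, alert_min=3),
    "feed_freshness_lag_min": Band(target_max=75, alert_min=120),
}


def classify(metric, value):
    """Return "ok", "warn", or "alert" for a single instantaneous reading."""
    band = BANDS[metric]
    if value > band.alert_min:
        return "alert"
    if value < band.target_max:
        return "ok"
    return "warn"
```

For instance, `classify("api_p95_latency_ms", 500)` falls in the middle band: above target, but not yet at the alert threshold.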

## Alert Runbook

### Incident: Elevated API latency
1. Check DB file I/O and host CPU for saturation.
2. Inspect the recent release diff for expensive queries.
3. Roll back the latest deploy if a regression is confirmed.

### Incident: Scheduler failures
1. Check the API key and upstream provider status.
2. Run `python -m backend.cli force-fetch` to reproduce the failure.
3. Review logs for provider fallback exhaustion.

### Incident: Error-rate spike
1. Check the `/api/health` response and DB availability.
2. Identify the top failing routes and their most common status codes.
3. Mitigate with a rollback or by disabling the relevant feature flag.
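
Step 2 of the error-rate runbook can be sketched as a small aggregation over access-log records. The `(route, status_code)` record shape is an assumption made for this example; the real log format may differ and would need its own parsing step.

```python
from collections import Counter


def top_failing_routes(records, limit=3):
    """Return [(route, 5xx_count), ...] for the routes with the most 5xx responses."""
    failures = Counter(
        route for route, status in records if 500 <= status <= 599
    )
    return failures.most_common(limit)


def status_breakdown(records, route):
    """Count responses per status code for one route."""
    return Counter(status for r, status in records if r == route)
```

Running `top_failing_routes` first narrows the investigation to a few routes, and `status_breakdown` then shows whether a route is failing with one dominant code (for example, all `502`) or a mix.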

## Review/Remediation Log Template

Use this structure for each cycle:

```text
severity=<high|medium|low> owner=<name> area=<frontend|backend|infra> finding=<summary> status=<open|fixed>
```
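
Because `finding` is free text containing spaces, a plain whitespace split does not work on these lines. The sketch below splits only where a known key follows. `parse_finding` is a hypothetical helper, and it assumes the summary text never contains one of the key names followed by `=`.

```python
import re

KEYS = ("severity", "owner", "area", "finding", "status")
# Split on whitespace only when it is immediately followed by "<known key>=".
_SPLIT = re.compile(r"\s+(?=(?:%s)=)" % "|".join(KEYS))


def parse_finding(line):
    """Parse one log line in the template format into a dict of fields."""
    fields = {}
    for part in _SPLIT.split(line.strip()):
        key, sep, value = part.partition("=")
        if not sep or key not in KEYS:
            raise ValueError(f"unrecognised field: {part!r}")
        fields[key] = value.strip()
    return fields
```

The parser does not require every key to be present, so callers that need all five fields should check for missing keys themselves.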