ADDED Requirements
Requirement: Production monitoring covers key reliability signals
The system SHALL capture and expose reliability/performance metrics for core services.
Scenario: Metrics available for operations
- WHEN the production system is running
- THEN dashboards expose API latency/error rate, scheduler freshness, and ingestion health signals
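
As an illustration only, the signals named in this scenario could be gathered into a single snapshot that a dashboard reads. This is a minimal sketch: the field names, units, and the `error_rate` helper are hypothetical and are not defined by this spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilitySnapshot:
    """Hypothetical point-in-time view of the signals a dashboard would show."""
    api_latency_p95_ms: float        # API latency (95th percentile, assumed unit: ms)
    api_error_rate: float            # fraction of failed API requests, 0.0 to 1.0
    scheduler_last_run_age_s: float  # scheduler freshness: seconds since last run
    ingestion_lag_s: float           # ingestion health: seconds behind the source


def error_rate(failed: int, total: int) -> float:
    """Return the fraction of failed requests; 0.0 when there was no traffic."""
    return failed / total if total else 0.0


snapshot = ReliabilitySnapshot(
    api_latency_p95_ms=180.0,
    api_error_rate=error_rate(failed=5, total=100),
    scheduler_last_run_age_s=42.0,
    ingestion_lag_s=3.5,
)
```

Treating zero traffic as a zero error rate is an assumption of this sketch; a real system might prefer to report "no data" instead.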
Requirement: Alerting is actionable and threshold-based
The system SHALL send alerts on defined thresholds with clear operator guidance.
Scenario: Threshold breach alert
- WHEN a monitored metric breaches its configured threshold
- THEN an alert is emitted to the configured channel
- AND alert includes service, metric, threshold, and suggested next action
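
The threshold-breach scenario can be sketched as follows. This is a hedged example, not a prescribed implementation: `Threshold`, `Alert`, `evaluate`, and the `send` callback are hypothetical names, and "breach" is assumed here to mean the observed value exceeds an upper limit.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Threshold:
    service: str
    metric: str
    limit: float      # assumed to be an upper bound
    next_action: str  # operator guidance attached to the alert


@dataclass(frozen=True)
class Alert:
    service: str
    metric: str
    threshold: float
    observed: float
    next_action: str


def evaluate(
    rule: Threshold,
    observed: float,
    send: Optional[Callable[[Alert], None]] = None,
) -> Optional[Alert]:
    """Return an Alert (and pass it to `send`) when `observed` exceeds the limit."""
    if observed <= rule.limit:
        return None
    alert = Alert(rule.service, rule.metric, rule.limit, observed, rule.next_action)
    if send is not None:
        send(alert)  # stand-in for the configured channel
    return alert


rule = Threshold("api", "error_rate", 0.05, "Check recent deploys and roll back if needed")
sent: list[Alert] = []
evaluate(rule, 0.01, send=sent.append)  # below the limit: nothing is emitted
evaluate(rule, 0.20, send=sent.append)  # above the limit: one alert is emitted
```

Carrying `next_action` on the rule itself keeps the operator guidance next to the threshold definition, so every emitted alert includes service, metric, threshold, and a suggested next action.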