## ADDED Requirements ### Requirement: Production monitoring covers key reliability signals The system SHALL capture and expose reliability/performance metrics for core services. #### Scenario: Metrics available for operations - **WHEN** production system is running - **THEN** dashboards expose API latency/error rate, scheduler freshness, and ingestion health signals ### Requirement: Alerting is actionable and threshold-based The system SHALL send alerts on defined thresholds with clear operator guidance. #### Scenario: Threshold breach alert - **WHEN** a monitored metric breaches configured threshold - **THEN** alert is emitted to configured channel - **AND** alert includes service, metric, threshold, and suggested next action