## ADDED Requirements

### Requirement: Normalized content items

The system MUST normalize all ingested items (YouTube videos, Instagram posts, podcast episodes) into a single internal schema so the website can render them consistently.

The normalized item MUST include at minimum:

- `id` (stable within its source)
- `source` (`youtube`, `instagram`, or `podcast`)
- `url`
- `title`
- `publishedAt` (ISO-8601)
- `thumbnailUrl` (optional)

The system MUST support an optional summary field on normalized items when available from the source:

- `summary` (optional, short human-readable excerpt suitable for cards)

#### Scenario: Normalizing a YouTube video

- **WHEN** the system ingests a YouTube video item
- **THEN** it produces a normalized item containing `id`, `source: youtube`, `url`, `title`, and `publishedAt`

#### Scenario: Normalizing a podcast episode

- **WHEN** the system ingests a podcast RSS episode
- **THEN** it produces a normalized item containing `id`, `source: podcast`, `url`, `title`, and `publishedAt`

#### Scenario: Summary available

- **WHEN** an ingested item provides summary/description content
- **THEN** the normalized item includes a `summary` suitable for rendering in cards

### Requirement: YouTube ingestion with stats when available

The system MUST support ingesting YouTube videos for channel `youtube.com/santhoshj`.

When a YouTube API key is configured, the system MUST ingest video metadata and MUST ingest view count (and MAY ingest likes/comments if available) so "high-performing" can be computed.

When no YouTube API key is configured, the system MUST still ingest latest videos using a non-authenticated mechanism (for example, channel RSS) but MUST omit performance stats.
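
As an illustration only, the normalized schema and the two YouTube ingestion modes described above could be sketched as follows. The field names `id`, `source`, `url`, `title`, `publishedAt`, `thumbnailUrl`, `summary`, and `metrics.views` come from this spec; the `RawYouTubeVideo` shape, the `normalizeYouTube` function, the `hasApiKey` flag, and the 200-character summary limit are assumptions made for the example, not part of the requirement.

```typescript
type Source = "youtube" | "instagram" | "podcast";

// Normalized item as listed in the requirement. `metrics` is an assumed
// container for the optional performance stats (`metrics.views`).
interface NormalizedItem {
  id: string;
  source: Source;
  url: string;
  title: string;
  publishedAt: string; // ISO-8601
  thumbnailUrl?: string;
  summary?: string;
  metrics?: { views?: number };
}

// Hypothetical raw input; a real API or RSS payload will look different.
interface RawYouTubeVideo {
  videoId: string;
  title: string;
  published: string;
  description?: string;
  viewCount?: number;
}

function normalizeYouTube(raw: RawYouTubeVideo, hasApiKey: boolean): NormalizedItem {
  const item: NormalizedItem = {
    id: raw.videoId,
    source: "youtube",
    url: `https://www.youtube.com/watch?v=${raw.videoId}`,
    title: raw.title,
    publishedAt: new Date(raw.published).toISOString(),
  };
  if (raw.description) {
    // Assumed card-friendly length; the spec only asks for a short excerpt.
    item.summary = raw.description.slice(0, 200);
  }
  // Stats are attached only in API-key mode; RSS mode omits them.
  if (hasApiKey && raw.viewCount !== undefined) {
    item.metrics = { views: raw.viewCount };
  }
  return item;
}
```

Keeping `metrics` optional lets the renderer treat API-key and RSS-only items uniformly and skip "high-performing" ranking when stats are absent.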
#### Scenario: API key configured

- **WHEN** a YouTube API key is configured
- **THEN** the system ingests video metadata and includes `metrics.views` for each ingested video when available from the API

#### Scenario: No API key configured

- **WHEN** no YouTube API key is configured
- **THEN** the system ingests latest videos and does not require `metrics.views` to be present

### Requirement: Podcast RSS ingestion

The system MUST ingest the Irregular Mind podcast RSS feed and produce normalized items representing podcast episodes.

#### Scenario: RSS feed fetch succeeds

- **WHEN** the system fetches the podcast RSS feed successfully
- **THEN** it produces one normalized item per episode with `source: podcast`

### Requirement: Instagram content support via embed-first approach

The system MUST support representing Instagram posts for `@santhoshjanan` in the site content surface.

If API-based ingestion is not configured/available, the system MUST support an embed-first representation where the normalized item contains a `url` to the Instagram post and any additional embed metadata needed by the renderer.

#### Scenario: Embed-first mode

- **WHEN** Instagram API ingestion is not configured
- **THEN** the system provides normalized Instagram items that contain a public post `url` suitable for embedding

### Requirement: Refresh and caching

The system MUST cache the latest successful ingestion output and MUST serve the cached data to the site renderer.

The system MUST support periodic refresh on a schedule (at minimum daily) and MUST support a manual refresh trigger.

On ingestion failure, the system MUST continue serving the most recent cached data.

The ingestion pipeline MUST use the cache layer (when configured and reachable) to reduce repeated network and parsing work for external sources (for example, YouTube API/RSS and podcast RSS).
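
One possible way to express the "serve the last successful dataset on failure" rule above is a small pure state transition. This is a sketch under assumptions: `CacheState`, `RefreshOutcome`, and `applyRefresh` are hypothetical names, and the timestamp is passed in by the caller so the function stays deterministic. Scheduling and the manual trigger are out of scope here; both would feed their result into the same function.

```typescript
// What the site renderer reads from. All names here are illustrative.
interface CacheState<T> {
  data: T | null;             // most recent successfully ingested dataset
  refreshedAt: number | null; // epoch ms of that successful run
  lastError: string | null;   // most recent failure, kept for diagnostics
}

type RefreshOutcome<T> =
  | { ok: true; data: T }
  | { ok: false; error: string };

function applyRefresh<T>(
  state: CacheState<T>,
  outcome: RefreshOutcome<T>,
  now: number
): CacheState<T> {
  if (outcome.ok) {
    // Success: replace the dataset and clear any previous error.
    return { data: outcome.data, refreshedAt: now, lastError: null };
  }
  // Failure: keep the previous dataset and timestamp, record the error.
  return { data: state.data, refreshedAt: state.refreshedAt, lastError: outcome.error };
}
```

Because a failed run never touches `data`, the renderer can keep reading the same state regardless of whether the latest scheduled or manual refresh succeeded.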
#### Scenario: Scheduled refresh fails

- **WHEN** a scheduled refresh run fails to fetch one or more sources
- **THEN** the site continues to use the most recent successfully cached dataset

#### Scenario: Manual refresh requested

- **WHEN** a manual refresh is triggered
- **THEN** the system attempts ingestion immediately and updates the cache if ingestion succeeds

#### Scenario: Cache hit avoids refetch

- **WHEN** a refresh run is executed within the cache TTL for a given source+parameters
- **THEN** the ingestion pipeline uses cached data for that source instead of refetching over the network
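
The cache-hit scenario above could look roughly like the following in-memory sketch, keyed by source plus parameters. `TtlCache`, `cachedFetch`, and the key format are hypothetical. The fetcher is synchronous here to keep the example short, whereas real network ingestion would be asynchronous, and a deployed cache layer may be an external store rather than a `Map`.

```typescript
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Build a stable key from the source and its parameters (sorted by name).
  static key(source: string, params: Record<string, string>): string {
    const parts = Object.keys(params).sort().map((k) => `${k}=${params[k]}`);
    return `${source}?${parts.join("&")}`;
  }

  // Returns the cached value, or undefined if missing or expired.
  get(key: string, now: number): T | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined || now >= hit.expiresAt) return undefined;
    return hit.value;
  }

  set(key: string, value: T, now: number): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

// Use the cached value within the TTL; otherwise call the fetcher and store.
function cachedFetch<T>(
  cache: TtlCache<T>,
  source: string,
  params: Record<string, string>,
  now: number,
  fetcher: () => T
): T {
  const key = TtlCache.key(source, params);
  const cached = cache.get(key, now);
  if (cached !== undefined) return cached;
  const fresh = fetcher();
  cache.set(key, fresh, now);
  return fresh;
}
```

Sorting parameter names before building the key means the same source and parameters always map to the same entry, regardless of the order in which the caller supplies them.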