## Purpose
Canonical specification for image-query-refinement requirements synced from OpenSpec change deltas.

## Requirements

### Requirement: Keyword extraction from headline
The system SHALL extract relevant keywords from article headlines for image search.

#### Scenario: Extract keywords from standard headline
- **WHEN** headline is "OpenAI Announces GPT-5 with Revolutionary Reasoning Capabilities"
- **THEN** extracted query is "OpenAI GPT-5 Revolutionary Reasoning"
- **AND** stop words like "Announces", "with", "Capabilities" are removed

#### Scenario: Handle short headline
- **WHEN** headline is "AI Breakthrough"
- **THEN** extracted query is "AI Breakthrough"
- **AND** no keywords are removed (headline too short)

#### Scenario: Handle headline with special characters
- **WHEN** headline is "Tesla's Self-Driving AI: 99.9% Accuracy Achieved!"
- **THEN** extracted query is "Tesla Self-Driving AI Accuracy"
- **AND** special characters like apostrophes, colons, and punctuation are normalized

### Requirement: Stop word removal
The system SHALL remove common English stop words from search queries.

#### Scenario: Remove articles and prepositions
- **WHEN** headline is "The Future of AI in the Healthcare Industry"
- **THEN** extracted query is "Future AI Healthcare Industry"
- **AND** "The", "of", "in", "the" are removed

#### Scenario: Preserve technical terms
- **WHEN** headline is "How Machine Learning Models Learn from Data"
- **THEN** extracted query is "Machine Learning Models Learn Data"
- **AND** technical terms "Machine", "Learning", "Models" are preserved

### Requirement: Query length limit
The system SHALL limit search query length to optimize API results.

#### Scenario: Truncate long query
- **WHEN** extracted keywords exceed 10 words
- **THEN** query is limited to first 5 most significant keywords
- **AND** remaining keywords are dropped

#### Scenario: Preserve short query
- **WHEN** extracted keywords are 5 words or fewer
- **THEN** all keywords are included in query
- **AND** no truncation occurs

### Requirement: URL-safe query encoding
The system SHALL URL-encode queries before sending to provider APIs.

#### Scenario: Encode spaces and special characters
- **WHEN** query is "AI Machine Learning"
- **THEN** encoded query is "AI+Machine+Learning" or "AI%20Machine%20Learning"
- **AND** query is safe for HTTP GET parameters

#### Scenario: Handle Unicode characters
- **WHEN** query contains Unicode like "AI für Deutschland"
- **THEN** Unicode characters are properly percent-encoded
- **AND** API request succeeds without encoding errors

### Requirement: Empty query handling
The system SHALL handle edge cases where no keywords can be extracted.

#### Scenario: Headline with only stop words
- **WHEN** headline is "The and a or but"
- **THEN** system uses fallback query "news technology"
- **AND** image search proceeds with generic query

#### Scenario: Empty headline
- **WHEN** headline is empty string or whitespace only
- **THEN** system uses fallback query "news technology"
- **AND** image search proceeds with generic query
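
The requirements above can be read as one small pipeline: tokenize, drop stop words, cap the length, fall back if nothing is left, then URL-encode. The Python sketch below is one possible reading, not the reference implementation. The function names, the stop-word list, the two-word "short headline" threshold, and the use of headline order as a stand-in for "most significant" are all assumptions introduced for illustration; only the fallback query and the 10/5 limits come from the scenarios.

```python
import re
from urllib.parse import quote_plus

# Hypothetical stop-word list. "announces" and "capabilities" are named in the
# scenarios; "achieved" and the rest are guesses that make the examples work.
STOP_WORDS = frozenset({
    "a", "an", "and", "but", "or", "the", "of", "in", "on", "for", "to",
    "from", "with", "how", "is", "are",
    "announces", "capabilities", "achieved",
})
FALLBACK_QUERY = "news technology"  # from the empty-query scenarios
SHORT_HEADLINE_WORDS = 2            # assumption: "too short" means <= 2 words
TRUNCATE_ABOVE = 10                 # from the "Truncate long query" scenario
MAX_KEYWORDS = 5                    # from the "Truncate long query" scenario


def extract_keywords(headline: str) -> list[str]:
    """Normalize punctuation and drop stop words, keeping headline order."""
    # Strip possessive 's so "Tesla's" becomes "Tesla".
    text = re.sub(r"['’]s\b", "", headline)
    # Keep word runs, allowing internal hyphens ("GPT-5", "Self-Driving"),
    # and discard purely numeric fragments such as the pieces of "99.9%".
    tokens = [t for t in re.findall(r"\w+(?:-\w+)*", text) if not t.isdigit()]
    if len(tokens) <= SHORT_HEADLINE_WORDS:
        return tokens  # headline too short: remove nothing
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def build_image_query(headline: str) -> str:
    """Return the search query for a headline, or the fallback query."""
    keywords = extract_keywords(headline)
    if len(keywords) > TRUNCATE_ABOVE:
        # Assumption: earlier keywords are treated as more significant.
        keywords = keywords[:MAX_KEYWORDS]
    return " ".join(keywords) if keywords else FALLBACK_QUERY


def encode_query(query: str) -> str:
    """Encode a query for an HTTP GET parameter (spaces become '+')."""
    return quote_plus(query)


if __name__ == "__main__":
    query = build_image_query("The Future of AI in the Healthcare Industry")
    print(query)                # Future AI Healthcare Industry
    print(encode_query(query))  # Future+AI+Healthcare+Industry
```

Queries with 6 to 10 keywords pass through unchanged in this sketch, because the scenarios only define behavior for more than 10 keywords and for 5 or fewer. `quote_plus` produces the `+` form of the encoded query; `urllib.parse.quote` would produce the `%20` form that the scenario also accepts.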