## Context
ClawFort currently stores and serves article content in a single-language flow. The news creation path fetches English content via Perplexity and persists one record per article, and the frontend hero/feed rendering consumes that single-language payload.
This change introduces multilingual support for Tamil and Malayalam with language-aware rendering and persistent user preference.
Constraints:
- Keep existing English behavior as default and fallback.
- Reuse current Perplexity integration for translation generation.
- Keep API and frontend changes minimal and backward-compatible where possible.
- Persist user language preference client-side so returning users keep their choice.
## Goals / Non-Goals
**Goals:**
- Generate Tamil and Malayalam translations at article creation time.
- Persist translation variants linked to the base article.
- Serve language-specific content in hero/feed API responses.
- Add landing-page language selector and persist preference across sessions.
**Non-Goals:**
- Supporting arbitrary language expansion in this phase.
- Introducing user accounts/server-side profile preferences.
- Building editorial translation workflows or manual override UI.
- Replacing Perplexity as translation provider.
## Decisions
### Decision: Model translations as child records linked to a base article
**Decision:** Keep one source article and store translation rows keyed by article ID + language code.
**Rationale:**
- Avoids duplicating non-language metadata (source URL, image attribution, timestamps).
- Supports language lookup with deterministic fallback to English.
- Eases future language additions without schema redesign.
**Alternatives considered:**
- Inline columns on article table (`headline_ta`, `headline_ml`): rejected as rigid and harder to extend.
- Fully duplicated article rows per language: rejected due to dedup and feed-order complexity.
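A minimal sketch of the child-record model, using SQLite only for illustration. The table and column names (`articles`, `article_translations`, `headline`, `summary`) and the helper names are assumptions, not the project's actual schema; the `ta`/`ml` codes follow the column suffixes mentioned above.

```python
import sqlite3

# Hypothetical schema: one base article row, plus one translation row
# per (article_id, language) pair. Names are illustrative assumptions.
SCHEMA = """
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    headline TEXT NOT NULL,
    summary TEXT NOT NULL,
    source_url TEXT NOT NULL
);
CREATE TABLE article_translations (
    article_id INTEGER NOT NULL REFERENCES articles(id) ON DELETE CASCADE,
    language TEXT NOT NULL,
    headline TEXT NOT NULL,
    summary TEXT NOT NULL,
    PRIMARY KEY (article_id, language)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def save_translation(conn, article_id, language, headline, summary):
    """Insert or overwrite the translation for one article and language."""
    conn.execute(
        "INSERT OR REPLACE INTO article_translations "
        "(article_id, language, headline, summary) VALUES (?, ?, ?, ?)",
        (article_id, language, headline, summary),
    )


def get_translation(conn, article_id, language):
    """Return (headline, summary) for the pair, or None if no row exists."""
    return conn.execute(
        "SELECT headline, summary FROM article_translations "
        "WHERE article_id = ? AND language = ?",
        (article_id, language),
    ).fetchone()
```

Because the language is a plain key column rather than part of a column name, adding another language in this sketch would mean inserting rows with a new code, with no schema change.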
### Decision: Translate immediately after article creation in ingestion pipeline
**Decision:** For each newly accepted article, request Tamil and Malayalam translations and persist them before the ingestion cycle completes.
**Rationale:**
- Keeps article and translations synchronized.
- Avoids delayed jobs and partial language availability in normal flow.
- Fits existing per-article processing loop.
**Alternatives considered:**
- Asynchronous background translation queue: rejected for higher complexity in this phase.
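The synchronous step could look roughly like the sketch below. The `translate` callable stands in for the Perplexity call and is a hypothetical interface, as are `TARGET_LANGUAGES`, `MAX_ATTEMPTS`, and the function names. The sketch retries each language independently, which is only one possible answer to the retry question left open at the end of this document.

```python
from typing import Callable, Dict, Optional

TARGET_LANGUAGES = ("ta", "ml")  # Tamil, Malayalam
MAX_ATTEMPTS = 2  # assumed retry bound; the design only says "bounded"

# Hypothetical translator interface: (text, language_code) -> translated text.
Translator = Callable[[str, str], str]


def translate_with_retries(
    text: str,
    language: str,
    translate: Translator,
    max_attempts: int = MAX_ATTEMPTS,
) -> Optional[str]:
    """Try one language up to max_attempts times; return None if all fail."""
    for _ in range(max_attempts):
        try:
            return translate(text, language)
        except Exception:
            # Any provider error counts as a failed attempt in this sketch.
            continue
    return None


def translate_article(headline: str, translate: Translator) -> Dict[str, str]:
    """Return the translations that succeeded, keyed by language code.

    A language that fails every attempt is simply absent from the result,
    so readers of that article later fall back to English.
    """
    translations: Dict[str, str] = {}
    for language in TARGET_LANGUAGES:
        result = translate_with_retries(headline, language, translate)
        if result is not None:
            translations[language] = result
    return translations
```

In this shape a failed Malayalam call does not discard a successful Tamil result; the article is still stored, and only the missing language is served in English.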
### Decision: Add optional language input to read APIs with English fallback
**Decision:** Add a language selection input (query param) to the existing read endpoints; if a translation is missing, return the English source text.
**Rationale:**
- Preserves endpoint footprint and frontend integration simplicity.
- Guarantees response completeness even when translation fails.
- Supports progressive rollout without breaking existing consumers.
**Alternatives considered:**
- New language-specific endpoints: rejected as unnecessary API surface growth.
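The fallback rule can be sketched as a pure function, independent of any web framework. The field names, the `language` key added to the payload, and the function name are illustrative assumptions. Unsupported codes fall back to English here, which is just one of the two options still under discussion for v1.

```python
from typing import Dict, Optional

DEFAULT_LANGUAGE = "en"


def resolve_content(
    article: Dict[str, str],
    translations: Dict[str, Dict[str, str]],
    requested: Optional[str],
) -> Dict[str, str]:
    """Build the response payload for one article.

    article:      base English fields plus language-neutral metadata
    translations: translated fields keyed by language code, e.g. {"ta": {...}}
    requested:    value of the optional language query param, or None
    """
    language = (requested or DEFAULT_LANGUAGE).lower()
    translated = translations.get(language)
    if language == DEFAULT_LANGUAGE or translated is None:
        # Missing, unknown, or omitted language: serve the English source.
        return {**article, "language": DEFAULT_LANGUAGE}
    # Translated fields override the English ones; metadata is untouched.
    return {**article, **translated, "language": language}
```

Existing consumers that send no language parameter get the English text, as before, with one extra field in this sketch.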
### Decision: Persist frontend language preference in localStorage with cookie fallback
**Decision:** Primary persistence in `localStorage`; optional cookie fallback for constrained browsers.
**Rationale:**
- Simple client-only persistence without backend session dependencies.
- Matches the single-page app architecture and current no-auth model.
**Alternatives considered:**
- Cookie-only preference: rejected as less ergonomic for JS state hydration.
## Risks / Trade-offs
- **[Risk] Translation generation increases API cost/latency per ingestion cycle** -> Mitigation: bounded retries, and fall back to English when a translation is unavailable.
- **[Risk] Partial translation failures create a mixed-language feed** -> Mitigation: deterministic fallback to English for missing translation rows.
- **[Trade-off] Translation-at-ingest adds synchronous processing time** -> Mitigation: keep language set fixed to two targets in this phase.
- **[Risk] Language preference desynchronization between tabs/devices** -> Mitigation: accept per-browser persistence scope in current architecture.
## Migration Plan
1. Add translation persistence model and migration path.
2. Extend ingestion pipeline to request/store Tamil and Malayalam translations.
3. Add language-aware API response behavior with fallback.
4. Implement frontend language selector + preference persistence.
5. Validate language switching, fallback, and returning-user preference behavior.
Rollback:
- Disable language selection in the frontend and return English-only payloads, leaving the stored translation data in place.
## Open Questions
- Should translation failures be retried independently per language within the same cycle, or skipped after one failed language call?
- Should unsupported language requests return 400 or silently fall back to English in v1?