clawfort/openspec/changes/archive/2026-02-13-p03-languages-ml-tm/design.md

Context

ClawFort currently stores and serves article content in a single language. The news creation path fetches English content via Perplexity and persists one record per article, and the frontend hero and feed rendering consumes that single-language payload.

This change introduces multilingual support for Tamil and Malayalam with language-aware rendering and persistent user preference.

Constraints:

  • Keep existing English behavior as default and fallback.
  • Reuse current Perplexity integration for translation generation.
  • Keep API and frontend changes minimal and backward-compatible where possible.
  • Persist user language preference client-side so returning users keep their choice.

Goals / Non-Goals

Goals:

  • Generate Tamil and Malayalam translations at article creation time.
  • Persist translation variants linked to the base article.
  • Serve language-specific content in hero/feed API responses.
  • Add a landing-page language selector and persist the preference across sessions.

Non-Goals:

  • Supporting arbitrary language expansion in this phase.
  • Introducing user accounts/server-side profile preferences.
  • Building editorial translation workflows or manual override UI.
  • Replacing Perplexity as translation provider.

Decisions

Decision: Model translations as child records linked to a base article

Decision: Keep one source article and store translation rows keyed by article ID + language code.

Rationale:

  • Avoids duplicating non-language metadata (source URL, image attribution, timestamps).
  • Supports language lookup with deterministic fallback to English.
  • Eases future language additions without schema redesign.

Alternatives considered:

  • Inline columns on article table (headline_ta, headline_ml): rejected as rigid and harder to extend.
  • Fully duplicated article rows per language: rejected due to dedup and feed-order complexity.
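
To make the child-record model concrete, here is a minimal sketch using SQLite from the Python standard library. The table names, column names, and the `localized_article` helper are assumptions chosen for illustration; the actual ClawFort schema and storage layer may differ. The query shows how a single lookup can return the translated text when a row exists and the English source text otherwise.

```python
import sqlite3

# Hypothetical schema: one base article row, plus one child row per language.
SCHEMA = """
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    headline TEXT NOT NULL,      -- English source text
    summary TEXT NOT NULL,
    source_url TEXT NOT NULL     -- non-language metadata is stored only here
);
CREATE TABLE article_translations (
    article_id INTEGER NOT NULL REFERENCES articles(id) ON DELETE CASCADE,
    language TEXT NOT NULL,      -- e.g. 'ta' or 'ml'
    headline TEXT NOT NULL,
    summary TEXT NOT NULL,
    PRIMARY KEY (article_id, language)
);
"""


def localized_article(conn, article_id, language):
    """Return (headline, summary, served_language), falling back to English.

    Returns None when the base article does not exist.
    """
    return conn.execute(
        """
        SELECT COALESCE(t.headline, a.headline),
               COALESCE(t.summary, a.summary),
               CASE WHEN t.article_id IS NULL THEN 'en' ELSE t.language END
        FROM articles AS a
        LEFT JOIN article_translations AS t
               ON t.article_id = a.id AND t.language = ?
        WHERE a.id = ?
        """,
        (language, article_id),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO articles VALUES (1, 'EN headline', 'EN summary', 'https://example.com/a')"
)
# Only a Tamil row exists; Malayalam is deliberately missing.
conn.execute(
    "INSERT INTO article_translations VALUES (1, 'ta', 'TA headline', 'TA summary')"
)
```

In this sketch, requesting `'ta'` for article 1 yields the Tamil row, while requesting `'ml'` yields the English text with `'en'` as the served language. Adding a language means inserting rows with a new code, with no schema change.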

Decision: Translate immediately after article creation in ingestion pipeline

Decision: For each newly accepted article, request Tamil and Malayalam translations and persist them before the ingestion cycle completes.

Rationale:

  • Keeps article and translations synchronized.
  • Avoids delayed jobs and partial language availability in normal flow.
  • Fits existing per-article processing loop.

Alternatives considered:

  • Asynchronous background translation queue: rejected for higher complexity in this phase.
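
The per-article translation step could look roughly like the sketch below. The `translate` callable stands in for the existing Perplexity integration and is hypothetical, as are `TARGET_LANGUAGES`, `translate_article`, and `max_attempts`. The sketch retries each language independently, which is only one of the two options still listed under Open Questions, not a settled choice.

```python
from typing import Callable, Dict

TARGET_LANGUAGES = ("ta", "ml")  # fixed set of targets for this phase


def translate_article(
    text: str,
    translate: Callable[[str, str], str],  # hypothetical provider wrapper
    max_attempts: int = 2,
) -> Dict[str, str]:
    """Return {language: translated_text} for every language that succeeded.

    A language that still fails after max_attempts is omitted, so readers of
    that language get the English source text through the read-side fallback.
    """
    translations: Dict[str, str] = {}
    for language in TARGET_LANGUAGES:
        for _attempt in range(max_attempts):
            try:
                translations[language] = translate(text, language)
                break  # success: stop retrying this language
            except Exception:
                # Broad catch for illustration only; real code would catch
                # the provider's specific error types and log the failure.
                continue
    return translations
```

The caller would persist one translation row per key in the returned mapping. Because failures never raise out of this function, a translation outage cannot block the article itself from being stored.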

Decision: Add optional language input to read APIs with English fallback

Decision: Add a language selection input (query parameter) to the existing read endpoints; if a translation is missing, return the English source text.

Rationale:

  • Preserves endpoint footprint and frontend integration simplicity.
  • Guarantees response completeness even when translation fails.
  • Supports progressive rollout without breaking existing consumers.

Alternatives considered:

  • New language-specific endpoints: rejected as unnecessary API surface growth.
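
As an illustration of the optional input, the sketch below reads a language code from a request URL and defaults to English when the parameter is absent. The parameter name `lang`, the `/api/feed` path used in the usage note, and the helper name are assumptions; this document does not fix them. The sketch also falls back silently for unsupported codes, which is one of the two behaviors still under discussion in Open Questions.

```python
from urllib.parse import parse_qs, urlsplit

SUPPORTED_LANGUAGES = {"en", "ta", "ml"}
DEFAULT_LANGUAGE = "en"


def requested_language(url: str, param: str = "lang") -> str:
    """Extract the requested language from a URL, defaulting to English."""
    values = parse_qs(urlsplit(url).query).get(param)
    if not values:
        # Parameter absent or blank: existing consumers keep English output.
        return DEFAULT_LANGUAGE
    code = values[0].strip().lower()
    return code if code in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE
```

With this helper, a call such as `requested_language("/api/feed?lang=ta")` selects Tamil, while a request without the parameter behaves exactly as it does today.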

Decision: Persist the user's language preference primarily in localStorage, with an optional cookie fallback for constrained browsers.

Rationale:

  • Simple client-only persistence without backend session dependencies.
  • Matches one-page app architecture and current no-auth model.

Alternatives considered:

  • Cookie-only preference: rejected as less ergonomic for JS state hydration.

Risks / Trade-offs

  • [Risk] Translation generation increases API cost/latency per ingestion cycle -> Mitigation: bounded retries, fallback to English when translation unavailable.
  • [Risk] Partial translation failures create mixed-language feed -> Mitigation: deterministic fallback to English for missing translation rows.
  • [Trade-off] Translation-at-ingest adds synchronous processing time -> Mitigation: keep language set fixed to two targets in this phase.
  • [Risk] Language preference desynchronization between tabs/devices -> Mitigation: accept per-browser persistence scope in current architecture.

Migration Plan

  1. Add translation persistence model and migration path.
  2. Extend ingestion pipeline to request/store Tamil and Malayalam translations.
  3. Add language-aware API response behavior with fallback.
  4. Implement frontend language selector + preference persistence.
  5. Validate language switching, fallback, and returning-user preference behavior.

Rollback:

  • Disable language selection in the frontend and return English-only payloads, while leaving stored translation data in place.

Open Questions

  • Should translation failures be retried independently per language within the same cycle, or skipped after one failed language call?
  • Should unsupported language requests return HTTP 400 or silently fall back to English in v1?
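
To compare the two options for unsupported language requests side by side, the following sketch puts both behind a single flag. The function name, the `strict` parameter, and the returned tuple shape are hypothetical and only meant to make the trade-off easier to discuss.

```python
from typing import Tuple

SUPPORTED_LANGUAGES = {"en", "ta", "ml"}


def resolve_language(code: str, strict: bool) -> Tuple[int, str]:
    """Return (http_status, language_to_serve) for a requested language code."""
    normalized = code.strip().lower()
    if normalized in SUPPORTED_LANGUAGES:
        return 200, normalized
    if strict:
        # Strict policy: reject the request; no content language is served.
        return 400, ""
    # Lenient policy: serve the English source text.
    return 200, "en"
```

The strict policy surfaces client mistakes early, whereas the lenient policy matches the English-fallback behavior used elsewhere in this design.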