p08-seo-tweaks

2026-02-13 00:49:22 -05:00
parent a1da041f14
commit 88a5540b7d
63 changed files with 2228 additions and 37 deletions


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-13


@@ -0,0 +1,91 @@
## Context
ClawFort serves a static, client-rendered news experience backed by FastAPI endpoints and a scheduled content refresh. This change introduces explicit technical requirements for crawlability, structured-data quality, and delivery speed, so that SEO behavior is reliable across the homepage, article cards, and static policy pages.
The current implementation already includes baseline metadata and some performance measures, but these requirements are not yet codified in change specs. This design keeps the existing architecture (FastAPI + static frontend) while formalizing the output guarantees that search engines and validators rely on.
## Goals / Non-Goals
**Goals:**
- Define a deterministic metadata contract for the homepage and static policy pages (description, canonical, robots, Open Graph, and Twitter card fields).
- Define structured-data output for the homepage (`Newspaper`) and every rendered news item (`NewsArticle`) with stable required properties.
- Define response-delivery expectations for compression and cache policy, plus front-end media-loading behavior.
- Keep requirements implementable in the current stack without introducing heavyweight infrastructure.
**Non-Goals:**
- Full SSR migration or framework replacement.
- Introduction of external CDN, edge workers, or managed caching tiers.
- Reworking editorial/news-fetch business logic.
- Rich-result optimization for types outside this scope (e.g., FAQ, VideoObject, LiveBlogPosting).
## Decisions
### Decision: Keep JSON-LD generation in the existing page runtime contract
**Decision:** Define structured data as JSON-LD embedded in `index.html`, populated from the same article data model used by hero/feed rendering.
**Rationale:**
- Avoids duplication between UI content and structured data.
- Preserves current architecture and deployment flow.
- Supports homepage-wide `@graph` output containing `Newspaper` and multiple `NewsArticle` nodes.
**Alternatives considered:**
- Server-side JSON-LD rendering via a template engine: rejected due to architectural drift and migration overhead.
- Microdata-only tagging: rejected because JSON-LD is simpler to maintain and validate for this use case.
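The following is a minimal sketch of a homepage-wide `@graph` built from one list of article records. It is written in Python for brevity; in the shipped page, equivalent logic would run in the frontend script that already renders hero and feed items. The record keys (`title`, `summary`, `image_url`, `published_at`, `updated_at`, `source`, `url`) are hypothetical stand-ins for the real article model.

```python
import json
from typing import Any


def to_news_article(article: dict[str, Any], publisher: dict[str, Any],
                    newspaper_id: str, lang: str) -> dict[str, Any]:
    """Map one (hypothetical) article record to a schema.org NewsArticle node."""
    published = article["published_at"]
    return {
        "@type": "NewsArticle",
        "headline": article["title"],
        "description": article["summary"],
        "image": [article["image_url"]],
        "datePublished": published,
        # Fall back to the publish date when no update timestamp exists.
        "dateModified": article.get("updated_at") or published,
        "url": article["url"],
        "mainEntityOfPage": article["url"],
        "inLanguage": lang,
        "isPartOf": {"@id": newspaper_id},
        "publisher": publisher,
        "author": {"@type": "Organization",
                   "name": article.get("source") or publisher["name"]},
    }


def build_homepage_graph(site_url: str, site_name: str, lang: str,
                         articles: list[dict[str, Any]]) -> dict[str, Any]:
    """Return one JSON-LD document: a Newspaper node plus one NewsArticle per item."""
    newspaper_id = site_url.rstrip("/") + "/#newspaper"
    publisher = {"@type": "Organization", "name": site_name, "url": site_url}
    newspaper = {
        "@type": "Newspaper",
        "@id": newspaper_id,
        "name": site_name,
        "url": site_url,
        "inLanguage": lang,
        "publisher": publisher,
    }
    nodes = [newspaper] + [to_news_article(a, publisher, newspaper_id, lang)
                           for a in articles]
    return {"@context": "https://schema.org", "@graph": nodes}


def to_script_tag(graph: dict[str, Any]) -> str:
    """Serialize the graph into a JSON-LD <script> element."""
    # JSON only allows "<" inside strings, so escaping it as \u003c keeps the
    # payload valid while preventing article text from closing the tag early.
    payload = json.dumps(graph, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'
```

Because the same mapping helper feeds every node, the UI and the structured data share one source of truth, which is the parity the Risks section below asks for.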
### Decision: Use standards-aligned required field baseline for `NewsArticle`
**Decision:** Require each `NewsArticle` node to include stable core fields: headline, description, image, datePublished, dateModified, url/mainEntityOfPage, inLanguage, publisher, and author.
**Rationale:**
- Produces predictable, testable output.
- Reduces schema validation regressions from partial payloads.
- Aligns with common crawler expectations for article entities.
**Alternatives considered:**
- Minimal schema with only headline/url: rejected due to weak semantic value and poorer validation confidence.
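The required-field baseline lends itself to a simple, testable check. The sketch below uses the field list from this decision and treats either `url` or `mainEntityOfPage` as satisfying the URL requirement. The function name is hypothetical.

```python
from typing import Any

# Core fields named in the decision above; the URL requirement is checked
# separately because either url or mainEntityOfPage satisfies it.
REQUIRED_FIELDS = (
    "headline", "description", "image", "datePublished",
    "dateModified", "inLanguage", "publisher", "author",
)


def missing_news_article_fields(node: dict[str, Any]) -> list[str]:
    """Return the required NewsArticle fields that are absent or empty."""
    missing = [field for field in REQUIRED_FIELDS if not node.get(field)]
    if not (node.get("url") or node.get("mainEntityOfPage")):
        missing.append("url/mainEntityOfPage")
    return missing
```

An empty result means the node meets the baseline. Any non-empty result names exactly which fields a partial payload dropped.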
### Decision: Enforce lightweight HTTP performance controls in-app
**Decision:** Make transport optimization an explicit requirement, implemented with in-app compression middleware and cache headers set per route class (static assets, APIs, HTML pages).
**Rationale:**
- High impact with minimal infrastructure changes.
- Testable directly in integration checks.
- Works in current deployment topology.
**Alternatives considered:**
- Delegate entirely to reverse proxy/CDN: rejected because this repository currently controls delivery behavior directly.
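Per-route-class cache policy can be reduced to a lookup keyed on the request path. The sketch below assumes static assets live under `/static/` or `/assets/` and APIs under `/api/`. Both prefixes and all max-age values are illustrative assumptions, not the project's configured numbers. Only the three route classes come from this design.

```python
# Hypothetical prefixes and lifetimes; tune to the real deployment.
STATIC_PREFIXES = ("/static/", "/assets/")
API_PREFIX = "/api/"

CACHE_POLICIES = {
    # Long-lived static assets.
    "static": "public, max-age=604800",
    # Short TTL so fresh news appears quickly; a stale copy may be served
    # while the cache revalidates in the background.
    "api": "public, max-age=60, stale-while-revalidate=300",
    # HTML shells are stored but must be revalidated before each reuse.
    "html": "no-cache",
}


def route_class(path: str) -> str:
    """Classify a request path as static, api, or html."""
    if path.startswith(STATIC_PREFIXES):
        return "static"
    if path.startswith(API_PREFIX):
        return "api"
    return "html"


def cache_control_for(path: str) -> str:
    """Return the Cache-Control value for the route class of this path."""
    return CACHE_POLICIES[route_class(path)]
```

In FastAPI, one plausible way to apply this is an `@app.middleware("http")` hook that sets `response.headers["Cache-Control"]` from `cache_control_for(request.url.path)`. Keeping the table in one place makes the expected header values easy to document and assert.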
### Decision: Standardize lazy media loading behavior with shimmer placeholders
**Decision:** Define lazy-loading requirements for non-critical images, and require a shimmer placeholder for each image until it either loads or resolves to its error fallback.
**Rationale:**
- Improves perceived performance and consistency.
- Helps reduce layout instability when paired with explicit image dimensions.
- Fits existing UI loading pattern.
**Alternatives considered:**
- Skeleton-only page-level placeholders: rejected because item-level shimmer provides better visual continuity.
## Risks / Trade-offs
- **[Risk] Dynamic metadata timing for client-rendered content** -> Mitigation: require baseline static metadata defaults, replaced deterministically at runtime once the hero/article payload is available.
- **[Risk] Overly aggressive caching could serve stale news** -> Mitigation: use a short API max-age with stale-while-revalidate, and a separate, longer policy for static assets.
- **[Trade-off] Strict validation vs. framework directives in markup** -> Mitigation: define standards-compatible output goals and track exceptions where framework attributes are unavoidable.
- **[Trade-off] More metadata fields increase maintenance** -> Mitigation: centralize field mapping helpers and require parity with article model fields.
## Migration Plan
1. Implement and verify metadata/structured-data contracts on homepage and news-card rendering paths.
2. Add or verify response compression and route-level cache directives in the backend delivery layer.
3. Align image loading UX requirements (lazy + shimmer + explicit dimensions) across hero/feed/modal contexts.
4. Validate output with schema and HTML validation tooling, then fix conformance gaps.
5. Document acceptance checks and rollback approach.
Rollback:
- Revert the SEO/performance-specific frontend and backend changes to the prior baseline, leaving unaffected features in place.
- Remove schema additions and route cache directives if they introduce regressions.
## Open Questions
- Should policy pages (`/terms`, `/attribution`) adopt a stricter noindex strategy, or remain indexable by default?
- Should canonical URLs include hash anchors for in-page article cards or stay route-level canonical only?
- Do we require locale-specific `og:locale`/alternate tags in this phase or defer to a follow-up i18n SEO change?


@@ -0,0 +1,28 @@
## Why
ClawFort currently lacks a formal SEO and structured-data specification, which limits discoverability and consistency for search crawlers. Defining this now ensures the news experience is indexable, standards-oriented, and performance-focused as the content footprint grows.
## What Changes
- Add search-focused metadata requirements for the main page and policy pages (description, canonical, robots, social preview tags).
- Define structured data requirements so the home page is represented as `Newspaper` and each news item is represented as `NewsArticle`.
- Establish performance requirements for transport and caching behavior (HTTP compression and cache directives) plus front-end loading behavior.
- Define UX and rendering requirements for image lazy loading with shimmer placeholders and smooth scrolling.
- Require markup and interaction patterns that are compatible with strict standards validation goals.
## Capabilities
### New Capabilities
- `seo-meta-and-social-tags`: Standardize meta, canonical, robots, and social preview tags for key public pages.
- `news-structured-data`: Provide machine-readable `Newspaper` and `NewsArticle` structured data for homepage and article entries.
- `delivery-and-rendering-performance`: Define response compression/caching and client-side loading behavior for faster page delivery.
### Modified Capabilities
- None.
## Impact
- **Frontend/UI:** `frontend/index.html` and static policy pages gain SEO metadata, structured-data hooks, and loading-state behavior requirements.
- **Backend/API Delivery:** `backend/main.py` response middleware/headers are affected by compression and cache policy expectations.
- **Quality/Validation:** Standards conformance and SEO validation become explicit acceptance criteria for this change.
- **Operations:** Performance depends on HTTP delivery behavior and on deploy/runtime configuration staying aligned with it.


@@ -0,0 +1,35 @@
## ADDED Requirements
### Requirement: HTTP delivery applies compression and cache policy
The system SHALL apply transport-level compression and explicit cache directives for static assets, API responses, and public HTML routes.
#### Scenario: Compressed responses are available for eligible payloads
- **WHEN** a client requests compressible content that exceeds the compression threshold
- **THEN** the response is served with gzip compression
- **AND** response headers advertise the selected content encoding
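Starlette, and therefore FastAPI, ships a `GZipMiddleware` with a `minimum_size` parameter (default 500 bytes). That is likely the simplest in-app option. The sketch below only spells out the eligibility rule this scenario describes, so a test can reason about it. The compressible content-type list is an assumption.

```python
import gzip

# Assumed set of compressible media types (prefix match).
COMPRESSIBLE_TYPES = ("text/", "application/json", "application/javascript",
                      "image/svg+xml")


def accepts_gzip(accept_encoding: str) -> bool:
    """True if Accept-Encoding lists gzip without an explicit q=0."""
    for part in accept_encoding.split(","):
        name, _, params = part.strip().partition(";")
        if name.strip().lower() != "gzip":
            continue
        param = params.strip()
        if param.startswith("q="):
            try:
                if float(param[2:]) == 0:
                    return False
            except ValueError:
                return False
        return True
    return False


def maybe_gzip(body: bytes, content_type: str, accept_encoding: str,
               minimum_size: int = 500) -> tuple[bytes, dict[str, str]]:
    """Gzip eligible payloads at or above minimum_size; return (body, headers)."""
    # Caches must key on Accept-Encoding because the body depends on it.
    headers = {"Vary": "Accept-Encoding"}
    eligible = (
        len(body) >= minimum_size
        and content_type.startswith(COMPRESSIBLE_TYPES)
        and accepts_gzip(accept_encoding)
    )
    if not eligible:
        return body, headers
    compressed = gzip.compress(body)
    headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(compressed))
    return compressed, headers
```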
#### Scenario: Route classes receive deterministic cache-control directives
- **WHEN** clients request static assets, API responses, or HTML page routes
- **THEN** each route class returns a cache policy aligned to its freshness requirements
- **AND** cache directives are explicit and testable from response headers
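"Testable from response headers" can be made concrete with a small parser and matcher for `Cache-Control`. This is a sketch; the helper names are hypothetical. The header value could come from, for example, FastAPI's `TestClient` (`client.get(path).headers`) or `urllib.request.urlopen(url).headers`.

```python
from typing import Union

Directives = dict[str, Union[str, bool]]


def parse_cache_control(header: str) -> Directives:
    """Split a Cache-Control header into {directive: value, or True if valueless}."""
    directives: Directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else True
    return directives


def policy_matches(header: str, expected: Directives) -> bool:
    """True if every expected directive is present with the expected value."""
    actual = parse_cache_control(header)
    return all(actual.get(name) == value for name, value in expected.items())
```

Matching only the expected directives, rather than the full string, keeps checks stable if directive order or unrelated directives change.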
### Requirement: Media rendering optimizes perceived loading performance
The system SHALL lazy-load non-critical images and render shimmer placeholders until image load completion or fallback resolution.
#### Scenario: Feed and modal images lazy-load with placeholders
- **WHEN** feed or modal images have not completed loading
- **THEN** a shimmer placeholder is visible for the pending image region
- **AND** the placeholder is removed after load or fallback error handling completes
#### Scenario: Image rendering reduces layout shift risk
- **WHEN** article images are rendered in hero, feed, or modal contexts
- **THEN** image elements include explicit dimensions and async decoding hints
- **AND** layout remains stable while content loads
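A markup audit can verify the dimension and decoding hints on rendered output. For a client-rendered page, this assumes the HTML was first serialized from a browser. The sketch below also assumes the above-the-fold hero image is marked `fetchpriority="high"` and is therefore exempt from `loading="lazy"`. That marker is an assumption about how "critical" is identified.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Collect <img> tags that break the dimension/decoding/lazy-loading rules."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src") or "<no src>"
        # Explicit dimensions let the browser reserve space and avoid layout shift.
        for attr in ("width", "height"):
            if not a.get(attr):
                self.problems.append(f"{src}: missing {attr}")
        if a.get("decoding") != "async":
            self.problems.append(f"{src}: decoding is not async")
        # Assumed convention: only fetchpriority="high" images may load eagerly.
        if a.get("fetchpriority") != "high" and a.get("loading") != "lazy":
            self.problems.append(f"{src}: not lazy-loaded")


def audit_images(html: str) -> list[str]:
    parser = ImageAudit()
    parser.feed(html)
    parser.close()
    return parser.problems
```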
### Requirement: Smooth scrolling behavior is consistently enabled
The system SHALL provide smooth scrolling behavior for in-page navigation and user-initiated scroll interactions.
#### Scenario: In-page navigation uses smooth scrolling
- **WHEN** users navigate to in-page anchors or trigger equivalent interactions
- **THEN** scrolling transitions occur smoothly rather than jumping abruptly
- **AND** behavior is consistent across supported breakpoints


@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Homepage publishes Newspaper structured data
The system SHALL expose a valid JSON-LD entity of type `Newspaper` on the homepage.
#### Scenario: Newspaper entity is emitted on homepage
- **WHEN** the homepage HTML is rendered
- **THEN** a JSON-LD script block includes an entity with `@type` set to `Newspaper`
- **AND** the entity includes stable publisher and site identity fields
#### Scenario: Newspaper entity remains language-aware
- **WHEN** homepage content is rendered in a selected language
- **THEN** the structured data includes language context for the active locale
- **AND** language output stays consistent with visible content language
### Requirement: Each rendered news item publishes NewsArticle structured data
The system SHALL expose a valid JSON-LD entity of type `NewsArticle` for each rendered news item in hero and feed contexts.
#### Scenario: NewsArticle entities include required semantic fields
- **WHEN** news items are present on the homepage
- **THEN** each `NewsArticle` entity includes headline, description, image, publication dates, and URL fields
- **AND** publisher and author context are present for each item
#### Scenario: Structured data avoids duplicate article entities
- **WHEN** article data appears across hero and feed sections
- **THEN** structured-data output deduplicates entities for the same article URL
- **AND** only one canonical semantic entry remains for each unique article
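Deduplication by article URL is an order-preserving "first occurrence wins" pass. The sketch below passes hero items before feed items, so the hero copy is the one kept. URLs are compared exactly. Any normalization, such as for trailing slashes or the hash-anchor question raised in the design, would be a separate decision.

```python
from typing import Any


def dedupe_by_url(articles: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first entry for each article URL, preserving input order."""
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for article in articles:
        url = article["url"]
        if url in seen:
            continue  # a later hero/feed duplicate of an article already emitted
        seen.add(url)
        unique.append(article)
    return unique
```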


@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Core SEO metadata is present on public pages
The system SHALL expose standards-compliant SEO metadata on the homepage and policy pages, including description, robots, canonical URL, and social preview metadata.
#### Scenario: Homepage metadata baseline exists
- **WHEN** a crawler or browser loads the homepage
- **THEN** the document includes `description`, `robots`, and canonical metadata
- **AND** Open Graph and Twitter card metadata fields are present with non-empty values
#### Scenario: Policy pages include indexable metadata
- **WHEN** a crawler loads `/terms` or `/attribution`
- **THEN** the page includes page-specific `title` and `description` metadata
- **AND** Open Graph and Twitter card metadata are present for link previews
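Metadata presence can be checked by parsing the document head. In the sketch below, the `HOMEPAGE_REQUIRED` and `POLICY_REQUIRED` key lists are assumptions derived from these scenarios and are not an exhaustive SEO checklist. Twitter fields are read from `name=` and Open Graph fields from `property=`, following common convention.

```python
from html.parser import HTMLParser

# Assumed required keys per page type.
HOMEPAGE_REQUIRED = ("description", "robots", "og:title", "og:description",
                     "og:image", "twitter:card", "twitter:title",
                     "twitter:description")
POLICY_REQUIRED = ("description", "og:title", "og:description", "twitter:card")


class HeadMeta(HTMLParser):
    """Collect <title>, <meta name/property> values, and canonical links."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}
        self.canonicals: list[str] = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = a.get("name") or a.get("property")
            if key:
                self.meta[key.lower()] = (a.get("content") or "").strip()
        elif tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.canonicals.append(a.get("href") or "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def missing_metadata(html: str, required: tuple[str, ...]) -> list[str]:
    """List missing or empty metadata keys, plus title and canonical."""
    parser = HeadMeta()
    parser.feed(html)
    parser.close()
    missing = [key for key in required if not parser.meta.get(key)]
    if not parser.title.strip():
        missing.insert(0, "title")
    if not parser.canonicals:
        missing.append("canonical")
    return missing
```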
### Requirement: Canonical and preview metadata remain deterministic
The system SHALL keep canonical and preview metadata deterministic for each route to avoid conflicting crawler signals.
#### Scenario: Canonical URL reflects active route
- **WHEN** metadata is rendered for a public route
- **THEN** exactly one canonical link is emitted for that route
- **AND** canonical metadata does not point to unrelated routes
#### Scenario: Social preview tags map to current page context
- **WHEN** the page metadata is generated or updated
- **THEN** `og:title`, `og:description`, and corresponding Twitter fields reflect the current page context
- **AND** preview image fields resolve to a valid absolute URL
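The two determinism scenarios can be checked from already-extracted values: the list of canonical `href`s found on the page and the `og:image` content. This is a sketch with hypothetical names. Path comparison is exact, so a trailing-slash policy would need to be decided separately.

```python
from urllib.parse import urlsplit


def is_absolute_http_url(url: str) -> bool:
    """True for URLs with an http(s) scheme and a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def canonical_and_image_problems(canonicals: list[str], og_image: str,
                                 route_path: str) -> list[str]:
    """Report canonical-count, canonical-route, and og:image URL problems."""
    problems: list[str] = []
    if len(canonicals) != 1:
        problems.append(f"expected exactly one canonical link, found {len(canonicals)}")
    else:
        # An empty path (e.g. "https://example.com") is treated as "/".
        canonical_path = urlsplit(canonicals[0]).path or "/"
        if canonical_path != route_path:
            problems.append(f"canonical points to {canonical_path}, not {route_path}")
    if not is_absolute_http_url(og_image):
        problems.append("og:image is not an absolute http(s) URL")
    return problems
```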


@@ -0,0 +1,34 @@
## 1. SEO Metadata and Social Tags
- [x] 1.1 Ensure homepage and policy pages expose required `title`, `description`, `robots`, and canonical metadata.
- [x] 1.2 Ensure Open Graph and Twitter metadata fields are present and mapped to current page context.
- [x] 1.3 Add verification checks for deterministic canonical URLs and valid absolute social image URLs.
## 2. Structured Data (Newspaper and NewsArticle)
- [x] 2.1 Implement and verify homepage `Newspaper` JSON-LD output with publisher/site identity fields.
- [x] 2.2 Implement and verify `NewsArticle` JSON-LD output for hero and feed items using required semantic fields.
- [x] 2.3 Add deduplication logic so repeated hero/feed references emit one semantic entity per article URL.
## 3. Delivery and Caching Performance
- [x] 3.1 Apply and validate gzip compression for eligible responses.
- [x] 3.2 Apply and validate explicit cache-control policies for static assets, APIs, and HTML routes.
- [x] 3.3 Verify route-level header behavior with repeatable checks and document expected header values.
## 4. Rendering Performance and UX
- [x] 4.1 Ensure non-critical images use lazy loading with explicit dimensions and async decoding hints.
- [x] 4.2 Ensure shimmer placeholders are visible until image load or fallback completion in feed and modal contexts.
- [x] 4.3 Ensure smooth scrolling behavior remains consistent for in-page navigation interactions.
## 5. Validation and Acceptance
- [x] 5.1 Validate structured data output for `Newspaper` and `NewsArticle` entities against schema expectations.
- [x] 5.2 Validate HTML/metadata output against project validation goals and resolve conformance gaps.
- [x] 5.3 Execute regression checks for homepage rendering, article card behavior, and policy page metadata.
## 6. Documentation
- [x] 6.1 Document SEO/structured-data contracts and performance header expectations in project docs.
- [x] 6.2 Document verification steps so future changes can re-run SEO and performance acceptance checks.