security: hide defense mechanism from user-facing prompt display

Split system prompt and user message into public/private versions: - Private versions (sent to LLM): include delimiter tags, anti-injection instructions, and 'never reveal' directives - Public versions (shown to user via 'Show prompt'): clean prompt without any defense details, raw user text without tag wrappers The user never sees: - The ###### delimiter tags wrapping their input - The instruction to ignore embedded instructions - The instruction to never reveal the system prompt - The instruction not to acknowledge delimiter tags This prevents an attacker from learning the defense mechanism and crafting injections that work around it.
2026-04-12 23:42:31 -04:00
parent 96155fda36
commit 85dec4908f
4 changed files with 88 additions and 56 deletions
--- a/src/lib/components/LoadingModal.svelte
+++ b/src/lib/components/LoadingModal.svelte
@@ -126,6 +126,36 @@
        fuchsia: "#D63384",
        orange: "#F39C12",
        indigo: "#5B2C6F",
+        dustyrose: "#966464",
+        dustypink: "#966482",
+        dustypeach: "#966E5A",
+        dustycoral: "#96645A",
+        dustyblush: "#8C6E8C",
+        dustyviolet: "#786496",
+        dustylavender: "#826EA0",
+        dustyblue: "#6478A0",
+        dustyslate: "#6E788C",
+        dustysky: "#507896",
+        dustyteal: "#468282",
+        dustycyan: "#3C828C",
+        dustymint: "#50826E",
+        dustysage: "#5A825A",
+        dustygreen: "#508264",
+        dustyemerald: "#46826E",
+        dustyseafoam: "#468278",
+        dustyolive: "#6E8250",
+        dustylime: "#6E823C",
+        dustygold: "#8C7846",
+        dustyamber: "#966E46",
+        dustymustard: "#8C783C",
+        dustyyellow: "#82783C",
+        dustyorange: "#966446",
+        dustyclay: "#8C6450",
+        dustyterra: "#8C5A46",
+        dustywine: "#96646E",
+        dustyberry: "#96648C",
+        dustymagenta: "#965A82",
+        dustyplum: "#8C648C",
    };

    const animationStyles = [