Files
style/src/lib/components
Santhosh Janardhanan 85dec4908f security: hide defense mechanism from user-facing prompt display
Split system prompt and user message into public/private versions:
- Private versions (sent to LLM): include delimiter tags, anti-injection
  instructions, and 'never reveal' directives
- Public versions (shown to user via 'Show prompt'): clean prompt
  without any defense details, raw user text without tag wrappers

The user never sees:
- The ###### delimiter tags wrapping their input
- The instruction to ignore embedded instructions
- The instruction to never reveal the system prompt
- The instruction not to acknowledge delimiter tags

This prevents an attacker from learning the defense mechanism
and crafting injections that work around it.
2026-04-12 23:42:31 -04:00
..