security: hide defense mechanism from user-facing prompt display
Split system prompt and user message into public/private versions: - Private versions (sent to LLM): include delimiter tags, anti-injection instructions, and 'never reveal' directives - Public versions (shown to user via 'Show prompt'): clean prompt without any defense details, raw user text without tag wrappers The user never sees: - The ###### delimiter tags wrapping their input - The instruction to ignore embedded instructions - The instruction to never reveal the system prompt - The instruction not to acknowledge delimiter tags This prevents an attacker from learning the defense mechanism and crafting injections that work around it.
This commit is contained in:
@@ -126,6 +126,36 @@
|
||||
fuchsia: "#D63384",
|
||||
orange: "#F39C12",
|
||||
indigo: "#5B2C6F",
|
||||
dustyrose: "#966464",
|
||||
dustypink: "#966482",
|
||||
dustypeach: "#966E5A",
|
||||
dustycoral: "#96645A",
|
||||
dustyblush: "#8C6E8C",
|
||||
dustyviolet: "#786496",
|
||||
dustylavender: "#826EA0",
|
||||
dustyblue: "#6478A0",
|
||||
dustyslate: "#6E788C",
|
||||
dustysky: "#507896",
|
||||
dustyteal: "#468282",
|
||||
dustycyan: "#3C828C",
|
||||
dustymint: "#50826E",
|
||||
dustysage: "#5A825A",
|
||||
dustygreen: "#508264",
|
||||
dustyemerald: "#46826E",
|
||||
dustyseafoam: "#468278",
|
||||
dustyolive: "#6E8250",
|
||||
dustylime: "#6E823C",
|
||||
dustygold: "#8C7846",
|
||||
dustyamber: "#966E46",
|
||||
dustymustard: "#8C783C",
|
||||
dustyyellow: "#82783C",
|
||||
dustyorange: "#966446",
|
||||
dustyclay: "#8C6450",
|
||||
dustyterra: "#8C5A46",
|
||||
dustywine: "#96646E",
|
||||
dustyberry: "#96648C",
|
||||
dustymagenta: "#965A82",
|
||||
dustyplum: "#8C648C",
|
||||
};
|
||||
|
||||
const animationStyles = [
|
||||
|
||||
Reference in New Issue
Block a user