We broke AI guardrails down to six categories.
We curated datasets and models that demonstrate the state of AI safety using LLMs and other open source models.
Developer | Model | Latency | Metric |
---|---|---|---|
Anthropic | Claude 3 Haiku | 1.8267 ms | 0.8101 |
Guardrails AI | Detect Jailbreak | 0.0527 ms | 0.8118 |
jackhhao | llm_warden | 0.0119 ms | 0.7070 |
Meta | Llama Prompt Guard 86M | 0.0515 ms | 0.6663 |
Microsoft | AI Content Safety Prompt Shields | 0.0971 ms | 0.7331 |
zhx123 | ftrobertallm | 0.0267 ms | 0.7398 |
Developer | Samples |
---|---|
Roleplay/Pretend/Hypothetical | 1124 |
Resistance Suppression | 792 |
Permission-Granting | 447 |
Roleplay/Pretend/Hypothetical (DAN attack) | 242 |
Prompt Continuation or Saturation | 72 |
Character-Gradient Attack | 60 |
Prompt Saturation | 59 |
Character-Gradient Attack (Adversarial Noise) | 45 |
Prompt Obfuscation | 45 |
Prompt Obfuscation (Program Execution) | 22 |
Character-Gradient Attack (Special System Tokens) | 20 |
Prompt Continuation | 20 |
Character-Gradient Attack (Glitch Tokens) | 2 |