We broke AI guardrails down to six categories.
We curated datasets and models that demonstrate the state of AI safety using LLMs and other open source models.
| Developer | Model | Latency | Metric |
|---|---|---|---|
| Anthropic | Claude 3 Haiku | 1.8267 ms | 0.8101 |
| Guardrails AI | Detect Jailbreak | 0.0527 ms | 0.8118 |
| jackhhao | llm_warden | 0.0119 ms | 0.7070 |
| Meta | Llama Prompt Guard 86M | 0.0515 ms | 0.6663 |
| Microsoft | AI Content Safety Prompt Shields | 0.0971 ms | 0.7331 |
| zhx123 | ftrobertallm | 0.0267 ms | 0.7398 |
| Developer | Samples |
|---|---|
| Roleplay/Pretend/Hypothetical | 1124 |
| Resistance Suppression | 792 |
| Permission-Granting | 447 |
| Roleplay/Pretend/Hypothetical (DAN attack) | 242 |
| Prompt Continuation or Saturation | 72 |
| Character-Gradient Attack | 60 |
| Prompt Saturation | 59 |
| Character-Gradient Attack (Adversarial Noise) | 45 |
| Prompt Obfuscation | 45 |
| Prompt Obfuscation (Program Execution) | 22 |
| Character-Gradient Attack (Special System Tokens) | 20 |
| Prompt Continuation | 20 |
| Character-Gradient Attack (Glitch Tokens) | 2 |