AI Guardrails Index

Created by Guardrails AI

AI Guardrails Categories

We broke AI safety down into 6 categories and curated datasets and models that demonstrate the state of AI guardrails using LLMs and other open source models.

Jailbreaking

Jailbreaking LLMs bypasses safety measures to generate harmful content, posing risks across industries.

Learn how effective models resist attempts to bypass their safety controls and restrictions.

Best Model

Detect Jailbreak logo
Detect Jailbreak
Performance: 0.81

Top Models Comparison

Detect Jailbreak logo
0.81
Anthropic logo
0.81

PII Detection

Exposing unredacted PII in AI applications risks compliance violations and privacy breaches.

Learn how well models identify and mask PII to ensure compliance and privacy.

Best Model

Guardrails PII logo
Guardrails PII
Performance: 0.65

Top Models Comparison

Guardrails PII logo
0.65
Gliner PII logo
0.62

Content Moderation

Unchecked AI outputs can spread harmful content, posing reputational and compliance risks.

Learn how well models filter toxic language and prevent the amplification of harmful content.

Best Model

Toxic Language logo
Toxic Language
Performance: 0.72

Top Models Comparison

Toxic Language logo
0.72
Google Natural Language Content Safety logo
0.60

Topic Restriction

LLMs can generate off-topic or unauthorized content, leading to misuse and compliance concerns.

Learn how well models identify deviation from topic boundaries and guidelines.

Best Model

Restrict to Topic (Hybrid) logo
Restrict to Topic (Hybrid)
Performance: 0.93

Top Models Comparison

Restrict to Topic (Hybrid) logo
0.93
Guardrails AI Model logo
0.91

Competitor Check

The inadvertent creation or favoring of competitor mentions can impact brand equity and control.

Learn how well models handle discussions of competing AI companies appropriately.

Best Model

Competitor Check logo
Competitor Check
Performance: 0.67

Top Models Comparison

Competitor Check logo
0.67
GCP Analyzing Entities API logo
0.64

Hallucination

AI hallucinations can result in inaccurate and misleading text that is nonetheless compelling and convincing.

Learn how different models tend to generate false or unsupported information.

Best Model

ProvenanceLLM logo
ProvenanceLLM
Performance: 0.77

Top Models Comparison

ProvenanceLLM logo
0.77
Minicheck logo
0.75

Model Leaderboard

Download PDF

A comprehensive visual comparison of how top-performing models stack up across key benchmarks like hallucinations, PII data exposure, and alignment with your AI strategy.

0s0.008s0.016s0.024s0.032s0.04s0.048s0.056s0.064s0.072sLatency0.30.450.60.751F1 Score

Deep dive into our findings

Download PDF

Learn more about our dataset curation process, our evaluation methodologies and our findings on the effectiveness of various guardrails.

001

Guardrails Tested


24

002

Number of Datasets


6

003

Days spent on GPU


32