Guardrails Tested: 24
We broke AI safety down into 6 categories and curated datasets and models that demonstrate the state of AI guardrails, using LLMs and other open-source models.
Jailbreaking LLMs bypasses safety measures to generate harmful content, posing risks across industries.
Learn how effectively models resist attempts to bypass their safety controls and restrictions.
Exposing unredacted PII in AI applications risks compliance violations and privacy breaches.
Learn how well models identify and mask PII to ensure compliance and privacy.
Unchecked AI outputs can spread harmful content, posing reputational and compliance risks.
Learn how well models filter toxic language and prevent the amplification of harmful content.
LLMs can generate off-topic or unauthorized content, leading to misuse and compliance concerns.
Learn how well models detect deviations from topic boundaries and guidelines.
Inadvertently generating or favoring mentions of competitors can undermine brand equity and control.
Learn how well models handle discussions of competing AI companies.
AI hallucinations can result in inaccurate and misleading text that is nonetheless compelling and convincing.
Learn how prone different models are to generating false or unsupported information.
A comprehensive visual comparison of how top-performing models stack up across key benchmarks like hallucinations, PII data exposure, and alignment with your AI strategy.
Learn more about our dataset curation process, our evaluation methodologies and our findings on the effectiveness of various guardrails.