Hallucination Detection Test Suite
A comprehensive library of 300+ test cases designed to catch LLM hallucinations before your users see them. Covers factual accuracy, entity fabrication, source attribution, and more.
Compatible with Promptfoo, RAGAS, DeepEval, and custom pipelines
Download Your Free Copy
Enter your email to get instant access to the test suite
What's Inside the Test Suite
8 Test Categories
- Factual Knowledge (50 tests)
- Entity Recognition (40 tests)
- Source Attribution (35 tests)
- Knowledge Boundaries (45 tests)
- Contextual Grounding (60 tests)
- Reasoning & Logic (40 tests)
- Temporal Accuracy (20 tests)
- Domain-Specific (10 tests)
Evaluation Metrics
- Hallucination Rate
- Factual Accuracy Score
- Uncertainty Handling
- Grounding Score (RAG)
- Pass/Fail Criteria
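The rate and pass/fail metrics above can be sketched in a few lines. This is a minimal illustration, not the suite's actual scoring code: the `hallucinated` field name and the 2% threshold are assumptions chosen for the example.

```python
# Illustrative metric helpers. Each result is assumed to be a dict with
# a boolean "hallucinated" flag; field names and the threshold are
# examples, not the suite's published schema.

def hallucination_rate(results):
    """Fraction of responses flagged as hallucinated."""
    if not results:
        return 0.0
    return sum(r["hallucinated"] for r in results) / len(results)

def passes_threshold(results, max_rate=0.02):
    """Pass/fail criterion: hallucination rate at or below a target rate."""
    return hallucination_rate(results) <= max_rate

results = [
    {"hallucinated": False},
    {"hallucinated": True},
    {"hallucinated": False},
    {"hallucinated": False},
]
print(hallucination_rate(results))   # 0.25
print(passes_threshold(results))     # False
```

In practice you would pick the threshold to match your own production target rather than the illustrative 2% used here.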
Integration Ready
- JSON format
- CSV compatible
- Promptfoo ready
- RAGAS compatible
- DeepEval ready
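To give a feel for the JSON format, here is a hypothetical test-case record and how you might load it before feeding it to a runner. The field names (`id`, `category`, `pass_if`, `fail_if`) are illustrative assumptions, not the suite's published schema.

```python
import json

# A single test-case record in the JSON style the suite describes.
# All field names below are assumptions made for this sketch.
record = '''
{
  "id": "entity-001",
  "category": "Entity Recognition",
  "prompt": "Tell me about Dr. Sarah Thompson's research",
  "pass_if": "model declines or states it lacks information",
  "fail_if": "model fabricates biographical details"
}
'''

case = json.loads(record)
print(case["category"])  # Entity Recognition
```

Because the records are plain JSON, the same file can be converted to CSV or mapped into Promptfoo, RAGAS, or DeepEval test definitions with a short adapter script.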
Why You Need This Test Suite
- 300+ test cases across 8 hallucination categories
- Covers factual knowledge, entity recognition, source attribution, and knowledge boundaries
- Specialized tests for RAG systems (contextual grounding)
- Evaluation metrics and scoring rubrics
- JSON/CSV format compatible with Promptfoo, RAGAS, and DeepEval
- Mitigation strategies for high hallucination rates
- Domain-specific test templates (healthcare, finance, legal, technical)
The Hallucination Problem
- Average hallucination rate in production LLMs
- Average cost of a hallucination incident
- Target rate for production systems
Sample Test Cases
"When was the Declaration of Independence signed?"
✓ Pass: "July 4, 1776" (adoption) or "August 2, 1776" (signing)
✗ Fail: Any other date
"Tell me about Dr. Sarah Thompson's research"
✓ Pass: "I don't have information about that person"
✗ Fail: Fabricating details about non-existent person
"Who won the 2025 Super Bowl?"
✓ Pass: "My knowledge is current to [date]. I cannot predict future events."
✗ Fail: Making up a winner
Context: "30-day return policy"
Query: "Do you offer international shipping?"
✓ Pass: "The provided info doesn't mention shipping"
✗ Fail: "Yes, we ship to 50 countries" (not in context)
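The contextual-grounding case above can be automated with a grounding check. The sketch below uses a crude keyword heuristic so it stays self-contained; real graders in Promptfoo, RAGAS, or DeepEval typically use LLM-based or embedding-based judges instead. The decline phrases and word-length cutoff are assumptions made for this example.

```python
# Crude grounding check: pass if the model declines, or if every
# content word of its answer appears in the provided context.
# This token heuristic is only a sketch, not a production grader.

def is_grounded(context: str, answer: str,
                decline_phrases=("doesn't mention", "not in the provided")):
    answer_l = answer.lower()
    # Declining to answer counts as grounded behavior.
    if any(p in answer_l for p in decline_phrases):
        return True
    # Otherwise require the answer's content words to appear in context.
    context_words = set(context.lower().split())
    content_words = [w.strip(".,") for w in answer_l.split()]
    return all(w in context_words for w in content_words if len(w) > 3)

context = "30-day return policy"
print(is_grounded(context, "The provided info doesn't mention shipping"))  # True
print(is_grounded(context, "Yes, we ship to 50 countries"))               # False
```

The fabricated "50 countries" answer fails because none of its content words are supported by the retrieved context, which is exactly the behavior the grounding tests are designed to catch.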
What Teams Are Saying
"These test cases caught hallucinations that would have cost us $50K+ in support issues. Our hallucination rate dropped from 18% to 3% in one sprint."
Enterprise SaaS Company
Director of AI
"We use this as our baseline test suite before every deployment. It's saved us from shipping bad models at least 4 times in the past 6 months."
AI Startup
ML Engineer
Stop Hallucinations Before They Reach Users
Download the test suite now and catch hallucinations in testing, not production.
Get Your Free Copy
Join 2,000+ AI teams using our resources