Legal AI Where Hallucinations Carry Liability

Accuracy, hallucination rate, and adversarial testing for contract analysis AI, legal research tools, and document classification — before errors become professional liability.

LegalTech AI faces a unique quality challenge: legal errors are not just accuracy failures — they can constitute professional liability, unauthorised practice of law, or material misstatements in legal proceedings. The Mata v. Avianca case illustrated the stakes in the most public way possible.

Citation Hallucination: The Mata v. Avianca Problem

In 2023, attorneys submitted a brief to federal court containing AI-generated citations to non-existent cases. The citations looked real — plausible case names, realistic citation formats, credible-sounding holdings. The AI had hallucinated them with high confidence.

This is now the canonical risk scenario for legal AI. Any AI tool used in legal research must be evaluated for citation hallucination rate before deployment — and evaluated continuously as the underlying model is updated.

Our legal AI evaluation sets include hundreds of jurisdiction-specific questions with verified ground-truth answers, specifically designed to probe citation accuracy under the conditions where hallucination is most likely: questions near the boundary of the model’s training data, obscure precedents, recent cases, and cross-jurisdictional questions.
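The core measurement is simple to state: the fraction of model-produced citations that cannot be matched against a verified index of real cases. A minimal sketch, assuming a hypothetical verified-citation index and exact-string matching (a real pipeline would normalise reporter formats and verify against an authoritative citator):

```python
# Hedged sketch: estimating citation hallucination rate against a
# verified ground-truth index. Case names here are hypothetical
# placeholders, not real citations.

VERIFIED_CITATIONS = {
    "Smith v. Jones, 123 F.3d 456 (9th Cir. 1997)",      # hypothetical
    "Doe v. Acme Corp., 45 F. Supp. 2d 78 (S.D.N.Y. 1999)",  # hypothetical
}

def hallucination_rate(model_citations: list[str]) -> float:
    """Fraction of citations the model produced that are not in the
    verified index. Returns 0.0 for an empty citation list."""
    if not model_citations:
        return 0.0
    fabricated = [c for c in model_citations if c not in VERIFIED_CITATIONS]
    return len(fabricated) / len(model_citations)
```

In practice this rate should be tracked per question category (obscure precedents, recent cases, cross-jurisdictional questions), since hallucination is rarely uniform across them.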

Contract Analysis: Precision and Recall at the Clause Level

Contract AI accuracy is not a single number — it varies by clause type. An AI that is 97% accurate at identifying limitation of liability clauses may be 71% accurate at identifying change of control provisions. You need to know which clause types are reliable before deploying in a review workflow.

Our contract analysis QA produces clause-level precision and recall metrics — giving your team and your clients the specific information they need to use the AI appropriately: high confidence on some clause types, mandatory human review on others.
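Clause-level evaluation amounts to computing precision and recall separately for each clause type rather than pooling all extractions into one number. A minimal sketch, assuming hypothetical clause-type labels and (document, clause type) pairs as the unit of comparison:

```python
# Hedged sketch: per-clause-type precision and recall from a labelled
# contract review set. Clause-type names and data are hypothetical.

def clause_metrics(predictions: set, gold: set) -> dict:
    """predictions / gold: sets of (doc_id, clause_type) pairs.
    Returns {clause_type: (precision, recall)}."""
    types = {t for _, t in predictions | gold}
    metrics = {}
    for t in sorted(types):
        pred = {p for p in predictions if p[1] == t}
        true = {g for g in gold if g[1] == t}
        tp = len(pred & true)  # correctly identified clauses of this type
        precision = tp / len(pred) if pred else 0.0
        recall = tp / len(true) if true else 0.0
        metrics[t] = (precision, recall)
    return metrics
```

Reporting the pair per clause type makes the deployment decision concrete: types with high precision and recall can run with spot checks, while low-recall types get mandatory human review.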

Ship AI You Can Trust

Book a free 30-minute AI QA scope call with our experts. We review your model, data pipeline, or AI product — and show you exactly what to test before you ship.

Talk to an Expert