Legal AI Models Hallucinate in 16% or More of Queries

A recent study from Stanford found that a general-purpose LLM (GPT-4) and RAG-based legal AI tools (Lexis+ AI, Westlaw AI-Assisted Research, and Ask Practical Law AI) hallucinate answers in 16% to 40% of benchmark queries. GPT-4 had the worst performance, while the RAG-based tools did somewhat better.

Hallucinations in this study were defined as responses that are either incorrect (factually inaccurate in some way) or misgrounded (a source is cited, but it does not actually support the cited proposition).
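
As a rough illustration of that two-part definition (the names and structure below are my own, not taken from the paper), a response would be flagged as a hallucination if either check fails:

```python
from dataclasses import dataclass

@dataclass
class LegalResponse:
    """A model answer plus the outcome of two manual checks on it."""
    claims_are_accurate: bool       # do the factual propositions check out?
    citations_support_claims: bool  # does each cited source back its claim?

def is_hallucination(resp: LegalResponse) -> bool:
    """Flag a response as hallucinated if it is incorrect (inaccurate claims)
    or misgrounded (cited sources do not support the claims)."""
    incorrect = not resp.claims_are_accurate
    misgrounded = not resp.citations_support_claims
    return incorrect or misgrounded

# Example: a fluent answer that cites a real case for a proposition the case
# never states is misgrounded, and so counts as a hallucination.
print(is_hallucination(LegalResponse(claims_are_accurate=True,
                                     citations_support_claims=False)))  # True
```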

Read the preprint, "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools."