A recent study from Stanford found that LLMs (GPT-4) and RAG-based AI legal research tools (Lexis+ AI, Westlaw AI-Assisted Research, Ask Practical Law AI) hallucinate answers 16% to 40% of the time on benchmarking queries. GPT-4 performed worst, while the RAG-based tools did somewhat better.
Hallucinations in this study were defined as responses that are incorrect (factually inaccurate in some way) or misgrounded (a source is cited for a proposition, but the source does not support the claim).
Read the preprint Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools.
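
As a minimal sketch of that two-part rubric (the field names and inputs here are hypothetical, not taken from the paper):

```python
# Hypothetical encoding of the study's rubric; names are illustrative only.
from dataclasses import dataclass

@dataclass
class GradedResponse:
    correct: bool    # is the stated proposition factually accurate?
    grounded: bool   # does the cited source actually support the proposition?

def is_hallucination(response: GradedResponse) -> bool:
    # A response hallucinates if it is incorrect or misgrounded.
    return not response.correct or not response.grounded

# A plausible-sounding answer with a citation that does not back it up
# counts as a hallucination even when the underlying claim happens to be true.
print(is_hallucination(GradedResponse(correct=True, grounded=False)))  # True
```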
Links to this note
-
Several startups are touting AI employees that you can hire to perform a specific function. Intercom announced Fin, an AI customer service agent, and so did Maven AGI. Piper is an AI sales development representative, and so is Artisan. Devin is an AI software engineer.
-
With the growing popularity of tools like Perplexity and OpenAI’s SearchGPT, the rise of retrieval-augmented generation (RAG), and a healthy dose of skepticism about artificial intelligence (e.g., hallucinations), the industry is moving from “authoritative search” to “research and check”.
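
As a toy illustration of that “research and check” pattern (all names here, and the naive substring check, are assumptions for the sketch, not any tool’s actual behavior):

```python
# Toy "research and check" loop: draft claims with cited sources, then keep
# only the claims the cited source actually supports. Real systems would use
# an entailment model or human review instead of substring matching.

def supports(source: str, claim: str) -> bool:
    # Naive grounding check: the claim text must appear in the source.
    return claim.lower() in source.lower()

def check_claims(claims_with_sources: list[tuple[str, str]]) -> dict[str, bool]:
    # Map each drafted claim to whether its cited source backs it.
    return {claim: supports(source, claim) for claim, source in claims_with_sources}

drafted = [
    ("RAG retrieves documents before generating",
     "RAG retrieves documents before generating an answer."),
    ("RAG eliminates hallucinations",
     "RAG reduces, but does not eliminate, hallucinations."),
]
print(check_claims(drafted))
# {'RAG retrieves documents before generating': True,
#  'RAG eliminates hallucinations': False}
```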
-
Mentioning AI Decreases Purchase Intent
A recent study measured the effect of including the term “artificial intelligence” in the description of products and services and found that it decreases overall purchase intent. The effect is more pronounced for high-risk products (like financial products) than for low-risk ones (like a vacuum cleaner).