A recent study from Stanford found that LLMs (GPT-4) and RAG-based AI tools (Lexis+ AI, Westlaw AI-Assisted Research, Ask Practical Law AI) hallucinate answers on 16% to 40% of benchmark queries. GPT-4 had the worst performance, while the RAG-based AI tools did slightly better.
In this study, a hallucination was defined as a response that is incorrect (factually inaccurate in some way) or misgrounded (a source is cited but does not support the claim).
Read the preprint Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools.
Links to this note
-
GraphRAG Combines Knowledge Graphs With Retrieval
One of the biggest criticisms of LLMs is that they don’t actually know anything. Many techniques have been explored for applying general-purpose models to domain-specific problems using information they were not trained on. Retrieval-augmented generation (RAG) does a decent job of enabling you to “bring your own data” but can still fail on more specialized use cases.
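To make the “bring your own data” idea concrete, here is a minimal sketch of the retrieve-then-generate pattern behind RAG. The toy corpus, keyword-overlap retriever, and prompt template are illustrative assumptions, not any particular tool’s implementation; a production system would use an embedding index and send the assembled prompt to an actual model.

```python
# Minimal retrieve-then-generate sketch of RAG.
# The corpus, scoring function, and prompt template are illustrative stand-ins.

# Toy document store standing in for "your own data".
DOCUMENTS = [
    "Lexis+ AI is a legal research assistant built on retrieval-augmented generation.",
    "GraphRAG builds a knowledge graph over the corpus before retrieval.",
    "Hallucinations are responses that are incorrect or misgrounded.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query (a stand-in
    for embedding similarity search) and return the top-k passages."""
    q_terms = set(query.lower().split())
    scored = sorted(
        DOCUMENTS,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Ground the answer by pasting retrieved passages into the prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

if __name__ == "__main__":
    question = "What is a hallucination in legal AI research?"
    prompt = build_prompt(question, retrieve(question))
    print(prompt)  # In a real pipeline this prompt would be sent to an LLM.
```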
-
Several startups are touting AI employees that you can hire to perform a specific function. Intercom announced Fin, an AI customer service agent, and Maven AGI did the same. Piper is an AI sales development representative, as are Artisan and 11x. Devin is an AI software engineer.
-
With the growing popularity of tools like Perplexity, OpenAI’s SearchGPT, and retrieval-augmented generation (RAG), and a healthy dose of skepticism about artificial intelligence (e.g. hallucinations), the industry is moving from “authoritative search” to “research and check”.
-
Mentioning AI Decreases Purchase Intent
A recent study found that including the term “artificial intelligence” in the description of products and services decreases overall purchase intent. The effect is more pronounced for high-risk products (like financial products) than for low-risk products (like a vacuum cleaner).