Benchmarking legal AI: Better than an actual lawyer?
RICHARD WEINER
Legal News Reporter
Published: April 25, 2025
Several legal GenAI platforms recently took part in a benchmarking exercise, the first such high-level exercise since "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools," run by Stanford HAI in May 2024.
This new benchmarking came from the Vals AI Legal Report. Vals AI (https://www.vals.ai/home) is a company that regularly benchmarks enterprise LLMs.
Benchmarking in this instance is a comparative study of how effective a legal GenAI tool is compared with a real, live lawyer performing the same tasks. Who would be better?
It turned out to be about a tie.
The report assessed four different platforms on a variety of tasks: CoCounsel by Thomson Reuters, vLex, Harvey, and Vecflow. Not all of the platforms participated in all of the tests. Participants were also allowed to withdraw from any test prior to the publication of the study, and a couple of them did that. Notably, LexisNexis AI withdrew from the study entirely.
The tests measured results in the following areas: data extraction; document Q&A; document summarization; redlining; transcript analysis; chronology generation; and EDGAR research.
Several large, well-known law firms provided a total of 500 questions to put to the AI tools and to the lawyers who sat for the test. To begin, the lawyers provided a baseline without the use of AI. The lawyers, who were unaware that they were creating benchmarks for this test, were sourced by an outfit called Cognia Law.
Vals AI then scored those responses and set the various AI models to do the same work.
Harvey had the best scores of the AI models, coming in first in all but one category. CoCounsel (Thomson Reuters) scored the one other win.
EDGAR research was the hardest task for the models: only one opted in, and the humans beat it. The humans were also better at redlining and chronology generation, or at least the report suggested that humans should stay actively involved in those categories.
The AI tools outperformed all the lawyers in document summarization and transcript analysis, and they were six to eighty times faster than the humans in every task overall.
So take heart. Human beings haven’t quite outlived their usefulness in practicing law.