The law of numbers
What the vLex-Fastcase merger tells us about the development of legal AI tools in Canada.
Fastcase and vLex were both founded as scrappy competitors in online legal research markets dominated by larger players. That changed with their recent merger, a clear signal that legal tech companies view extensive data collection as the key to succeeding in the accelerating race to bring legal AI tools to market. Those without access to the large bodies of data required risk being left behind or beholden to third-party providers.
The newly created vLex Group will reportedly have access to more than one billion legal documents from over 100 countries. The hope is that vLex can use this corpus, once processed, to train natural language models that make interacting with legal research software more natural.
The new AI landscape is a challenging one for companies and researchers operating in jurisdictions with smaller document collections, especially in languages other than English. Unsurprisingly, AI-driven legal tools are based primarily on English-language data sets from the United States. Trained for technical expediency on U.S. data, these systems risk skewing toward American norms and are not always well suited to the needs of users in other legal systems. In Canada, companies must also contend with distinct legal traditions and the demand for content in both English and French.
Large language models like ChatGPT can give relevant answers about laws in smaller jurisdictions, though they may still hallucinate, says Jerrold Soh, deputy director of the Centre for Computational Law at Singapore Management University. But users of these systems need to specify details, such as which jurisdiction they are asking about, which requires some sophistication.
That will likely change with time. As new systems integrate data reflecting more varied situations, the lack of representativeness will become less of an issue. Training these systems requires vast volumes of text, and newer models are trained on datasets as large as the open internet, which contains troves of material on traditionally under-represented jurisdictions. That gives new systems more varied patterns to draw on.
This means they can be prompted to give less biased answers, which can improve results significantly. And as more legal research and practice tools are tailored to specific markets, new products will demand less sophistication from users.
Legal technology firm Lexum is looking closely at how language models can support improved tools for use in Canada. It has an internal proprietary language model trained on Canadian law and has been experimenting with openly available systems like ChatGPT, says CEO Ivan Mokanov. The results have been promising, but the less glamorous work of collecting, organizing, and cleaning up court documents and legislative data is even more vital in this new environment. Applications require large volumes of structured data, which is labour-intensive to create.
And the data itself has to be available to feed language models, especially when it comes to smaller legal traditions, lest they be overlooked. If the underlying data cannot be accessed, the law itself becomes less accessible. Mokanov warns: "On a global scale, if your laws are not accessible, your legal traditions are in peril. The more accessible a legal corpus is, the more influence it will have."
None of which is to say that AI-driven systems can't or shouldn't be used in legal contexts in Canada or elsewhere. But it is important to consider how the technology will transform the legal system. The legal community, and society at large, must decide how to balance competing rights, needs, and priorities. Otherwise, technologists will make those decisions for them.