Automating justice

Our current system isn’t always fair. Why we shouldn’t assume that predictive technology is the answer.

BY Agnese Smith 13 Mar. 2018

It’s been a bad decade for gut feelings.

Computers are taking over what used to be the exclusive domain of human decision-making. From movie-picking algorithms to self-driving cars that promise an end to fatal crashes caused by human error, we’re striving to develop technology that can predict when we will make bad choices and help us avoid mistakes.

But as the sophistication of software programs powered by artificial intelligence grows, so does our unease: We are increasingly dependent on technology we do not fully understand. And nowhere is that unease more apparent than in the legal world as it grapples with the implications of using a new generation of risk-assessment tools to guide decision-making in the justice system.

“People’s freedom is at stake,” said Carmen Cheung, professor of global practice at the Munk School of Global Affairs at the University of Toronto. “There’s a sense that [predictive software] might help, but we haven’t had a robust enough debate around the issues. Until we have a proper conversation, they probably shouldn’t be rolled out.”

There is little consensus about the proper role for data-driven risk assessment software in the justice process. Those in favour say this new generation of statistical tools can temper judicial bias, create more efficiency and ultimately result in fewer people going to jail while still keeping the public safe. Those opposed counter that the tools are unfair to minorities and could lead to even higher rates of incarceration and “warehousing” of individuals deemed too risky to remain in society.

As it stands now, predictive software “is a highly prejudicial and arguably racist tool,” according to Michael Bryant, the executive director of the Canadian Civil Liberties Association. “For defendants, particularly the poor, it’s a potential disaster.” The former Attorney-General of Ontario says that as head of the province’s justice system, he would not have sanctioned its use.

The question is: Should concern over implications of this new technology stop us from capitalizing on the potential benefits? Would greater transparency or “interpretability” put our minds at ease?

The first thing to examine, some experts say, is our own notions about fairness.

“We need to agree on what tolerable risk is, and do it in a way that is explainable to a machine,” Cheung said. “I do worry that as a profession we don’t agree on the fundamentals about what justice is and what safety is.”

It’s unclear whether Canada, a world leader in the development of risk-assessment tools, will follow the lead of the U.S. and use them extensively to inform and guide decisions in the justice system. Justice departments in Canada are not overly eager to discuss them. Even proponents say that humans must make the final call.

Some argue that even the more sophisticated evidence-based assessments are simply another “tool” available to judges, court practitioners and related experts. Under the right conditions, with proper training and constant re-validation for accuracy and fairness, such objective risk assessments can and do form part of creating a just and more efficient system.

“There is evidence that these instruments are much more reliable and effective than gut instincts,” said Mary Ann Campbell, director of the Centre for Criminal Justice Studies at the University of New Brunswick. “But [they] are not a decision-maker. It’s essential that those who use the tools understand what they are and understand what they can’t do. Otherwise, you will get people making decisions based on a score without understanding what they mean.”

In fact, we live in a second-best world now, said Albert Yoon, a law professor at the University of Toronto and co-founder of Blue J Legal, which uses machine learning to help predict case outcomes in tax and employment matters.

“Humans, even with the best intentions, are prone to biases and inconsistencies. Machines can help humans reduce these types of errors,” he added, pointing to a recent study by the National Bureau of Economic Research that concluded machine-learning algorithms can help reduce crime without putting more defendants in jail.

Used for decades

In the U.S., states use commercial and bespoke software products to predict recidivism and no-shows at trial. Two of the most popular tools are the Level of Service Inventory-Revised, from Toronto-based Multi-Health Systems, and COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), developed by Northpointe (now Equivant). Many states also use programs created by or together with non-profits and universities.

Using evidence-based assessments is nothing new. For decades, courts in many jurisdictions have used tools of varying sophistication to estimate the probability that individuals would reoffend once released or while out on bail. Some actuarial tools use only a few variables; risk- and needs-assessment instruments ask dozens of questions to better channel offenders into the right treatment.

Some programs already use AI, with next-generation tools expected to become even more autonomous. Defendants are grouped into high, moderate, or low risk based on various risk factors: static (age, arrest history) and dynamic (employment, peers, attitude). Many risk-assessment programs include subjective questions about things such as “pro-criminal” personality, associates or attitudes.

Most experts in Canada agree actuarial tools are helpful in some circumstances; they’re frequently used here in intimate partner abuse cases, mental health or violent offender issues and in the management of prisoners. More sophisticated AI-powered predictive software isn’t as widely used, although it’s difficult to get an accurate measure since case workers or probation officers don’t always declare the use of risk reports, said Stephen Wormith, director of the Centre for Forensic Behavioural Science and Justice Studies at the University of Saskatchewan.

“The picture is a little fuzzy,” said Wormith. But it’s fair to say “the traditional risk-assessment tools that are used in Canada tend not to use automation aside from inputting scores and totalling them.”
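The kind of tool Wormith describes, where item scores are entered and totalled and the total is sorted into a risk band, can be sketched in a few lines of Python. This is only an illustration: the item names, point values and cut-offs below are hypothetical and are not drawn from the LS/CMI, COMPAS or any other real instrument.

```python
# Hypothetical checklist items and weights, invented for illustration only.
ITEM_POINTS = {
    "prior_arrests": 2,       # static factor
    "under_25": 1,            # static factor
    "unemployed": 1,          # dynamic factor
    "pro_criminal_peers": 2,  # dynamic factor
}


def total_score(answers: dict[str, bool]) -> int:
    """Add up the points for every item answered 'yes'."""
    return sum(
        points
        for item, points in ITEM_POINTS.items()
        if answers.get(item, False)
    )


def risk_band(score: int) -> str:
    """Map a total score to a band using made-up cut-offs."""
    if score >= 5:
        return "high"
    if score >= 3:
        return "moderate"
    return "low"


if __name__ == "__main__":
    answers = {"prior_arrests": True, "unemployed": True}
    score = total_score(answers)
    print(score, risk_band(score))
```

Even in a sketch this small, the outcome depends entirely on which items are counted, how they are weighted and where the cut-offs sit. Those are design decisions made by people, not findings produced by the software.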

In contrast, the U.S. makes much greater use of algorithms at all stages. Some reports compare a defendant's situation to a large data set to make predictions, though it’s difficult to say exactly what information is pulled out since many products are protected as intellectual property. Unless there’s a greater push for transparency, some worry things will be much more difficult to understand as algorithms get more sophisticated and more autonomous.

Even to those who help design them, it’s not terribly clear how predictive software programs fundamentally differ from more traditional empirical reports. “It’s an interesting question,” said Wormith, a co-author of the LS/CMI, published by Multi-Health Systems, from which he receives royalty payments.

New generation predictive software “is a logical extension. There’s a continuum of automation from manual input to a complete turn-key operation where software does all the work… but it’s a different approach as [the programs] turn over the intellectual demands entirely to the software. This next phase is turning the computer loose on data sets that can come from anywhere. It really puts the operator at greater arms-length.”

Improving decisions

But as Canada struggles with high incarceration rates, particularly for bail and pre-trial defendants, automating more aspects of justice may prove tempting.

Ontario is examining the feasibility of developing a pre-trial risk assessment tool to help determine bail. “The preliminary work currently being undertaken focuses on technical feasibility, including identifying data requirements and resources required to support such a tool, and is exploratory in nature,” said a spokesperson for the Ministry of the Attorney General. The ministry declined to elaborate or make anyone available for interview.

Yoon, along with U of T and Blue J Legal colleagues Ben Alarie and Anthony Niblett, is currently studying bail in British Columbia to explore “how courts make decisions, and whether it is possible to apply statistics and machine learning to improve these decisions with respect to both public safety and cost.”

The study is purely academic, added Yoon. “Our goal is simply to analyze the question rigorously.”

Another view is that too many people are being formally detained before they are legally found guilty. Because of society’s concern about public safety, critics say, many defendants are jailed not because of what they have done, but because of who they are or what they might do. So do we really need more human or machine-generated risk assessments?

Determining risk in order to better protect society remains the dominant paradigm today. But bail reform efforts are expanding both here and in the U.S. Ontario recently announced major changes to its directives with a focus on releasing low-risk defendants.

More than half of inmates in Canada's provincial and territorial jails in 2015-16 were awaiting trial or sentencing, a 35 per cent jump in a decade, according to Statistics Canada. Some are incarcerated because of administrative missteps, or because they violated terms of their release. Most are not charged with violent offences.

Considering that even a short stay in prison can have devastating consequences for entire families, “what we really need is a re-think, particularly what kinds of conditions we are putting in place. We should reserve our systems for [crimes] that are really serious,” said Nicole Myers, assistant professor at Simon Fraser University’s School of Criminology and co-author of Set Up to Fail: Bail and the Revolving Door of Pre-trial Detention. “Things are shifting,” she added, referring to bail reform. “But it will take time, and what’s needed is better education and training” of court personnel.

“We need to think hard about who we detain and why,” added Kelly Hannah-Moffat, sociology professor at the University of Toronto, and an early critic of the use of algorithms in courts. “Most people can be managed in the community. We don’t have a massive flight problem here in Canada.”

What’s more, it’s difficult in Canada to see how simple score algorithms can adequately evaluate defendants with an aboriginal background. Aboriginal people account for a quarter of admissions to provincial and territorial jails, despite making up only three per cent of the population. (R. v. Gladue directs courts to take cultural considerations into account when dealing with aboriginal people.)

To what extent actuarial justice tools contribute to ballooning jail populations is a matter of debate. But the original idea behind empirical tools was to reduce the potential for human bias, weed out lower-risk defendants from more dangerous potential criminals and cut the numbers in detention. The goal is to target high-risk individuals with treatment in the hope of reducing the likelihood of reoffending, as well as help assess appropriate length of detention. These tools could also monitor changes in attitude/risk over time.

However, opinions vary as to the efficacy and fairness of risk assessments in general and AI-driven ones in particular.

On the one hand, assessing risk is an important and legally accepted part of the justice system. Anything that makes the process more efficient is helpful, even if it is generated by a machine. Such tools can be far less racist in their judgment than the humans they’re replacing, though they may be far from perfect. Evidence-based tools are scientific and more objective than humans, proponents argue. They can improve sentence accuracy and reduce unnecessary incarceration from the start.

“A properly tested and accepted risk assessment tool can objectively validate risk and suggest conditions or terms that might manage the identified risk in the accused,” wrote former Chief Judge of the Provincial Court of Manitoba Raymond Wyant, in a recent report to the Ontario Ministry of the Attorney General, Bail and Remand in Ontario. “This is valuable work.”

Others suggest the ownership of the software is the troubling part, not necessarily the instruments.

“Given how much room for improvement there is in the status quo, I don’t think we can afford to be complacent when we might be able to do much better,” said Vincent Chiao, law professor at the University of Toronto, in the context of bail reform. “This isn’t to say that we should blindly rely on whatever AI software some for-profit company is hawking.

“But, conversely, we also shouldn’t blithely assume that human intuition [and] judgment cannot be improved upon by actuarial judgment. I think the evidence pretty clearly indicates the opposite,” Chiao said in an email.

The concern, however, is that current predictive software builds its models using historically biased training data. Critics charge that future-predicting tools are flawed and that practitioners could easily be dazzled by the “mathi-ness” of the algorithms. The tools are not neutral, they say; they institutionalize and amplify bias.

“Algorithms are opinions embedded in code,” Cathy O’Neil, mathematician and author of the bestseller Weapons of Math Destruction, said in a TED Talk last year. They “don’t make things fair. They repeat our past practices, our patterns. They automate the status quo.”

Whether the Canadian justice system is fair is a much larger issue, but there’s evidence of problems; Reuters recently reported that data shows black people in Ontario spend longer in pre-trial detention than white people charged with similar types of crimes.

If the current predictive software was potentially as good at identifying lower-risk defendants as some proponents claim, “then both sides would advocate them,” argues Bryant. “When defence lawyers say they want to start using them, then you’ve got something.”

In a 2014 speech to the National Association of Criminal Defense Lawyers, then-U.S. Attorney General Eric Holder discussed the potential of algorithms to improve resource allocation and restore the proportionality that mandatory sentencing lacks. While in office, he had introduced measures that included data-driven reforms, but he urged caution in his speech, citing concerns about unequal justice.

“By basing sentencing decisions on static factors and immutable characteristics – like the defendant's education level, socioeconomic background, or neighborhood – they may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society,” he said.

“Criminal sentences must be based on the facts, the law, the actual crimes committed, the circumstances surrounding each individual case, and the defendant’s history of criminal conduct. They should not be based on unchangeable factors that a person cannot control, or on the possibility of a future crime that has not taken place.”

Falsely flagged

While some academics have flagged the potential of bias creep for at least a decade, the debate went public after U.S. investigative journalists at ProPublica published a 2016 report identifying problems with risk tools like COMPAS. After crunching the numbers from a test sample, the journalists found that the “formula was particularly likely to falsely flag black defendants as future criminals, wrongly labelling them this way at almost twice the rate as white defendants. White defendants were mislabelled as low risk more often than black defendants.”

Northpointe reportedly said the data was misinterpreted. Since then, academics on both sides have debated what is statistically “fair.”
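Part of what the two sides are arguing about is which error rate should be equal across groups. The sketch below shows, under stated assumptions, how the two rates at issue in the ProPublica findings (people flagged as high risk who did not reoffend, and people labelled low risk who did) can be computed separately for each group. The records and group labels are invented toy data, not ProPublica’s or Northpointe’s figures, and the function name is hypothetical.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    records: iterable of (group, flagged_high_risk, reoffended) tuples.
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, flagged, reoffended in records:
        if flagged and not reoffended:
            key = "fp"  # flagged high risk, did not reoffend
        elif not flagged and not reoffended:
            key = "tn"
        elif not flagged and reoffended:
            key = "fn"  # labelled low risk, did reoffend
        else:
            key = "tp"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        did_not_reoffend = c["fp"] + c["tn"]
        did_reoffend = c["fn"] + c["tp"]
        rates[group] = {
            "false_positive_rate": (
                c["fp"] / did_not_reoffend if did_not_reoffend else None
            ),
            "false_negative_rate": (
                c["fn"] / did_reoffend if did_reoffend else None
            ),
        }
    return rates


# Made-up records for two hypothetical groups, "A" and "B".
TOY_RECORDS = (
    [("A", True, False)] * 2 + [("A", False, False)] * 2
    + [("A", True, True)] * 3 + [("A", False, True)] * 1
    + [("B", True, False)] * 1 + [("B", False, False)] * 3
    + [("B", True, True)] * 1 + [("B", False, True)] * 3
)

if __name__ == "__main__":
    for group, r in sorted(error_rates_by_group(TOY_RECORDS).items()):
        print(group, r)
```

In this toy data, group A is falsely flagged twice as often as group B, while group B is more often mislabelled as low risk. A tool can look balanced on one measure and lopsided on another, which is one reason the definition of statistical fairness remains contested.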

Human rights activists have called for a halt in the use of predictive algorithms by the courts, particularly in the pre-trial system.

In a letter made public last July, Human Rights Watch wrote that such risk-assessment tools are troubling as they ignore “the specific context of that person’s life: a person with a ‘high risk’ record may also have individual characteristics or needs that merit release without supervision or lower levels of supervision, but will be ignored by the prediction tool.”

Along with age and gender, arrest history is crucial in building risk assessments, but many say arrest records are heavily skewed by race. Crime “hot spots” lead to more arrests, which lead to more people with criminal histories, which leads to higher “risk” profiles and longer sentences, which lead to inflated hot-spot designations, which lead to still more arrests.

Other variables – and how they are weighted – may also be problematic, but it’s hard to tell because firms do not reveal much about their formulas, some say.

To what extent court practitioners understand the limitations of risk scores is a big issue. Humans inherently “trust” technology to make correct decisions. They get easily lulled into thinking the machine – however imperfect – will do the right thing 100 per cent of the time.

The danger is that current algorithm-based programs give a scientific patina to what are essentially opinions on what may happen in the future. Court practitioners will have trouble going against the recommendation of what they perceive to be a more expert risk predictor. Such reports will also be easier to use in court.

“They’re seen as more legally defensible and better than just gut” feelings, said University of Toronto’s Hannah-Moffat. “It will be hard for people to over-ride” them.

Whether resource-strapped courts will take the appropriate steps or have the technical know-how to constantly monitor, update and validate sophisticated risk tools for fairness is also a concern.

The efficiency question

Ethical issues aside, there are also doubts that algorithms will lead to more streamlined, efficient courts.

Given all the issues associated with algorithms – flawed data, tools and goals – “they come with more challenges than they are worth,” says Simon Fraser’s Myers. Software programs “are not going to be what solves the problems. The starting point should be about shifting the risk culture” to ensure fewer people end up in jail before they are found guilty.

Bryant points out that AI risk assessment tools may clog up courts even more because rulings could be challenged on the same grounds that breathalyzer tests are contested: the possible unreliability of the technology. Two-tier justice will result, with richer defendants able to summon the requisite expert witnesses.

Indeed, much has been written about the problems caused by the opacity of the current technology. Because of its complexity, it’s difficult to know exactly how a machine comes up with its output. Data scientists have concluded that, in future, software programs will only be as useful as their ability to explain themselves. The European Union has effectively restricted their use under the General Data Protection Regulation, which stipulates that users have a “right to explanation.” GDPR comes into effect in 2018.

The solutions to these problems may lie in better regulation and technology: While we can’t change history – the basic dataset – we can develop tools that not only reveal bias but also actively eliminate it.

Computer scientists insist that it is possible to create fairer, more transparent and accountable systems, though some question whether they should be used in the justice process. Ultimately, these are our tools – we are in control of the programming. “It just depends on how you design the software,” said Joanna Bryson, professor of computer science at the University of Bath. “If you want more people in jail, you’re going to get more people in jail.”

Ultimately, predictive coding remains attractive to many, says University of Toronto’s Cheung, and therefore “we can’t afford to not have a debate about it.”

Even the most transparent and accountable software program needs clear goals. The first step is to decide what, as a society, we want to achieve.

Agnese Smith is a regular contributor based in London, England.
