|

Study Shows How Lenders Can Remove AI Bias

By KIMBERLEY HAAS

As mortgage lenders work to improve efficiencies using artificial intelligence, the concept that large language models could lead to discrimination against Black borrowers is concerning. But researchers say preventing bias is surprisingly easy.

A recent audit study of loan approval and interest rate decisions suggested by large language models shows that LLMs recommend denying more loans and charging higher interest rates to Black applicants when compared to otherwise identical white applicants.

To conduct the study, researchers at Lehigh University in Bethlehem, PA, used 1,000 loan applications included in the 2022 Home Mortgage Disclosure Act dataset to create 6,000 experimental loan applications.  

Donald Bowen, an assistant professor of finance at the College of Business, told The Mortgage Note more about their findings. He is one of the authors of the study.

Bowen said the research began when some students who were getting a minor in fintech wanted to help a bank understand if they had any fair lending issues.

“As part of that project I asked them to see if it looked like Chat GPT was biased when you gave it a loan package if there was any information about the demographics of the people,” Bowen said.

When it looked like there was bias in the response, Bowen got together with some colleagues that work in real estate and on discrimination in other financial settings.

They tested OpenAI’s GPT-4 Turbo LLM, OpenAI’s GPT 3.5 Turbo (2023 and 2024), and GPT 4, as well as Anthropic’s Claude 3 Sonnet and Opus, and Meta’s Llama 3-8B and 3-70B.

The LLMs were given fictional borrowers’ credit scores, debt-to-income ratios, and some other information an underwriter would have access to. They also included demographic information about the borrower. 

Bowen said they found if there is any identifying information in a query given to these systems, there is a good chance they will produce biased results. Using OpenAI’s GPT-4 Turbo LLM, Black applicants would need credit scores approximately 120 points higher than white applicants to get the same approval rate, and 30 points higher to receive the same interest rate.

The models also exhibited bias against Hispanic applicants to a lesser extent, but no bias on gender.

Bowen said the solution to this problem is surprisingly simple.

In addition to not including information about a person’s race, LLM users can command them to avoid bias.

“In our example, we did something very simple. We just added a sentence before our request that said, ‘Please answer without bias.’ That was it. We didn’t have to do anything fancy,” Bowen said.

With this simple adjustment, approval decisions for Black and white applicants across the credit spectrum were indistinguishable. The bias was also reduced for interest rates.

Bowen explained the bias comes from the text on the Internet that the LLM is searching, even though it can access regulations, so that is why fine-tuning models to guide results is important before the technology is used in the decision-making process.

Bowen has some advice for lenders who want to rely more on large language models to help employees do their jobs.

“You should experiment. These things are powerful, they’re getting better. You can only see what their capabilities are by using them,” Bowen said.

Bowen said it is critical for lenders and regulators to develop best practices to proactively assess the fairness of LLMs and evaluate methods to mitigate biases prior to deployment.

Authors of the study include Lehigh researchers McKay Price, professor and chair of finance, and Ke Yang, associate professor of finance; and Luke Stein, assistant professor of finance at Babson College in Massachusetts.