Baylor Joins Multi-University Consortium to Launch First Cross-Faith AI Benchmark
Researchers from BYU, Notre Dame, Baylor and Yeshiva reveal gaps in how AI models handle religion
(Credit: inkoly / Collection: iStock / Getty Images Plus)
(ATHENS, Greece) – A new multi-university academic consortium, including researchers from Baylor University, has found AI models have significant biases and gaps when it comes to addressing faith and religion.
Newly published research from The Consortium for Evaluation of Faith and Ethics in AI (CEFE-AI) – a collaboration among researchers at Brigham Young University, Baylor University, the University of Notre Dame and Yeshiva University – found a consistent, repeatable pattern: religious perspectives are being left out of AI responses.
“There are very practical questions people have about life, everyday situations about grief, love, loss, morality, and often AI does not bring religion into those conversations,” said lead researcher David Wingate, a BYU professor of computer science. “Religion is an important part of human flourishing; 75% of the world’s population maintains religious identity. As we build AI technologies, there’s no reason we shouldn’t build them to support people in what’s important to them.”
CEFE-AI, which has posted three papers to date on AI’s religious bias and exclusion of religious topics, was announced May 26 at the Summit on AI Ethics in Athens, Greece. Elder Gerrit W. Gong of the Quorum of the Twelve Apostles of The Church of Jesus Christ of Latter-day Saints gave the keynote address, emphasizing the need to portray faith traditions accurately, honestly and respectfully. Baylor University’s Paul Martens, Ph.D., associate professor of ethics and director of Baylor’s Center for Ethics, represented Baylor at the Summit and was among the CEFE-AI scholars participating in a panel discussion May 27.
“The world’s great religious, philosophical and ethical traditions have guided human civilization and society for millennia; we need that wisdom and those values to anchor AI today,” Elder Gong said. “To offer all it can for the greater good of individuals and society, AI needs to reflect faith, moral compass, and the gift of possibility.”
AllFaith Benchmark key findings
As a key part of its work, CEFE-AI has released initial datasets of the AllFaith Benchmark, one of the first multi-faith sets of tests that examine how AI systems engage with a plurality of religions. The benchmark includes hundreds of real-world ethical questions sourced from ChatGPT transcripts and faith-community contributors.
The researchers have tested the benchmark on 14 different LLMs, including flagship models from Anthropic (Claude 4.7), Google (Gemini 3.1), xAI (Grok 4.2) and OpenAI (ChatGPT 5.5). Key findings include:
- A survey of 1,125 Americans found most people expect religious perspectives in responses to ethics questions, but nearly all AI models failed to provide any religious content in answering those queries.
- “Consistent with studies that show religion’s persistent moral relevance for the majority of the world’s population, we also found that people see religion as significant across hundreds of real-world ethical questions,” Baylor University’s Paul Martens said. “Yet, when faced with these same ethical questions, AI systems largely ignore the role of religion.”
- Models show clear and consistent biases in giving guidance about religious conversion, systematically encouraging movement toward some faiths and away from others.
- Of more than 12,000 research papers about AI bias, only 0.2% address religious bias.
“More than any previous technology, AI influences public discourse and perceptions. When AI actively excludes religious voices from these important conversations, it impoverishes humanity, rather than enriching it,” said Fr. John Paul Kimes of the University of Notre Dame. “The exclusion of faith from the digital public square diminishes our capacity for authentic dialogue which is necessary to build up the common good.”
The researchers also used the AllFaith Benchmark for a conversion bias test and found that models would subtly encourage users toward conversion to some faiths, while subtly discouraging users from converting to others.
Across all models, the biases were consistent and measurable:
- Nearly every model produced a negative bias towards Jehovah’s Witnesses and a positive bias towards Catholicism.
- Models from Anthropic and Meta showed the least bias of any models tested.
- Grok produced the strongest biases — strongly favoring Catholics and Protestants, while showing negative bias toward Jehovah’s Witnesses, Baha’i and Hindus.
CEFE-AI representatives said the group is just at the beginning of its research partnership. They hope their continued work reaches language model providers, leading to constructive conversations about how to improve those products to better benefit humanity.
“AI is changing the world at an astounding rate, with implications in every area of life,” said Rabbi Daniel Feldman of Yeshiva University. “It is crucial that those who care about the role of religious values in the world engage proactively with those driving these changes so that we continue to see these values reflected and honored in the new landscape.”
New challenges at the human-technology interface
One of Baylor’s four imperatives within the Baylor in Deeds strategic plan focuses on the emergence of artificial intelligence and other technologies and the many resulting challenges, including the work by the CEFE-AI team at the intersection of AI and ethics.
“Of the many challenges emerging at the human-technology interface today, this is one that Baylor and the other faith-based universities in the consortium are uniquely equipped, and perhaps even obligated, to address,” Martens said.
In addition to Martens, a core team of Baylor content experts across disciplines contributed to the CEFE-AI research, with a larger group of faculty consulted on specific issues as necessary. The core Baylor team includes:
- Elisabeth Rain Kincaid, Ph.D., J.D., director of Baylor’s Institute for Faith and Learning, associate professor of ethics, faith & culture at Truett Seminary and affiliate professor of management at the Hankamer School of Business;
- Coretta Pittman, Ph.D., vice provost for community engagement and belonging and associate professor of English;
- University Chaplain and Dean of Spiritual Life Charley Ramsey, Ph.D.;
- Christopher Richmann, Ph.D., associate director for the Academy for Teaching and Learning and affiliate faculty in the Department of Religion; and
- Neil Messer, Ph.D., professor of theological bioethics.
“As artificial intelligence increasingly becomes a trusted source of information and advice for people around the world, the question of how AI engages and represents religious faiths also becomes increasingly important,” Martens said. “Baylor, as a Christian university, cares very much that AI gets our faith and ethics right and, as an expression of the golden rule, we expect that same accuracy and fairness for our sisters and brothers of other faiths. CEFE-AI’s purpose, therefore, is to develop a series of benchmarks that test the extent to which various AI models fairly and accurately reflect our lived traditions, both in terms of ethics and faith claims.”
Read more about CEFE-AI here: https://cefe.ai/
ABOUT BAYLOR UNIVERSITY
Baylor University is a private Christian University and a nationally ranked Research 1 institution. The University provides a vibrant campus community for 20,000 students by blending interdisciplinary research with an international reputation for educational excellence and a faculty commitment to teaching and scholarship. Chartered in 1845 by the Republic of Texas through the efforts of Baptist pioneers, Baylor is the oldest continually operating University in Texas. Located in Waco, Baylor welcomes students from all 50 states and more than 100 countries to study a broad range of degrees among its 12 nationally recognized academic divisions. Learn more about Baylor University at www.baylor.edu.