If you work with AI tools for ministry, discipleship, or missions, you may want to check out the Great Commission Benchmark (GCB). The GCB is a project that evaluates AI models specifically for Great Commission work. It asks a practical question: Which AI models will actually help you make disciples?
To answer this, the benchmark measures three key areas:
- Task capability – Can the model do the task well?
- Gospel core fidelity – Does it stay aligned with the core of the gospel?
- Worldview alignment – Does it reflect a biblical worldview?
You can explore the project, learn how it works, and see results at:
https://greatcommissionbenchmark.ai.
I think there is an issue here, in that there are two ways to use AI. One (call it Use Case A) is to use it for reasoning, brainstorming, research, writing, analysis, and so on. The other (Use Case B, which I think is what GCB is testing) is to use it to craft responses to incoming questions.
The problem is that what you want for Use Case A is probably NOT what you want for Use Case B. What matters in Use Case A (and here, in my experience, Claude really shines) is the ability to describe, analyze, compare, and contrast different ideas, beliefs, and reasons. For example, I might want to compare and contrast what an Evangelical, a Catholic, an Orthodox, and a secular humanist might say about _x_. This helps me understand each position from a “steelman” perspective.
In Use Case B, on the other hand, you might be asking an AI to draft evangelistic material in response to an incoming question, perhaps in FB Messenger, WhatsApp, or something similar. There you might not want to bring up all these different positions. You want to articulate an answer that fits within a specific box, for any one of a number of reasons.
You can do that with certain guardrails, I imagine. For example, in my Weekly Roundup, I ask AI to select appropriate quotations from a pre-selected list. It’s very good at doing that: it doesn’t make up quotations, and its selections are all based on the content of the Roundup. But that’s more Use Case B: you have to pre-design the responses that are appropriate in a variety of situations.
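As a rough illustration of that kind of guardrail, here is a minimal Python sketch that checks whatever quotations a model returns against a pre-approved list and discards anything that is not on it. This is not the actual Roundup workflow: the function name `filter_to_approved`, the sample quotations, and the matching rule (exact match after normalizing whitespace and case) are all assumptions made for the example.

```python
def filter_to_approved(selected, approved):
    """Split model-selected quotations into (kept, rejected).

    A selection is kept only if it matches an approved quotation exactly,
    ignoring differences in whitespace and letter case. This matching rule
    is an assumption for illustration, not a description of any real tool.
    """
    def normalize(text):
        # Collapse runs of whitespace and ignore case when comparing.
        return " ".join(text.split()).casefold()

    # Map each normalized form back to the canonical approved wording.
    approved_by_key = {normalize(q): q for q in approved}

    kept, rejected = [], []
    for candidate in selected:
        key = normalize(candidate)
        if key in approved_by_key:
            kept.append(approved_by_key[key])  # use the approved wording
        else:
            rejected.append(candidate)  # not on the list: never publish it
    return kept, rejected


# Hypothetical data: placeholder quotations, not real ones.
approved_quotes = [
    "Quotation A from the approved list.",
    "Quotation B from the approved list.",
]
model_output = [
    "  quotation a  from the approved list. ",
    "A quotation the model invented.",
]

kept, rejected = filter_to_approved(model_output, approved_quotes)
```

The point of the design is that the model only chooses; it never gets to author the text that goes out. Anything it returns that is not on the list is dropped rather than trusted.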
What makes an AI strong for Use Case A might actually make it fall to the bottom of this particular leaderboard. As Claude tells me, caveat lector: let the reader beware. Or at least be very familiar with what exactly the leaderboard is measuring.