Commentary

AI Is Revolutionizing Everything. Where Can (Human) Evaluators Add the Most Value?

An internally displaced persons (IDPs) camp near Bentiu, South Sudan. | Photo: United Nations Photo/flickr (CC BY-NC-ND 2.0)

Elias Sagmeister

28 May 2026, published in ALNAP

This commentary is part two of a two-part series. The first installment was published on May 7, 2026.

In the first part of this series, I argued that artificial intelligence (AI) is ready to take on much of the desk review and data collection work that currently absorbs the bulk of evaluation budgets. What should evaluators do with the time and money this frees up? More of what humans are uniquely good at, and more of what tends to get cut first when budgets are tight.

Analysis: Beware the Accommodating Machine

If data collection is where AI can do the most, analysis is where it needs the tightest rein. Even before AI, UN evaluations showed a creeping inflation in positivity¹. AI is likely to accelerate that drift. It synthesises toward consensus and frames findings diplomatically. Set to work on policy documents, it tends to mirror their promotional tone, take stated intentions at face value, and miss the political dynamics any experienced evaluator would flag. Analysis is therefore the stage that needs more human involvement, not less. One practical way to provide it is through structured analysis workshops within evaluation teams, and between evaluation teams and evaluation offices. These give people with relevant experience an opportunity to pick apart early findings and interrogate their plausibility, while adding context and political considerations.

Reporting and Follow-Up: Extend the Evaluator’s Mandate

For reporting, AI handles editing, summaries, and quality assurance easily. The more consequential shift is what happens after the report is finalised. Today, external evaluators leave once the final version is delivered, and everything that follows is entirely internal. The people with the deepest understanding of the findings are given no role in making them stick. The days saved on data collection should fund a different commissioning model: terms of reference (ToRs) that do not end at ‘final report’ but include follow-up milestones at three and six months. Evaluators would help navigate the politics of implementation, push back when things stall, and hold the organization accountable to its own commitments. This only makes sense where there is genuine appetite for learning. For compliance evaluations, the AI-augmented model simply makes them cheaper, and part of those savings can be redirected to evaluations where follow-through is actually intended.

What This Costs

Put together, the numbers could change substantially. A learning evaluation run on this model might involve 50 to 65 external days plus around EUR 5,000 in AI tools, bringing the total to roughly 50,000 to 80,000 euros, down from the 100,000 to 150,000 euros typical today. The mix shifts as well as the total: data collection shrinks, while follow-up grows from almost nothing to 15 to 20 days. Compliance evaluations, which probably make up a significant share of the overall portfolio, become leaner still: 20 to 30 days and 30,000 to 45,000 euros for the same scope, roughly a third of the current cost.
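For readers who want to check the arithmetic, the learning-evaluation figures above can be unpacked in a few lines of Python. This is only a rough sketch under two assumptions that the text does not state: that the total is simply external days times a day rate plus the AI tool cost, and that the low end of each range pairs with the low end of the others (and high with high). The function and variable names are illustrative.

```python
# Figures quoted in the text for a learning evaluation on the AI-augmented model.
AI_TOOLS_EUR = 5_000           # approximate AI tool cost
EXTERNAL_DAYS = (50, 65)       # low and high estimate of external days
TOTAL_EUR = (50_000, 80_000)   # low and high estimate of total cost
FOLLOW_UP_DAYS = (15, 20)      # low and high estimate of follow-up days


def implied_day_rate(total_eur, days, tools_eur=AI_TOOLS_EUR):
    """Day rate implied by a total, ASSUMING total = days * rate + tools."""
    return (total_eur - tools_eur) / days


def follow_up_share(follow_up_days, total_days):
    """Fraction of external days spent on follow-up."""
    return follow_up_days / total_days


# Pair low with low and high with high (an assumption, see lead-in).
low_rate = implied_day_rate(TOTAL_EUR[0], EXTERNAL_DAYS[0])
high_rate = implied_day_rate(TOTAL_EUR[1], EXTERNAL_DAYS[1])
low_share = follow_up_share(FOLLOW_UP_DAYS[0], EXTERNAL_DAYS[0])
high_share = follow_up_share(FOLLOW_UP_DAYS[1], EXTERNAL_DAYS[1])

print(f"Implied day rate: {low_rate:.0f} to {high_rate:.0f} euros")
print(f"Follow-up share of external days: {low_share:.0%} to {high_share:.0%}")
```

Under these assumptions, the quoted totals imply a day rate of roughly 900 to 1,150 euros, and follow-up would account for about 30 percent of external days. Different pairings of the ranges would give different results, so the output is best read as a plausibility check rather than a budget.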

Where to apply the AI-augmented model, and what to do with the days and euros it frees up, is a strategic decision for evaluation offices to make. The alternative is that the savings are simply booked, budgets are trimmed across the portfolio, and the result is cheaper, more automated evaluations with weaker uptake than the ones they replaced.

What remains genuinely valuable, in this AI-augmented model, is the critical outside perspective, and the willingness to accompany organizations through the change their findings point to. If we, as evaluators, can deliver more of that with fewer resources than before, this moment of crisis might yet be one we can look back on and say we did our part.

This commentary was originally published by ALNAP on May 26, 2026.

1 See Eckhard, S., Jankauskas, V., Leuschner, E., Burton, I., Kerl, T., & Sevastjanova, R. (2023). The performance of international organisations: a new measure and dataset based on computational text analysis of evaluation reports. The Review of International Organizations, 18(4), 753–776. doi:10.1007/s11558-023-09489-1