AI Assistant in Adobe Experience Platform: Evaluation and continual improvement.
12-09-2025
AI Assistant in Adobe Experience Platform represents a leap forward in building enterprise-grade applications in the Generative AI era. This post provides a behind-the-scenes account of how we approach evaluation and continual improvement, as detailed in our research paper: Evaluation and Continual Improvement for an Enterprise AI Assistant.
Problems we identified.
Enterprise users often face significant friction when trying to extract insights from their data. Conversational AI assistants, as illustrated in the figure below, promise to simplify this process. However, delivering a reliable, precision-oriented, enterprise-grade solution comes with unique challenges: fragmented data sources, evolving customer needs, and the risk of AI-generated errors that erode user trust.
AI Assistant overall architecture.
As we delved deeper into this project, we encountered a critical question: How do we effectively evaluate and improve an AI assistant that’s constantly evolving in a dynamic enterprise environment? This challenge is far from trivial. Enterprise AI assistants need to deal with sensitive customer data, adapt to shifting user bases, and balance complex metrics while maintaining privacy and security. Traditional evaluation methods fall short in this context, often providing incomplete or misleading feedback.
Our approach to solving these problems.
To address these issues, we’ve developed a novel framework for evaluation and continual improvement. At its core is the observation that 'not all errors are the same'. We have adopted a 'severity-based' error taxonomy that aligns our metrics with real user experiences (see the table below and the code sketch that follows):
- Severity 0 errors: These are the most insidious — answers that look correct but are wrong, potentially eroding user trust.
- Severity 1 errors: Incorrect answers that users can’t recover from, leading to frustration.
- Severity 2 errors: Errors that users can overcome through rephrasing, causing minor annoyance.
Error severity framework in AI Assistant.
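To make the taxonomy concrete, here is a minimal Python sketch of how severity labels can drive triage. All names below (ErrorSeverity, AnnotatedError, prioritize) are illustrative assumptions for this post, not the implementation described in the paper.

```python
from dataclasses import dataclass
from enum import IntEnum


class ErrorSeverity(IntEnum):
    """Severity levels from the taxonomy; lower values are more harmful."""
    SEV0 = 0  # Looks correct but is wrong; silently erodes user trust
    SEV1 = 1  # Incorrect answer the user cannot recover from
    SEV2 = 2  # Error the user can overcome by rephrasing the question


@dataclass
class AnnotatedError:
    """One evaluated failure case (hypothetical record shape)."""
    query: str
    component: str  # e.g., the pipeline stage where the failure originated
    severity: ErrorSeverity


def prioritize(errors: list[AnnotatedError]) -> list[AnnotatedError]:
    """Order failures so the most trust-eroding ones are addressed first."""
    return sorted(errors, key=lambda e: e.severity)
```

Encoding severity as an ordered type means that prioritization, dashboards, and alerting can all share a single definition of how bad an error is.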
This taxonomy allows us to prioritize the improvements with the greatest impact on user experience and trust. It’s part of a comprehensive approach (sketched in code after the figure below) that includes:
- Prioritizing metrics directly impacted by production changes
- Allocating human evaluators efficiently
- Collecting both end-to-end and component-wise metrics
- Improving components across the system
Evaluation and continual improvement framework in AI Assistant.
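As one way to picture the 'end-to-end and component-wise' idea, here is a hedged sketch of metric aggregation over annotated interactions. EvalRecord and both rate functions are hypothetical names for illustration; a real production pipeline would be far richer.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EvalRecord:
    """One annotated interaction (hypothetical shape)."""
    answered_correctly: bool  # end-to-end outcome
    failed_components: list[str] = field(default_factory=list)


def end_to_end_error_rate(records: list[EvalRecord]) -> float:
    """Fraction of interactions with an incorrect final answer."""
    if not records:
        return 0.0
    return sum(not r.answered_correctly for r in records) / len(records)


def component_error_rates(records: list[EvalRecord]) -> dict[str, float]:
    """Per-component failure rates, so fixes can be targeted upstream."""
    failures: dict[str, int] = defaultdict(int)
    for record in records:
        for component in record.failed_components:
            failures[component] += 1
    return {name: count / len(records) for name, count in failures.items()}
```

Tracking both views together is what lets a team tell whether an end-to-end regression is a routing problem, a retrieval problem, or a generation problem.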
The impact of this framework on our customers has been substantial. By focusing on severity-based errors, we’re delivering more reliable and trustworthy AI assistance. Our human-centered approach ensures that improvements align with real user needs and pain points, as illustrated in the dashboard below.
Dashboard showing a snapshot of error severities and time-evolution for a single component. Illustrative data of similar magnitude to production numbers.
What's next for AI Assistant?
We’re just getting started. Our focus now is on making AI Assistant in Adobe Experience Platform even more proactive, meeting users in their natural workflows, and expanding coverage. We’re also improving our evaluation framework along a few key dimensions:
- Adding proactive evaluations over samples that are representative of production queries. This allows us to forecast the impact of new features and improvements on error rates.
- Formalizing error-severity definitions by breaking down subjective determinations into a series of less-subjective questions that a human annotator answers. This has improved the consistency of our error-severity determinations.
- Scaling evaluation with 'LLM-as-judge' annotations, an extremely active area of research. We are working to incorporate these methods, especially for tasks that do not require domain expertise to annotate (see the sketch after this list).
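To illustrate how the last two dimensions can combine, here is a minimal sketch of rubric-based 'LLM-as-judge' grading, where a subjective severity call is decomposed into narrower yes/no questions. The rubric questions, the judge_severity logic, and the injected model client are all assumptions for illustration, not the actual rubric or judging prompts used in production.

```python
from typing import Callable

# Illustrative rubric: narrow yes/no questions that stand in for a single
# subjective "how severe is this error?" judgment.
RUBRIC = [
    ("relevant", "Does the answer address the user's question?"),
    ("grounded", "Is every factual claim supported by the provided context?"),
    ("recoverable", "Could the user plausibly fix the failure by rephrasing?"),
]


def judge_severity(
    question: str,
    answer: str,
    context: str,
    ask_model: Callable[[str], str],  # caller supplies any LLM client
) -> int:
    """Return a severity label (0, 1, or 2), or 3 when no error is detected."""
    verdicts = {}
    for key, rubric_question in RUBRIC:
        prompt = (
            f"Question: {question}\nAnswer: {answer}\nContext: {context}\n"
            f"{rubric_question} Reply with only YES or NO."
        )
        verdicts[key] = ask_model(prompt).strip().upper().startswith("YES")

    if verdicts["relevant"] and verdicts["grounded"]:
        return 3  # no error detected
    if verdicts["relevant"] and not verdicts["grounded"]:
        return 0  # plausible-looking but unsupported: the most insidious case
    return 2 if verdicts["recoverable"] else 1
```

Decomposing the judgment this way also makes disagreements auditable: when the judge and a human annotator differ, the mismatch can be traced to a specific rubric question rather than an opaque overall score.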
To learn more about our work and the impact we’re seeing, read the full paper here.
If building generative AI at enterprise scale excites you — explore the latest highlights and career opportunities at the Adobe Experience Platform AI site.
Paper authors: Akash V. Maharaj, Kun Qian, Uttaran Bhattacharya, Sally Fang, Horia Galatanu, Manas Garg, Rachel Hanessian, Nishant Kapoor, Ken Russell, Shivakumar Vaithyanathan, Yunyao Li
Guang-Jie Ren and Huong Vu also contributed to this post.