[Music] [Joe Hughes] Hi there, everyone. I'm Joe Hughes, part of EY's Client Technology Organization. Specifically, I'm the Deputy Technical Officer for the Assurance part of Client Technology. Within this organization, I lead architecture, design, and product delivery. The product I'm focusing on here today is our global assurance knowledge product called EY Atlas, and its newest addition, EYQ Assurance Knowledge, powered by the latest GenAI technology.
So here's our agenda for today. We'll be covering some key custom features of our Adobe AEM Guides and Sites implementation, its architecture, a key add-on we call Diff Merge, and how DITA really helps. Towards the end of our talk, we'll leave you with our thoughts on the future vision of knowledge management and where we think it's headed. But first, we'd like to show you a short clip for some background on EY Atlas to help set the stage and give you some context.
[Woman] In a complex and demanding regulatory environment, adhering to accounting standards is vital. You need reliable sources of up-to-date information on the changes that may affect your business. You need trusted intelligence.
EY Atlas Client Edition is a leading-edge global research platform for accessing accounting and financial reporting content. It provides the most relevant and up-to-date information, including external standards, EY interpretations, and thought leadership. Easily find content that's relevant to you, and explore by accounting topic or industry through an intuitive interface that lets you search and navigate faster and better. Get on-demand access on any device, including tablets, at any time of day, whether in the office or on the move.
Enhance your understanding of regulatory implications on your business.
Staying up-to-date in a dynamic regulatory environment doesn't need to be hard. Simplify the research process by accessing the information you trust anytime, anywhere with EY Atlas Client Edition. Perform financial reporting with confidence and help build a better working world.
So as we've talked about, and as you saw in the video, EY Atlas, our knowledge management platform, is a cloud-based platform for accessing and searching accounting and financial reporting content, including external standards like FASB, EY's audit methodology, thought leadership, international GAAP, and other standards. It's actually part of a suite of assurance delivery tools used daily by EY practitioners for delivery, methodology, enablers, guidance, and work papers, which we call NEON. The system foundation for all the knowledge displayed in EY Atlas is the content management system, Adobe AEM Sites and Guides. The Atlas user base is more than just the auditors you see here. There are also 26,000 client users, and that number continues to grow. Our professional practice has over 580 users, which includes content owners, content managers, and publishers globally from over 42 countries. This is the group that produces the majority of the content that resides in Atlas. There are also 110,000 users from other service lines, like our tax group or our risk group. And finally, there are 210 regulators and 40 legal professionals, a relatively small number by comparison, but an extremely important group from a regulatory perspective. This all nets out to over one million unique visitors every month, making Atlas one of the most used applications within EY.
So in this section, we're going to talk a little bit about the knowledge management system and then its architecture. If we look at our ecosystem from left to right, we have a global content organization of almost 600 people creating, editing, and publishing global knowledge for consumption, all of which is under regulatory review. Remember I mentioned how important the regulators are, meaning we need audit trails and compliance at every step. Given the quality and security standards we need for our knowledge documentation, we've implemented EY Atlas as a robust, scalable platform powered by Adobe, which enables AI and integration into our other client technology tools in Assurance and NEON, like EY Canvas and EY Helix. This technology platform enables our business to turn on a dime, with the ability to publish knowledge globally 24/7 to meet client demands. For example, crisis events guidance available in EY Atlas was pivotal to providing timely information on the evolving situation in Ukraine at the start of the war, helping to keep all of our 100,000 practitioners updated on evolving sanctions.
We have multiple data sources for knowledge, as shown here. On the left we have EY methodology, forms and templates, enablement materials on how to use our various products and tools, and other EY guidance as well, such as accounting and financial reporting content, EY interpretations, and thought leadership on regulations relating to US GAAP, international, and other standards, plus other service line content, for example risk and tax. We also have third-party content, and sorry for all the acronyms: FASB, the Financial Accounting Standards Board; IFRS, the International Financial Reporting Standards, very important globally; Thomson Reuters; AICPA; PCAOB, the Public Company Accounting Oversight Board, also very important; and other global and regional external content, such as CPA Canada.
So how do we manage all of this? How do we bring all of this together? We're going to talk a lot about that today. And this diagram starts to tee it up.
So AEM does the heavy lifting for our content operations and also maintains a lot of the metadata that goes downstream to power our search experience. This also then feeds into large language models. So search experience, large language model, all fed from Adobe. To facilitate this, we use Darwin Information Typing Architecture, DITA for short. D-I-T-A. DITA provides an intelligent representation of our content in our editing tools, enabling all of the publishing features we need to maintain our knowledge with version control, audit trails, and security built in. Adobe has the largest CMS that supports DITA.
DITA also embeds intelligence in our content so that downstream apps can easily query and filter for relevant content, making our information retrieval more efficient and semantically smarter. And this gets really important when we talk about LLMs, which we're going to talk about a lot today. LLMs think in data, where data is not just a component of AI; instead, data and AI are intermixed. With GenAI, it's an advantage to understand all of the connections between the data you have and the problem you're trying to solve. DITA provides annotations for AI, in fact, out of the box. And we'll have more details on how that helps later. One of the key processes here is called Diff Merge, which we built to handle a key part of syncing all of this content, and which we're going to talk about next.
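To illustrate what that embedded intelligence looks like, here's a small Python sketch that queries a DITA topic for tagged sections and prolog metadata. The topic content, metadata values, and function names are invented for illustration; only the element and attribute names (`section`, `outputclass`, `othermeta`) follow standard DITA conventions.

```python
import xml.etree.ElementTree as ET

# A hypothetical DITA topic. Content and metadata values are invented;
# the markup itself (section, outputclass, othermeta) is standard DITA.
DITA_TOPIC = """
<topic id="revenue_recognition">
  <title>Revenue Recognition</title>
  <prolog>
    <metadata>
      <othermeta name="industry" content="technology"/>
      <othermeta name="standard" content="IFRS"/>
    </metadata>
  </prolog>
  <body>
    <section outputclass="requirement" audience="auditor">
      <title>Reporting requirement</title>
      <p>Recognize revenue when control of the goods transfers.</p>
    </section>
    <section outputclass="example" audience="auditor">
      <title>Worked example</title>
      <p>A software licence delivered over 12 months.</p>
    </section>
  </body>
</topic>
"""

def sections_by_class(xml_text, outputclass):
    """Return the titles of all sections carrying a given outputclass tag."""
    root = ET.fromstring(xml_text)
    return [sec.findtext("title")
            for sec in root.iter("section")
            if sec.get("outputclass") == outputclass]

def topic_metadata(xml_text):
    """Collect the topic's prolog metadata as a plain dict."""
    root = ET.fromstring(xml_text)
    return {m.get("name"): m.get("content") for m in root.iter("othermeta")}
```

Because the semantics live in the markup rather than in free text, a downstream app never has to guess which passage is a requirement versus an example; it simply filters on the tags.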
So what does Diff Merge do exactly? Diff Merge allows us to import externally authored content as new versions within existing DITA content. We convert all 20-plus data sources mentioned on the previous slide into DITA XML format, which we can then feed into our Diff Merge tool. As you may have noticed in the video, Atlas provides a Wikipedia-like experience of dynamic content. It's not just static documents to download and read offline. Content in Atlas is heavily interlinked with cross-references: thousands, if not hundreds of thousands, of individual links to both documents and elements within documents, links to digital assets (images, PDF, Word, Excel, and others), and hypertext.
So due to all of this extensive linking in our living, Wikipedia-style application, we simply cannot import and replace the content in Atlas. If we did, links would break, requiring extensive manual revisions both within that content and in other content that links to it. So the Diff Merge process was specifically built and deployed to maintain these extensive links.
So over here on the left of the slide, which is admittedly hard to read because it's a screen capture, you get an idea of the side-by-side interface: the existing content on one side, and the updated content on the other.
This next part is a little easier to follow: the seven steps we take to do a differential merge.
So first we do a structural comparison, which runs through the folders and objects from the import package against the existing collection structure. This tells us what documents to compare. Then we do content differencing: a detailed comparison of the XML and XML tags within each document is run to review the differences within it. Only the changed content, tags, and IDs are updated in the XML.
Then we compare the changes from the import package against the existing third-party documents, the purpose being to identify only what has changed. Then we do the migration step. During this process, the IDs used for linking are migrated from the AEM content to the imported content, so that link integrity is maintained once the import is applied. Once the differencing is complete, the headings, objects, and digital assets are merged into the collection.
Next, we automatically perform the merge, assign works in process and versions, and place the content into an edit project. And then finally, we run the publishing workflows for all content and digital assets, saving a ton of time. I know this was a lot to absorb, but the benefit of the video is you can watch it again and again. But seriously, though, we'd love to hear how others are wrestling with similar issues, and any tips and tricks. Whatever we can share with the AEM Guides community, we'd love to talk more about. And who knows? Maybe AI can be applied here too.
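The core idea behind those steps can be sketched in miniature. The toy function below is only a simplified illustration, not the actual tool: it updates changed content while preserving the existing IDs that cross-references point to, but it assumes both documents have identical structure, whereas the real structural-comparison step also handles added and removed folders and objects.

```python
import xml.etree.ElementTree as ET

def diff_merge(existing_xml, imported_xml):
    """Merge imported content into existing content while preserving the
    existing element IDs, so that links into this document keep working.

    Simplified sketch: elements are matched pairwise in document order,
    which only works when both documents share the same structure. Real
    diff-merge tooling would first run a structural comparison.
    """
    existing = ET.fromstring(existing_xml)
    imported = ET.fromstring(imported_xml)
    changed_ids = []
    for old, new in zip(existing.iter(), imported.iter()):
        # Content differencing: update only text that actually changed.
        if (old.text or "").strip() != (new.text or "").strip():
            old.text = new.text
            changed_ids.append(old.get("id"))
        # ID migration happens implicitly here: `old` keeps its original
        # id attribute, and we never copy the imported id over it.
    return existing, changed_ids
```

For example, importing a revised paragraph whose source uses different IDs leaves the existing ID (and therefore every inbound link) intact, while still picking up the new wording.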
So again, this is a technical content presentation. And so we're going to get even more technical here.
So how does DITA benefit us? How does Darwin Information Typing Architecture help? As mentioned in EY's press release, we are introducing GenAI technology in Atlas, which we call EYQ Assurance Knowledge. This provides our EY practitioners with new capabilities around knowledge search and the ability to summarize technical auditing and accounting content. We're developing mission-critical, time-saving capabilities to help especially our young professionals find guidance in auditing and accounting. The guidance in EY Atlas varies between countries; remember, we've got authors in 42 countries. It also varies by the type of client being audited, for example a publicly traded company versus a government audit. This means specific questions posed to a GenAI question-and-answer experience need to be answered precisely and accurately. And the best way to do this with our EY Atlas content is by leveraging the rich metadata and semantic structure that comes along with DITA, as well as the context of the user and the Atlas channel. So: what is the user doing when they come to Atlas? What channel are they working in, where channel maps to country? And what do we know about the metadata and semantic structure of those documents? This leads us to the use of the retrieval-augmented generation (RAG) pattern, which we'll talk about more shortly. It requires splitting content into sub-document chunks, which is essential to making our GenAI function the way we need it to, allowing users to quickly grasp critical information such as reporting requirements and specific examples.
So again, to expand on our RAG approach: we use XML section-based chunking, because we're already working in XML, so annotations come out of the box. We use XML-layer filtering and prompt engineering, where we instruct the LLM to pay more attention to specific types of tagged content depending on the query and the question we're asking, like requirements or examples. Maybe we're paying attention to examples, maybe we're not; it depends on the user context and the specific questions being asked. But we don't have to guess what those examples are, and we don't have to guess what a topic is. We know, because it's already tagged. We also personalize the user experience: the engagement context is passed into the LLM for our methodology. So again, we will filter for content relevant to the kind of audit being conducted, such as complex or non-complex audits, and decide which specifics in the LLM-generated answers should be highlighted given the size of the client being audited. And lastly, we have industry and other metadata fields on the content to inform the LLM and let it decide on the fly what content is relevant to show or hide. We can do this intelligent filtering globally, across all our content, because we already have a very granular custom metadata schema. The key takeaway here is that we've done a lot of work on our content over the years: originally to improve search, to improve maintenance, to give users that Wikipedia-like experience. But now we have a huge bonus: having it tagged and highly structured is paying off multifold in our ability to use GenAI technology.
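To make section-based chunking and tag-aware prompting concrete, here's a minimal Python sketch. It is not EY's implementation: the sample topic, the `show_examples` context flag, and the weighting rule are invented for illustration, though the `section`/`outputclass` markup follows standard DITA conventions.

```python
import xml.etree.ElementTree as ET

def chunk_sections(xml_text, user_context):
    """Split a DITA topic into section-level chunks for retrieval,
    keeping tag-derived metadata and applying context-driven filtering."""
    root = ET.fromstring(xml_text)
    chunks = []
    for sec in root.iter("section"):
        kind = sec.get("outputclass", "general")
        # Hypothetical filtering rule: some user contexts hide examples.
        if kind == "example" and not user_context.get("show_examples", True):
            continue
        text = " ".join(" ".join(sec.itertext()).split())  # normalize whitespace
        chunks.append({"kind": kind, "text": text})
    return chunks

def build_prompt(question, chunks):
    """Assemble a prompt that puts requirement chunks first and tells the
    model to weight them more heavily."""
    ordered = sorted(chunks, key=lambda c: c["kind"] != "requirement")
    context = "\n".join(f'[{c["kind"]}] {c["text"]}' for c in ordered)
    return ("Pay particular attention to passages tagged as requirements.\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```

The point is that the "what is a requirement, what is an example" decision costs nothing at query time, because the distinction was made when the content was authored in DITA.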
So here's a simplified view of the EYQ Assurance Knowledge high-level architecture. Again, we've mentioned it a couple of times, but AI aficionados will recognize the RAG pattern right away. The RAG pattern is the industry-standard approach to building applications that use LLMs, or in our case SLMs too, to process specific or proprietary data that the model doesn't already know, like our EY methodology. It doesn't know any EY-specific terms, for example. At EY, our projects are called engagements, so we need the LLM to understand that we don't mean impending nuptials or other kinds of events. Engagement equals project for us.
So on one side, you can see our Adobe Guides CMS connected to our vector database, which then interacts with the language models. But you'll also see keyword search still in the architecture, so that our users can compare our original search with the GenAI results on the fly as needed. That's important at this stage of GenAI: our accuracy is going up every day, but it's still necessary to compare against conventional results. On the language model side, we originally started with the full ChatGPT LLM, but we found that GPT-4o Mini gave us tighter accuracy at a lower token cost. We're also evaluating the Phi-3 small language model with promising results. And this gets really interesting, because then we'll be comparing token costs versus CPU costs.
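The retrieval side of that architecture can be sketched with a toy example. This is a minimal illustration of the RAG pattern only: the bag-of-words "embedding" below stands in for a real embedding model and vector database, and the sample chunks are invented.

```python
from collections import Counter
import math

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words Counter.
    A production pipeline would call a real embedding model and store
    the vectors in a vector database."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=1):
    """Rank content chunks by similarity to the question; return the top k."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def grounded_prompt(question, chunks):
    """Ground the model in retrieved proprietary content, so terms like
    'engagement' are answered from our content, not the model's priors."""
    context = "\n".join(retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The grounding instruction in the final prompt is what makes the model answer from retrieved content rather than from whatever it learned in pre-training.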
So far, we're seeing great results, with over 83% summarization and search accuracy.
We're also using Micro Frontends, and we're using AEM Content Fragments.
So this is a sample screen, not an actual one, that shows the user interface elements in the EYQ Assurance Knowledge Micro UI.
Because it's a Micro UI, we can put it in any one of our [INAUDIBLE] applications. Its first home is in Atlas, though. As you can see, there's a question followed by an answer, a summarization, but also a reminder to apply professional judgment and adapt to specific situations.
You can ask refinement questions and ask for references to the original content, which, if clicked, take you back to the original Atlas Wikipedia-style source experience we showed earlier. It's a simple, industry-standard UI, but it hides all the complexity discussed in the previous slides that makes it happen.
So we're going to end this talk with some of our future vision for knowledge management. AI will continue to dominate our daily lives, so we keep hearing about new AI capabilities: new ways of AI generating information, predicting patterns, and suggesting meaningful alternatives for us all to consider. As we have shown, we're ready to change to whichever LLM or small language model performs best. We've got a modular architecture, we're ready to go, and we can adapt as the industry adapts and whenever something new comes out. And when we think about how this applies to knowledge management, we really see how the things we've focused on historically, such as machine-learning search, better metadata and taxonomies, improved content linking, gathering appropriate user context data, and enabling smarter language understanding, have come together to create a foundation that's absolutely critical going forward. We think the future is really bright in the knowledge management space; you're going to see a resurgence here, and Adobe is great in this space. We're very excited to see what unfolds next with content-aware suggestions, automated cross-references, and better translations. Adobe continues to help us evolve at the speed our business demands, helping us build a better working world.
So if this was a live presentation, we would have paused for questions along the way, as well as a Q&A session at the end. Since we're not live, though, please feel free to reach out to me at joehughes@ey.com, or reach out to your Adobe representative, and they'll connect you through our Adobe sales team. We're happy to share information. And like I said, we're also eager to hear from all of you, with tips and tricks on how you manage and progress in this world. And so with that, we're going to close with a little vignette about EY. Thanks for watching.
[Man] In a world of seismic change, will your business shape the future or be shaped by it? How will we capture the imagination of tomorrow's consumers and overcome operational constraints to focus on future growth? How will we harness technology and AI to shape the way we live our lives? And how will we balance environmental responsibility with economic prosperity? With EY's full spectrum of services across sectors, we're all in to shape the future with confidence.
[Music]