[Music] [Paul Alemán] Hello, everybody. Welcome to the session. A huge, huge thank you to everybody who showed up. I know it's the afternoon of the last day of Summit. Internally at Adobe, we call this the Hangover Day. So we're really, really happy to have you and grateful that you're here. And I think you're going to be happy that you came, because we have some incredibly cool stuff to show: stuff that has driven a massive amount of impact within our Adobe.com infrastructure, a new technology that we've been leveraging, and a story that has been brewing for the last year or so. A quick teaser before jumping in. I'm not going to go through this super thoroughly, but I do want to note that this presentation is going to be a story that leads to the introduction of a brand new product from the Adobe Journey Optimizer and Target teams. It's going to be really, really cool, and we'll end with a demo. But first, we're going to do some introductions, and I'm going to hand it over to my colleague David. [David Arbour] Hi. I'm David. I'm a Research Scientist at Adobe Research, where I work on experimentation, causal inference, and AI where it suits us, which you'll hear a lot more about soon.
[Paul Alemán] And my name is Paul Alemán. I work for Adobe.com, and I have quite possibly the coolest job in the world. I'm on a team called Personalization Platform, and we spend all day and all year dreaming up massive, huge ideas that leverage the latest and greatest in technology, put them into actual practical use, get them into production, and validate them as quickly as humanly possible. But I didn't always do that. Before I was on the platform team, I was a growth product manager, which, if you're unfamiliar with that term, is somewhere in between a marketer and a product manager. And my team and I spent all day and all night and all year running experiments.
So we're going to do something fun here today. We're going to give a really candid look behind the curtain of Adobe.com, and David and I are going to be your guides for this.
As I mentioned before, I'm going to tell a little story here. So the story starts a bit ago when I first joined the growth team that I just mentioned.
I was very fortunate to have joined a very high-performing team. We were responsible for driving incremental growth through experimentation, and in the first year we were operating, we beat our targets by 70%. We hit 170% of our target that year. And we were ecstatic. My head was growing big. We were superstars. It was the corporate equivalent of being paraded down the street. And then, of course, whenever that happens, people ask, "Well, can you do it again?" So the next year, we worked really, really hard and we beat our targets again. We had a super high-performing team. Not only did we meet that same expectation, we beat it. We doubled our targets that year. So I have a question. Is there anybody in the audience who is really hands-on involved in experimentation in their organization? Maybe you're a product manager or a marketer, or you're in analytics? Let's see. A few people. Yeah. A good amount.
So you know what's going to happen next after you beat your targets too many times in a row. And if you have any leaders in the room, they know that too.
The next year, we got our new targets through experimentation, and they were astronomical. They were more than double what we had ever been able to hit before.
Clearly, leaders of our company wanted to challenge us. And at first, we were shocked. But then we sat back and we said, "Well, how would we really do this?" And the truth was, under our current tooling, our current process, our current approach to experimentation, it just wasn't possible. In order to achieve that, we were going to need to do something big. And anytime we want to do something big with scale, we think platform. A lot of the times, platform is the route to scale. And so we reached out to our DX partners, our Journey Optimizer partners, our Target partners. We reached out to research to learn about the latest and greatest, and we all started collaborating. And we started with doing a full end-to-end study on our experimentation process. We wanted to identify the biggest barriers to scale.
And we separated it into four big buckets. The first is ideation and test design. Many of you might be familiar with this: we identify opportunities, turn them into hypotheses, and organize them by impact potential to plan our whole year and our whole experimentation roadmap. The second big bucket is content creation and execution, where you take that hypothesis, you take that strategy, and you turn it into a web page or an actual experiment: you write the copy, you engineer the experience, et cetera, et cetera.
Then the third step is the actual experimentation runtime, or the experimentation lifecycle, which is the actual duration of the test, right? We did a lot of A/B testing on Adobe.com across all of our pages, and a test would run about 30 days on average. We did that to ensure that we were reaching statistically significant results. And then the last thing, of course, is measurement and reporting, where not only did we measure quantitatively, but we also spent time collaborating and coming up with insights, trying to understand not just what happened, but why it happened. This funnels right back to the first step, ideation and test design, where we iterate on that result.
We started noticing and defining some really big barriers that prevented us from achieving the massive amount of scale that we wanted.
In ideation and test design...
We have a lot of data at Adobe, made possible, of course, by our platform: we have things like AEP, CJA, Adobe Analytics. We had so much data. But there was still a problem with scale. Because even though we had so much data, the best dashboards, the best logic in the world, you still have to take that data and turn it into something. You have to turn it into a strategy. And that part was very manual and very error-prone. We had a 30% success rate with our experiments even though we were such a high-performing team, which is low, right? From a content creation and execution standpoint, you could have the most perfect hypothesis in the world. You might have nailed the biggest opportunity for your space. But there's still a problem, because you have to execute on it, and the content and the actual execution really, really impact your test results. So we'd get to the end of a test and it would be like, "Man, this didn't perform very well." Was it because the content wasn't right? Or was it because the strategy wasn't right? And this continued to affect our ability to get a great ROI and success rate from our experiments.
Then there's the experimentation runtime, or lifecycle. We did a lot of A/B testing at Adobe, and we had many ML tests that helped us scale, but they were very use-case specific. And one common misconception that we had was that the problem was that the technology wasn't there.
And that wasn't the problem. The problem was more operational. We have 1.5 million pages on Adobe.com, and we only run tests and experiments on a very small fraction of them because it's only practical to do so. A/B tests are a lot simpler. They're easier to report on. They're easier to execute. And when you turn those 1.5 million pages into ML tests that are dynamic and always learning and going nuts, suddenly we have millions and billions of data points that we now need to analyze somehow. There aren't enough analysts in the world to help us do that.
At the end, we do measurement and reporting. And again, similar to the first step...
Turning data into actionable insights that we can actually create strategy on, that we can execute on, that we can learn from is still a manual process that very much relies on the intuition, the skill, the experience of the product managers, the analysts, and everybody who is helping run an experiment.
We started noticing a core problem, a common denominator as a group.
We have an abundance of data, no shortage of data.
But we are limited in our ability to turn that data into something actionable: before you set up a test, while it's running, and after it's completed.
Because what we really needed to achieve that level of scale, at every step of that process, was to turn this: "This arm had a statistically significant lift over the control group of 3%," into something more like this: the audience is price-sensitive and unfamiliar with your brand; they respond to strategic pricing and creative assets; here are some creative attributes that correlate with this audience or this journey across your ecosystem; and here are some opportunities, suggestions of what to do next.
But instead of having that, it was manual.
And PMs, marketers, and growth professionals, we were the ones responsible for connecting the dots between data, insights, and opportunities. How could we truly know what those biggest opportunities were and how to fully execute to take full advantage of that strategy? It just wasn't practical to do it at a global scale, even internally at Adobe with the best tools in the world.
So we started working with our partners, our research partners, our Journey Optimizer partners, our Target partners.
In my experience, a lot of really, really big game-changing ideas start with somebody just sitting around one day and saying, "Man, wouldn't it be great if we could do this?" And that's exactly what happened here. For every step of that funnel: wouldn't it be great if we knew exactly what the biggest insights were in our entire company? It doesn't matter if it happened in my scope, or in another channel, or in another country. Wouldn't it be great if we just knew what the biggest opportunity was at the time? Wouldn't it be awesome if, from an execution standpoint, we knew exactly the core visual attributes, the core strategy, the content strategy that would apply to our hypothesis? And wouldn't it be awesome if we could test many variants at a time with operational efficiency, where we can monitor everything, we can see everything very clearly, and everything is transparent? And from an analysis perspective, wouldn't it be awesome if our analysts and our product managers or test leads had the ability to identify the key factors that made up the success of an experiment programmatically, so they can spend their time being creative and building more efficiently? The reward, if we could do that, would be, hearkening back to my earlier story, the exponential, huge scale that we wanted to achieve. We would see an increased success rate. We'd win more. We'd get bigger returns on investment. And it would take less time and fewer resources.
So again, we went to our platform teams. We were all working together and collaborating on this. And we discovered that, hey, the puzzle pieces at Adobe actually did exist already. We just hadn't put them together in the right way yet. We had the ecosystem. That's one strength of Adobe and our entire Digital Experience platform family. But we hadn't put the pieces together in the specific way we needed to solve the scale issue. We had this AI-driven content supply chain technology. I'm sure everybody here has heard a massive amount about that, and it was really revolutionizing the way that we created content at velocity. We had ML-powered experimentation. Again, Target and AJO have features like Auto-Allocate and Auto-Target. They had the capability. And new to the scene was some work being done in research and in Journey Optimizer around AI agentic insight and analysis. For that, I'm going to hand it over to my colleague, David. [David Arbour] Sweet. Thanks. Yeah. So I guess that's a lot of buzzwords, so it's probably worth jumping underneath the hood and saying what we actually mean when we say this. And I promise you there's very little magic, mostly just math. On that note, I also want to give a warning: this is a little different from the presentation I usually give, so it's a bit off my standard tenor. If it's too high level, track me down afterwards. I think Paul can attest to the fact that I love getting into the details almost too much. And if it's too low level, just give me some grace. It's the last day for all of us. Okay. Before I get into the AI piece of this, I do just want to quickly highlight some previous work with research that we've put into production right now in AJO, which is around continuous monitoring with confidence sequences. So already, if you run an experiment in AJO, you get the advantage of speeding up the test, your ability to come to a decision from the test, due to confidence sequences, which replace standard p-values and let you continuously peek while maintaining statistical rigor and validity. So that's great. We can speed up the time to test. But to Paul's point, how do we go past that, and how do we scale to many treatments? We can't 10x the number of people we have. So if we 10x the number of treatments we have, we need to be a little bit more clever.
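To make the confidence-sequence idea a bit more concrete, here is a minimal sketch of a Robbins-style normal-mixture confidence sequence. It illustrates the general technique, not the implementation inside AJO; the function name, the sub-Gaussian assumption, and the simulated conversion rates are all ours.

```python
import numpy as np

def mixture_cs(x, alpha=0.05, rho=1.0, sigma=1.0):
    """Robbins-style normal-mixture confidence sequence for a running mean.

    Assumes each observation is sub-Gaussian with parameter `sigma`
    (e.g. sigma = 0.5 covers values bounded in [0, 1]). The returned
    (lower, upper) bounds hold simultaneously over all sample sizes,
    so you can peek continuously without inflating the error rate.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    mean = np.cumsum(x) / t
    v = sigma**2 * t  # variance process ("intrinsic time")
    radius = np.sqrt((v + rho) * np.log((v + rho) / (rho * alpha**2))) / t
    return mean - radius, mean + radius

# Toy comparison of two arms via the per-visitor difference in conversions.
rng = np.random.default_rng(0)
control = rng.binomial(1, 0.10, size=8000)
variant = rng.binomial(1, 0.16, size=8000)
diff = variant - control                       # difference of two independent [0, 1] outcomes
lo, hi = mixture_cs(diff, sigma=np.sqrt(0.5))  # sub-Gaussian parameter for that difference
crossed = np.nonzero(lo > 0)[0]
print("earliest stopping point:", crossed[0] + 1 if crossed.size else "not yet")
```

Because the bounds hold uniformly over time, checking them after every visitor does not inflate the false-positive rate, which is the property that lets you stop a test early.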
So to do that, a natural thing to do is to think about using generative AI, right? And so we can think about experimentation and AI in terms of two paradigms. We have the two titans of these paradigms: on the left, Geoff Hinton; on the right, someone named Jerzy Neyman. And we can think about the pros and cons and why we find each appealing. So first, why do we like AI so much? It's great. Our bosses love that we use the buzzword. But in practice, the reason it's so nice is that we get really good empirical behavior on a wide range of settings with little to no examples. This is almost in direct contrast to statistics and causal inference, where if I want to make a decision or draw a conclusion, I require myself to have a lot of data, and that can be very burdensome. But those of you who are maybe a little more cynical or clear-eyed in the audience may know there's no free lunch, because the thing that you typically don't get from generative AI is rigorous guarantees, reliable uncertainty estimates, and decision criteria. And when we do experiments and causal inference, this is essentially exactly why we're showing up to the table. We need guarantees that we can make decisions on top of, such that in a month, if something happens, we can say these were the criteria we had, and they were based on really rock-solid math. So essentially, the work that we've been doing is asking how we can marry these two paradigms. What we've done is build statistics and machine learning that's explicitly designed to leverage GenAI-based inference while providing guarantees that are no worse than their classical counterparts. That's a lot of word salad. Another way of saying it is: you make some assumptions when you use generative AI. And if those assumptions are wrong, your worst-case behavior is that you're running the A/B test you thought you were running when you showed up, basically what you would have been doing before. But if it goes right, then you get an increase in precision, an increase in your ability to do these things. Okay. To put this into context, and maybe hearken back to Paul's notion of what that cycle means, we can think about the experimentation process in three distinct phases. The first one is insights, right? This is essentially, "Hey, I should run an experiment." We come up with a set of things we think we can measure. These are basically what marketers get paid for, right? You all have great ideas about what's going to change. Then you can think about opportunities. And opportunities are exactly what you're going to change and how you're going to change it. Another way of saying it is these are the variants that you run in your A/B test, your bandit, whatever. And then finally, we have performance. And performance is: I know what I want to run. Give me a really precise estimate and give it to me as fast as possible, so I can run my next experiment quickly, make this decision, and get on with life. Okay. So with these three things, you may be asking, "Okay, how are we actually changing how we run these things? What's fundamentally different here?" I think it's really useful to anchor on what we typically do in an A/B test, right?
So when you typically run an A/B test, and here and throughout the rest of my little section I'm going to use images because they look nice on slides, but just know this also applies to things like text and layouts. I just didn't want to bore you guys in the last session of the day. Let's say you're running your A/B test. You have these three hero images. You treat these things as independent. Another way of saying that is you're assuming you have no information about these things other than the fact that I have treatment A, B, and C. Okay. But if I asked you which two of these are going to perform similarly, you're probably going to do something fundamentally different, which is you're going to rack your brain. Maybe you're going to hearken back to your early days of television. You're going to say, "Hey, some of these things look more similar than the others," right? So we can take that same intuition and operationalize it with generative AI and embeddings in order to start thinking about how we can relate similarity between treatments, so we can explicitly leverage those similarities and be more efficient when we go to run the experiment. But more importantly, we can use that similarity information about our treatments to draw insights and help us make better hypotheses. Okay. So let's do that a little bit more concretely. Here, let's imagine that I have two embedding dimensions. One is how stretchy things are. One is how bright things are. And now what I'm going to do is take all of my variants and place them in that space. Okay? And I can do that for a bunch of stuff, right? And we know we do this all the time. One of the key things here is that we can generate a ton of these semantic, meaningful embeddings. And really, what we're doing is just taking that same insight that you bring to the table when you think about, "Hey, what's going to work?" This is the exact same logic that you're doing. The only thing we're doing is making it mechanical and pushing it through the machinery of statistics. And once we have these things, something very subtle changes that's really important. Instead of saying how are these variants doing, we're essentially looking at the space and saying, "How are stretchier things doing? How are brighter things doing?" So we're not thinking about discrete treatments anymore. We're thinking about the attributes of the treatments and how they're affecting the things we care about.
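As a toy illustration of placing variants in an attribute-style embedding space, the snippet below computes cosine similarities between three hypothetical hero-image vectors. The axis names and values are made up to mirror the "stretchy/bright" example; a real system would use embeddings from a multimodal model with many more dimensions.

```python
import numpy as np

# Hypothetical attribute-style embeddings for three hero-image variants.
# Axes stand in for "stretchy", "bright", and "warm tone".
variants = {
    "hero_A": np.array([0.90, 0.80, 0.20]),
    "hero_B": np.array([0.85, 0.75, 0.30]),
    "hero_C": np.array([0.10, 0.20, 0.90]),
}

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

names = list(variants)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: similarity = {cosine(variants[a], variants[b]):.2f}")
# hero_A and hero_B land close together in the space, so rather than treating
# them as unrelated arms we can reason about the attributes they share.
```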
Which means when we want to add a new treatment, with this dapper young man here, we can go back to our embeddings and say, "I think it's actually going to perform very similarly to other stretchy, bright things." That lets us do a couple of things. When we go to run the experiment, we can partially pool information from other similar treatments in a way that increases the power and improves the precision, which lets you finish quicker. But it also gives you a sense before you run the experiment: "Hey, this is maybe more likely to do well, because we know that other stretchy and bright things tend to have done well." And we can also flag things like, this is the 15th time you've run this same yellow zip, right? Which sounds silly, but I think across organizations, a lot of times you'll look at historical datasets and you'll see maybe one word changes, or a comma is different. So it helps to make that a little bit more explicit here. Okay. Now that we have that, we can think about other ways we might define what a good treatment is. So I could ask you to quickly rack your brain: what's a good treatment? I'm going to give you our thoughts on what that is. The first one, like I mentioned, is: have we run something similar to this in the past? We can also think: are there similar things that have or haven't worked well? Again, thinking about those attributes, those categorizations. But we can add additional information here: does this fit our brand guidelines? Seems fundamental, something we probably want to think about before we run it. Does this fit our company goals? Has another brand run a similar experiment? And maybe the most important question that you could ask before running any experiment is: is my boss going to like it? So we can operationalize a lot of these things. Unfortunately, we can only implicitly model the last one, but I think you all are smart and can come up with some surrogate metrics.
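One rough way to picture the "partially pool information from similar treatments" idea is an empirical-Bayes-style shrinkage toward a similarity-weighted average of past arms, plus a simple near-duplicate threshold. This is a sketch under our own assumptions, not the model the team describes; the pseudo-count and the 0.98 threshold are invented for illustration.

```python
import numpy as np

def pooled_estimate(obs_mean, obs_n, neighbor_means, neighbor_sims, prior_strength=200):
    """Shrink a new arm's observed rate toward a similarity-weighted average of
    historically similar arms. `prior_strength` is a hypothetical pseudo-count
    controlling how much we borrow before the new arm has much data of its own."""
    sims = np.asarray(neighbor_sims, dtype=float)
    prior_mean = float(np.average(neighbor_means, weights=sims))
    return (obs_n * obs_mean + prior_strength * prior_mean) / (obs_n + prior_strength)

# A brand-new "stretchy, bright" variant with only 50 visitors so far borrows
# strength from two similar past arms (cosine similarities 0.95 and 0.90).
print(pooled_estimate(obs_mean=0.02, obs_n=50,
                      neighbor_means=[0.11, 0.13], neighbor_sims=[0.95, 0.90]))

# A near-duplicate flag ("the 15th time you've run this") can be as simple as a threshold.
print("near-duplicate of a past treatment" if max([0.95, 0.90]) > 0.98 else "sufficiently novel")
```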
And what we do is we take surrogates for each one of those things. So on the left here, you have things like converting attributes and previous learnings. That's explicitly taking your previous experiments and leveraging them as a dataset, as opposed to just keeping that knowledge locked inside our brains. What metrics do we have available? What are our company goals? We can write those down and use them inside our model. And also our best practices, both industry best practices and internal best practices. We can put all of this inside our AI experimentation opportunities agent, which is a bit of a mouthful, but bear with me. So, our AI and stats model. And then what we get is a couple of things. One, we see a hypothesis here, right? This is the treatment that you're going to try. I've switched over to text because I like sleight of hand. So say this is a headline that maybe Paul might be running that says, "Upload creativity in your apps today," right? But you'll notice something else, which is why this might work. And this is explicitly using those attributes that I just talked about and your prior historical data, your previous experimental results, in a way that gives you an explicit, natural-language explanation of the hypothesis that you're running: a headline with a neutral tone that is short and direct would increase sales. I think a lot of times we do this, but it's useful to write it down explicitly for two reasons.
One, when you're generating things with a model, it's nice to know why you're seeing what you're seeing, and that you're not just seeing 100 variants at once, because that's both overwhelming and sometimes just feels like white noise. And the other reason is you get to keep this, so later, when you go to look at your results, you can know why you started, right? A lot of times we forget why we thought good ideas were good ideas. Okay. So we have those things. Now we're going to run it. Say we give you some examples. You're super excited. Now what do I do? All right. I'm not sure how streamlined and well-documented all your systems are, but most places that I've seen and been have the following process. You take your results. You hand them off to an analyst. They disappear into a dark room for weeks, months, maybe the quarter. You send them some really panicked emails saying, "We've got to tell people what happened soon." They come back with a presentation, usually very high quality because you like them. And they say, "Hey, this is what happened. This is why it happened." You brainstorm together about what you should do next. Awesome. You're super psyched. And then what do you do? You put it into cloud storage, and then it dies. And then maybe four months later, someone says, "Hey, we're thinking of running this experiment." And you're like, "We actually ran that exact same experiment four months ago." Or maybe not. Maybe you just forget and you run the same thing again. So one of the things we'd like to avoid is this, because what you're really losing is the basic value of experimentation. You don't really run an experiment to find out what's going to happen in one iteration. What you're really trying to do is build a base of knowledge and understanding from experiments that informs your strategy and your intuition, not just for you, but organizationally, so you can send it to your director or whoever, and you all can learn together in a way that feels based in science. Okay. So to do that, we use insights. For our purposes, we're going to define an insight as a human-readable statement of evidence that helps to update and refine a hypothesis. Again, I really like long statements, but I can say this in a shorter way: this should either help tell you what to do next, or it should help you explain what you just did. If it's not doing either one of those things, it's not an insight. So this is what we do in our product. How do you find these insights? We can go back to what is quickly becoming my favorite graphic, and we can think about the space that I just showed you. We can find places where things did very well, which I have circled in green. We can find places where things did poorly. We can look for those bright and dark spots, and we can look at that across all of those categories. And now we have attributes that are associated with good and bad performance. Then we can take that and surface it back to you and say, "Hey, these are the things that are working well." Okay. So what does this look like? Let's think about three different levels of complexity. The first one is probably what we all do, hopefully, when we run an experiment, which is: there's strong evidence that exactly the experiment I ran worked. Great. So this correlates to the hypothesis that we started with, which is that we think bright-colored products convert better on the home page. Wonderful.
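As a toy version of "find the attributes associated with good and bad performance," the snippet below regresses past lifts on binary attribute tags and reports the direction of each coefficient. The tags and numbers are made up, and a production system would do considerably more (uncertainty, bias corrections, many more attributes); this only shows the shape of the idea.

```python
import numpy as np

# Toy history of past treatments: binary attribute tags and the lift each treatment measured.
attrs = ["bright_colors", "short_headline", "price_message"]
X = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
], dtype=float)
lift = np.array([0.06, 0.04, -0.01, -0.03, 0.05, 0.01])

# Ordinary least squares on the attribute indicators: which attributes are
# associated with higher or lower lift across all of these experiments?
X1 = np.column_stack([np.ones(len(X)), X])  # add an intercept column
coef, *_ = np.linalg.lstsq(X1, lift, rcond=None)
for name, c in zip(attrs, coef[1:]):
    print(f"{name}: {c:+.3f} ({'associated with higher lift' if c > 0 else 'associated with lower lift'})")
```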
We can get a little more nuanced, though, and think about an insight that narrows in on the specific attributes I talked about. Now we're thinking not just about that specific experiment, but also about what it is about the content you're actually experimenting on that worked or didn't work. And then finally, we can think about going down another level and not just talking about the attributes, but also your audience. Who is this working for, right? And we can think of the intersection of these things: these are the specific strategies or attributes that work for these segments of my customer base.
And then over time, instead of just having a bunch of experiments that you ran and threw away, you're building up an insight base. You can learn from these organizationally. You can surface this to other people. And also, when you go to run your next experiment, like I talked about before, we have that ranking mechanism, and that's based on exactly that saved knowledge. So every time you run an experiment, you're not just running an experiment: you're making the next experiment better and improving your decision-making process. So now that I've rambled for a solid 10 minutes, I'm going to hand it back over to Paul. Before I do that, I can't help myself. One second. I want to give you a quick moment under the hood, just to tell you that this really isn't smoke and mirrors. Here are, broadly, the technical pieces that do it. I really can't help myself. The first is embedding generation at two levels: we use standard embeddings, and we also use a novel method called semantic embeddings. So we have the embeddings that I was talking about before that relate to specific attributes that we care about. Then we have a hypothesis ranking and selection model...
...that leverages those embeddings and prior experiments. We do a little bit of work there under the hood to account for things that might bias specific experimental results, like, "Hey, I ran this at Christmas," that kind of stuff. So, a little bit of extra statistics. And then finally, we have hypothesis description and characterization, which is a process where we come up with attributes, both ones that we define and ones that customers can define as well, which gives us a bank of attributes that we're really trying to reason over. So with that, I'll finally hand it back over to you.
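Pulling those pieces together, here is a deliberately simple, hypothetical ranking heuristic over candidate treatments, combining a predicted lift from an attribute model, a near-duplicate penalty, and a brand-guideline gate. The fields and weights are invented for illustration; this is not the product's actual scoring model.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    predicted_lift: float   # e.g. from an attribute model like the earlier sketch
    max_similarity: float   # cosine similarity to the closest past treatment
    on_brand: bool          # e.g. a brand-guideline check

def score(c: Candidate, novelty_weight: float = 0.5) -> float:
    """Hypothetical ranking heuristic: prefer candidates the attribute model expects
    to win, discount near-duplicates of past treatments, and drop anything off-brand.
    In practice the predicted lift would also be adjusted for context like seasonality."""
    if not c.on_brand:
        return float("-inf")
    duplicate_penalty = novelty_weight * max(0.0, c.max_similarity - 0.9)
    return c.predicted_lift - duplicate_penalty

candidates = [
    Candidate("neutral, short, direct headline", predicted_lift=0.045, max_similarity=0.62, on_brand=True),
    Candidate("same yellow hero, 15th run",      predicted_lift=0.050, max_similarity=0.99, on_brand=True),
    Candidate("edgy off-guideline headline",     predicted_lift=0.070, max_similarity=0.40, on_brand=False),
]
for c in sorted(candidates, key=score, reverse=True):
    print(f"{c.name}: score = {score(c):+.4f}")
```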
[Paul Alemán] Thanks, David. Okay. So I'm sure everybody here is thinking, wow, that makes a lot of sense. Because it really does. It's actually not that complicated. I mean, it's very complicated, but at a high level, it's not very complicated, right? But there's always this disconnect that I noticed, which is, "Okay, we have a really cool thing. But how do we take that into actual practical use?" Right? How do we work it into our day-to-day, our existing experimentation process? And again, remember, in my story we're trying to achieve an immense amount of scale quickly in this year. And my team, the Personalization Platform team, is extremely passionate about putting things like this into practical use and validating and incubating them quickly. So this brings us back to about a year ago. We started building an internal tool at Adobe.com with the Journey Optimizer team that would resolve some of these big user problems. And we started sneakily using it, right? And in all my time at Adobe.com, as a product manager in the growth profession, I have never seen anything as enthusiastically and overwhelmingly adopted as quickly as the work that we did in this internal tool. So, to give you a better idea of how exactly we're doing this, and, spoiler alert, this ends in a new product for Adobe...
I'm going to show you a case study of an actual experiment that we ran on Adobe.com. This is the Creative Cloud overview page; that's what we call it internally. It's a very high-performing page. A lot of people come through it, it's a big marketing page, and it generates a lot of revenue for the company. And typically, we do A/B tests on this page, right? We might say: the marquee. The marquee is really high visibility, and we think that's a massive opportunity. So we do an A/B test. But is an A/B test really what we want to do? Not really. That's just the method we use to do what we actually want to do. What we really want is to find the most performant header, the most optimal subhead, the most engaging CTAs. And we also want to understand, does that differ by audience? Does user behavior differ between audiences in what they respond to? The answer is probably yes. So remember back when I was talking through the entire timeline? That's, at minimum, 30 days to get a clean read: 30 days for the header, 30 days for the subhead, 30 days for the CTA. Now you're at a full quarter, and you've used up a quarter of your roadmap. And on top of it, if you layer on audiences, you multiply that by four. Now you're at 360 days of not only runtime, but 360 days of collaboration across multiple, multiple teams. Extremely, extremely expensive and not particularly scalable.
But now, using that internal tool that we had developed...
We had tools to do this.
For this Creative Cloud overview page test, instead of defining an A versus B, we utilized technology from Target's Auto-Allocate to define a pool of content. We were able to supply a wide breadth of strategies and say: machine, tell us which of these works, organically, and report back to us for each one of our audiences. Unfortunately, this is not actual data, just from a confidentiality perspective. However, I can tell you that this was a massive win for the company. This was already a pretty mature surface for the company, and we had double-digit revenue growth from this page. We vastly increased the business impact, but more importantly, we vastly improved the user experience. It was more personalized and more relevant than ever.
Now here's the really special part that relates it back over to what David was talking about.
In addition to that, we were building an agent with Adobe Journey Optimizer that reports back to us in a UI. So not only are we able to use ML, but we're able to operationalize it.
It continuously listens. It suggests what to do next, and why an experiment or an experience is performing the way it is, by connecting the dots between performance data, quantitative data, and the AI work that David described. So it combines strategy with data, something that was typically done very manually.
And we did it on a grander scale.
This is the real magic here, something that was literally impossible before. Not only does the experiment agent continuously listen to your experiment in your space, in your context, it listens to the entire ecosystem that Adobe is on: Adobe Journey Optimizer, AEP, Adobe Analytics. And it's able to understand what happened and what worked in email, what happened on our in-app surfaces in Photoshop. Because the next big thing for you and your focus, your line of business, might come from the photography page in Japan, in Japanese. And that insight now gets organically worked into the experimentation process...
Where in our UI we get it surfaced seamlessly in the experience, something that we can explore, we can judge as humans who are experts in the space, and execute on very, very quickly.
And like I said, this was about a year ago, and it had an incredible amount of adoption at Adobe.com. Throughout the remainder of the year, we ran many, many tests. We basically ripped out every possible personalization and content test that we could to transition to this, because it was so much more effective. Through our AI-driven content supply chain, we saw over a 200% average increase in the amount of content we could create. We saw a 24% relative increase in our win rate, or success rate, and 212% average ROI per test. In our case, we are focused on revenue growth. So, put in plain English, we were able to test far more with the same amount of resources. And not only were we able to test far more, but when we did get a winner, it was a larger winner and created more business impact.
So we're all clinking our champagne. And by the way, we hit our target that year, so that was nice. And we're all sitting around patting ourselves on the back, and we thought about something. What if other people have the same problem that we solved internally at Adobe? And so, with Journey Optimizer...
We got together and started talking to some people, and I think there are a couple of people in this room that we talked to, and we decided we were going to collaborate on a new Adobe product. And I've been looking forward, for the last year, to standing up on this stage and getting to tell you this. So, introducing the Adobe Journey Optimizer Experimentation Accelerator.
This is a new GenAI-first product that is under the Adobe Journey Optimizer banner.
And using this application, you can do the exact same work that we just described, taking advantage of the experiment agent and all of the workflows that we've been using at Adobe.com. It helps you design the best possible test by automatically surfacing the attributes that correlate with performance in your context. We were able to test more variations at a time, and you can do that now here. And again, we had that capability technically, but I will soon show you that it's more operationalizable than ever.
In addition, the experiment agent is plug and play and continuously analyzes, connects the dots between data and strategic insights in plain English so you can execute on them. And it identifies the biggest opportunities dynamically, learning from your entire ecosystem and applying them strategically to what you're working on.
And one of the coolest parts, in my opinion, is that it seamlessly integrates with both Journey Optimizer and Target.
It doesn't take any additional integration. It works out of the box and communicates with both. You're basically just plugging the experiment agent and the UI into what you already have. Whether you're a Journey Optimizer customer or a Target customer, you'll have access, or you'll be able to get access, to this functionality. Okay. That was a lot of preamble, but we're going to do a demo now.
So, at Adobe.com, this is the Experimentation Accelerator. This is our home page for product managers, marketers, analysts, anybody who touches experimentation and wants to take advantage of these features. Up top, we have a very high-level view of the metrics that are driving our business, the ones we really, really care about.
Down below here, we have additional metrics, L1 and L2 KPIs that we track, again measuring the business.
Down here, you can see that there are some new opportunities being surfaced organically in this process. I'm going to hit that in a moment.
The experiment agent is recognizing big milestones, important things, in this section down here that might be relevant to you, that you should probably check out because they're driving a lot of business or they're major milestones in your organization. And down here, another browsable discovery section, you're able to see all the top-performing experiments. And we could see all of the experiments running on Adobe.com right now. But we're not just any Adobe.com product manager or test lead; we're actually at Adobe, so we're going to pretend we're a Photoshop product manager. And so we can switch to what we're categorizing as a team space. This is customized just for me as a Photoshop practitioner. And the experiment agent is doing the same thing here. It is analyzing everything, the same as before, but it's really applying it to your context. So this can be your home page for the things that you're really focused on.
Let's actually click into one of these core metrics here.
On the core metrics page, we see a high-level view of how the last year is going. Again, the experiment agent is recognizing big milestones in strategy, and also some regular, utilitarian things, like a test being launched or a test winning. We see a list of experiments year to date that are impacting this metric, which you can click into and browse. We're going to look at that in a moment. And over to the right, in the Opportunity section, is where a lot of the magic is happening. Because organically, in your process, the experiment agent is recognizing when there is an insight or when there's something critical that could be your next big test.
If we click into that, we can see details about this insight. And we try to be as transparent as possible, which is absolutely critical.
We call out where this insight came from. We call out not just what the insight is, in this case highlighting Photoshop's distraction removal feature, but the experiment agent also explains why it thinks this is working, so you can make the judgment call as a practitioner yourself. Down here, the Improvement this quarter section is basically an aggregate, an average of how this new treatment, this messaging strategy, impacts the business metrics, in this case at 3%. And I might have just logged in here, seen this opportunity, and thought, "Man, that's a really great idea. I want to add this to my experiment." And you can click it here.
So this might look different for every organization. Some people have separate teams, MarTech teams, an engineering team that you might want to hand things over to at this point to add different pieces of content to take advantage of this opportunity. Maybe you're an AEM Sites customer. But we're also building the capability, and we're using a prototype version of this internally at Adobe, that allows you to generate, brainstorm, and ideate different potential executions. This is a copy test, but it could do imagery and layout and everything. And we find ourselves using this as a really critical brainstorming tool. It gets us over that cold-start creative process. But you don't have to use it. In this case, we're going to use it, and we're going to click Use this content. Regardless, it shows up organically inside of AJO. That handoff is seamless. And now you've just added something to your pool of content.
So let's go back and imagine that some time has passed...
Maybe two weeks or so. Down here in the Notification section, we see that our test has a new treatment that is winning. This is what we call the experiment detail page. This is our experiment; this is a point of a journey, perhaps, for a specific audience, and it relates to the content that we just added. When we scroll down, we can see a leaderboard. I'm a huge motorsports fan, and I have this open every single day on my desktop for the things that I'm running. It's actually fun tracking and seeing what the AI is doing and why it's doing it. And that sounds like magic, but it's really not that complicated. The AI is slowly funneling more users into the higher performers and gradually eliminating the content, the variations in your experiment, that is not performing well. It's as simple as that. And it does it with statistical significance, of course.
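That funneling behavior is the classic bandit pattern. A generic Thompson-sampling simulation (a sketch of the idea, not necessarily Target's Auto-Allocate algorithm, whose details we are not asserting here) shows how traffic drifts toward the stronger variants as evidence accumulates:

```python
import numpy as np

rng = np.random.default_rng(7)
true_rates = {"variant_A": 0.08, "variant_B": 0.10, "variant_C": 0.12}  # unknown in real life
wins = {k: 1 for k in true_rates}    # Beta(1, 1) priors on each arm's conversion rate
losses = {k: 1 for k in true_rates}

for _ in range(20_000):              # one simulated visitor at a time
    # Draw a plausible conversion rate for each arm from its posterior and
    # send the visitor to the arm with the highest draw.
    draws = {k: rng.beta(wins[k], losses[k]) for k in true_rates}
    arm = max(draws, key=draws.get)
    if rng.random() < true_rates[arm]:
        wins[arm] += 1
    else:
        losses[arm] += 1

for k in true_rates:
    n = wins[k] + losses[k] - 2      # visitors actually sent to this arm
    print(f"{k}: {n:>6} visitors, observed rate {(wins[k] - 1) / max(n, 1):.3f}")
```

Run it and most of the simulated visitors typically end up on variant_C while the weaker arms are starved of traffic, which is the "gradually eliminating underperformers" behavior described above.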
Again, if I wanted to, say, add another thing to continue this process, I can go into the Opportunities and add more. This creates that really tight loop at scale that we struggled with so hard, and it also very seamlessly integrates AI agentic work into our experimentation workflow in a way that doesn't feel separate and doesn't require you to take a specific action. It organically understands your context and suggests things for you. This creates a loop that we're taking advantage of. And by the way, I showed an example of one of those ML tests that we're working on, but it also supports A/B testing in the same way. So yeah.
Going back to the presentation, some final thoughts.
One thing that I love about this concept, and one of the reasons why my colleagues are so excited about it as well, is that this was born and bred initially as a really small internal tool. It was built to solve a user problem, a real one that we were facing. We didn't start with, "Let's build something with AI." We didn't start there at all. We had a way less inspirational goal. We were like, "Man, we need to hit our targets. How are we going to do that? We need another tool. We need something to help us do that." And we just so happened to have stumbled upon the agentic AI work, which we organically worked into a process that we knew we needed. And that collaboration was only made possible by the Journey Optimizer team, by the way, and we have worked very closely with them on all of this.
The new loop is very strong within Adobe.com. We are able to design the best possible test now. We can quickly take action on those big opportunities, and they're automatically identified and surfaced for us at scale. We can execute with a lot higher confidence than before, with the experiment agent calling out the strategies and attributes that correlate with performance and can help you and your strategy specifically succeed in your environment. We're able to execute much more effectively and collaborate with our design teams and our copywriter teams. We can test more in less time. We have this beautiful UI by the Journey Optimizer team that helps us monitor these tests and keep track of them. It's actually a delight to go in and look at how a test is doing. We can also analyze on a global scale. A lot of the heavy weight is taken off of the manual process of attending every single meeting we can, writing things down, and hoping we don't forget them.
And that is creating this next-generation loop within Adobe.com, where we have this radical scalability made possible by GenAI-enhanced experimentation.
Okay. So I think we did pretty well on time today. And one thing that I do want to note is that we are soon going into beta here. And if anybody is interested in taking part, there's a QR code here where you can sign up for the beta.
There are two requirements. The first is that you must have Target or Journey Optimizer; you must be a customer already. And you must be open to giving feedback as well. But even if you're not ready to sign up for the beta, we're always open to talk and chat. David and I are massive nerds, clearly. And we all love talking with the Journey Optimizer team too, and they're also insane nerds. So if you want to talk and just nerd out for a little bit, we're open to doing that too. And your feedback is always appreciated as well. Feel free to reach out. Sign up for the beta. Even if you just want to chat and brainstorm and nerd out with us, we're open to that too. We'll be here in this room for a little bit longer, and then we'll go downstairs to the booth. Feel free to visit and let's talk.
[Music]