Skill Exchange: Adobe Experience Manager Security in the Age of Generative AI

[Music] [Krishna Kalyan Gorthi] Good morning, everyone.

Thank you for joining me today. I know it was the Bash yesterday, so I know it's early. It's 9 a.m., and this is the very first session of the day.

So let's kick things off with some energy.

Thanks, man. Thanks.

We just heard the song, Eye of the Tiger.

A song about resilience and strength.

And honestly, that's exactly what we need when we're tackling security in the age of Generative AI.

We need to stay adaptable and fast and ready for everything.

I'm Krishna Kalyan Gorthi, currently working as Principal Software Engineer at Palo Alto Networks, a global cybersecurity leader.

I've been in the Adobe ecosystem for the last 14 years, specializing mainly in Adobe Experience Manager.

I'm honored to be an Adobe Experience Manager Champion and a Community Advisor, which helps me learn, share insights, and collaborate with the broader ecosystem, like all of you sitting here.

My experience spans across multiple industries like healthcare, retail, cybersecurity, and consulting, where security is a top concern.

Over the last eight years, being in the cybersecurity industry, security has become paramount to me and central to everything I do, and it's shaping every decision I make.

Today, we're going to explore how Generative AI is transforming AEM and the security risks and vulnerabilities that come with it.

I'll walk you through key mitigation strategies and best practices to strengthen your security, and wrap up with some resources and key takeaways to help you stay ahead in this evolving landscape.

So let's dive in.

Generative AI is rapidly transforming how we create, manage, and deliver content in AEM.

It's no longer an emerging trend. It's fast becoming a core tool for content creation, automation, personalization, and customer interactions.

In AEM, Generative AI can be used in multiple key ways.

It enables automated content generation, helping content authors and marketers scale their content production efficiently.

Search result summarization enhances the user experience by delivering concise, relevant information instantly.

With hyper-personalized dynamic content creation, AI tailors experiences based on user behavior, making interactions more engaging.

This is something you have seen with Coca-Cola at this Summit, where they are doing so much hyper-personalization based on user behavior.

And conversational interfaces like AI chatbots are reshaping customer interactions by providing real-time, intelligent responses.

While these capabilities unlock incredible potential, they also introduce new security challenges, which brings us to key security risks we'll explore now.

As Uncle Ben famously said, "With great power comes great responsibility," and this is especially true when it comes to security.

Let's explore some key data security risks associated with GenAI.

The first one, data privacy risk.

AI can unintentionally expose sensitive information. For example, if an AI model is trained on internal documents without proper safeguards, it might reveal business-critical information like business strategy, company IP documents, or personal customer data.

The second one is injection attacks. Attackers can manipulate AI to execute harmful actions. For instance, a malicious user could craft a prompt that tricks the AI system into generating unauthorized commands or disclosing sensitive information, exposing vulnerabilities in the model's prompt handling. We'll explore this more later.

The third one is access control weakness. Without strict permissions, AI tools might access or modify your content beyond their intended scope. For example, an AI system designed to assist in content creation might be exploited to alter system configurations or access restricted data if not properly secured.

So far we have explored some of the key security risks that GenAI introduces in AEM. Each of these could compromise content integrity, user trust, and overall system security.

Understanding these risks is essential for a secure AEM environment.

So how do we defend against these threats? That's what we're going to explore next. Now let's take a deeper look into potential vulnerabilities and mitigation strategies to see how we can safely implement Generative AI in AEM.

The first one is prompt injection. Some of you might know it, some might not, but prompt injection is one of the most significant vulnerabilities out there, which we cannot overlook, especially when it comes to Generative AI in AEM.

So what is prompt injection? Prompt injection occurs when an attacker manipulates AI-generated content by injecting harmful inputs, leading to unintended actions, data exposure, or security bypass.

Since AI models rely heavily on user inputs, a poorly secured implementation allows malicious users to influence AI model behavior in unpredictable ways.

So now that we have defined prompt injection, let's take a look at how it impacts your systems.

So prompt injection introduces security and operational risks, which are very severe.

It can lead to unauthorized content generation or SEO manipulation impacting publishing workflows or brand credibility.

Information leakage, user data exposure, and credential theft further amplify data security risks, potentially exposing sensitive information.

Attackers might exploit component manipulation and resource exhaustion, disrupting system integrity and affecting platform stability.

Additionally, chatbot manipulations, personalization errors, and search result manipulations degrade user trust and experience by delivering harmful content.

And these risks can escalate into compliance or security threats, including access control bypasses, cross-site scripting, and server-side request forgery.

These can be exploited to gain unauthorized access or launch further attacks.

So to safeguard AEM, we must implement a layered mitigation strategy.

It should span different layers: application code, configuration, and the dispatcher level.

Let's explore this in the next slides.

So the first line of defense here is ensuring that the AI processes only safe and expected inputs.

Attackers often try to manipulate AI responses by injecting hidden commands or manipulating prompts. To prevent this, we must sanitize and validate every user input.

These are some code snippets which help you sanitize and validate user inputs. The first one uses the XSSAPI in AEM. This first line of code encodes the input for HTML rendering. By encoding the userInput, we ensure that malicious JavaScript or HTML content is converted into harmless entities, preventing any code from executing there.

The second method is escapeHtml. Here we escape any HTML-related malicious content; it safeguards against HTML special characters by escaping them.

And the third one is a regular expression replacement. Whenever somebody crafts a prompt and tries to add commands like exec, curl, or python, these are malicious keywords which might harm the AI models or try to reconfigure them.

So by removing these from the inputs, we make it harder for malicious actors to exploit the system.
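
To make that concrete, here's a minimal Java sketch of those three techniques, assuming the Sling XSSAPI and Apache Commons Lang are available (as they typically are in an AEM project). The class name and the blocked keyword list are illustrative, not the exact code from the slide.

```java
import org.apache.commons.lang3.StringEscapeUtils;
import org.apache.sling.xss.XSSAPI;

/**
 * Illustrative sanitization helpers mirroring the three snippets described above.
 * Each method shows one technique; pick the one that matches how the value
 * is later rendered or consumed.
 */
public class PromptInputSanitizer {

    // Hypothetical list of command-like keywords to strip from prompts
    private static final String BLOCKED_KEYWORDS = "(?i)\\b(exec|curl|python|wget|eval)\\b";

    private final XSSAPI xssAPI;

    public PromptInputSanitizer(XSSAPI xssAPI) {
        this.xssAPI = xssAPI;
    }

    // 1. Encode the input for HTML rendering so injected markup becomes harmless entities
    public String encodeForHtmlRendering(String userInput) {
        return xssAPI.encodeForHTML(userInput);
    }

    // 2. Escape HTML special characters with a generic utility
    public String escapeHtml(String userInput) {
        return StringEscapeUtils.escapeHtml4(userInput);
    }

    // 3. Strip command-like keywords often used in prompt injection attempts
    public String stripCommandKeywords(String userInput) {
        return userInput == null ? "" : userInput.replaceAll(BLOCKED_KEYWORDS, "");
    }
}
```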

The next code is HTL, where we use context-based rendering with context='text', which ensures user input is treated as plain text instead of HTML, so no code gets executed.

And here we are using prompt templates and prompt parameters. A prompt template is simply a prompt in which parameters are embedded. For example, a prompt template can look like: "Can you give me some information about restaurants in a country where this cuisine is found?" Here the country and the cuisine are the parameters, and the whole sentence is the template. Whenever a user inputs something, only the parameter values, the country and the cuisine, are taken into consideration, and the rest of the prompt is created by us. That way, the user won't be able to craft the entire prompt and hijack the AI model.
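
As an illustration of that idea, here's a hedged Java sketch of a parameterized prompt template; the class name, template text, and parameter handling are assumptions, not the exact implementation shown on the slide.

```java
import java.util.Map;

/** Illustrative prompt template: users fill parameters, never the whole prompt. */
public class RestaurantPromptTemplate {

    // The template text is authored by us and never comes from the user
    private static final String TEMPLATE =
        "Can you give me some information about restaurants in %s where %s cuisine is found?";

    public String build(Map<String, String> params) {
        String country = sanitizeParam(params.getOrDefault("country", "any country"));
        String cuisine = sanitizeParam(params.getOrDefault("cuisine", "any"));
        return String.format(TEMPLATE, country, cuisine);
    }

    // Keep parameter values short and free of instruction-like content
    private String sanitizeParam(String value) {
        String cleaned = value.replaceAll("[^\\p{L}\\p{N} '\\-]", "").trim();
        return cleaned.length() > 50 ? cleaned.substring(0, 50) : cleaned;
    }
}
```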

Third, we should always enforce encrypted communication with HTTPS. This protects data in transit and prevents tampering.

Next, apply principle of least privilege.

We can do this using ACLs. This gives AI tools only the permissions they need, and scopes admin rights to limit their impact.

Disable caching for sensitive data. This prevents accidental exposure through cached content. And rate limit API requests to control traffic, prevent abuse, and ensure the system remains responsive.

Finally, filter and rate limit requests at the dispatcher level using a deny-first approach, query parameter filtering, and rate limiting. This will block malicious requests before they even reach your system.

In this code, we rate limit the number of requests per API call. So if an attacker tries to flood your system with too many requests, this caps them at a certain number of requests per second. In that way, we can stop malicious requests before they even reach your system.
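
The slide handles this in the dispatcher configuration itself; purely to illustrate the same idea in application code, here's a hedged Java sketch of a simple fixed-window limiter implemented as a servlet filter. The threshold, window, and error handling are assumptions.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/** Illustrative fixed-window rate limiter: max N requests per client IP per second. */
public class AiEndpointRateLimitFilter implements Filter {

    private static final int MAX_REQUESTS_PER_SECOND = 10; // assumed threshold
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();
    private volatile long windowStart = System.currentTimeMillis();

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        long now = System.currentTimeMillis();
        if (now - windowStart >= 1000) {       // start a new one-second window
            counters.clear();
            windowStart = now;
        }
        String clientIp = ((HttpServletRequest) req).getRemoteAddr();
        int count = counters.computeIfAbsent(clientIp, ip -> new AtomicInteger()).incrementAndGet();
        if (count > MAX_REQUESTS_PER_SECOND) {
            // Reject excess traffic before it reaches the AI-backed endpoint
            ((HttpServletResponse) res).sendError(429, "Too Many Requests");
            return;
        }
        chain.doFilter(req, res);
    }

    @Override public void init(FilterConfig filterConfig) { }
    @Override public void destroy() { }
}
```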

So that's all about prompt injection. Now the second one is improper output handling.

So when does this occur? It occurs when AI-generated content is not properly validated or sanitized. What we saw before was the input coming into the system, the input going to the AI models, which we need to sanitize and validate. This one is about the output generated by the AI system. That output should also be validated and sanitized, whether it's being stored in your JCR or displayed to the end user.

If you're not doing this, serious attacks can happen. Let's see what the impact is. This can result in cross-site scripting, where malicious scripts execute in user browsers, and server-side request forgery and remote code execution, allowing attackers to manipulate back-end systems.

Additionally, SQL injections can compromise databases leading to data leaks and unauthorized access.

AI-generated output may also expose unpublished content and introduce hallucinated information, jeopardizing content integrity and user trust.

Proper validation and encoding are critical in mitigating these risks.

First, we use the same XSSAPI for filtering the output coming from the Generative AI models. This is our first line of defense against cross-site scripting. Any AI-generated content that gets displayed needs to be sanitized properly, and AEM's XSSAPI helps ensure no malicious script sneaks into our pages.

Next one is JCR validation.

We don't want to dump everything the AI model creates into our JCR repository. By enforcing validation before storing content, we prevent injection attacks and make sure only expected, safe data is persisted into your JCR nodes.
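
Here's a hedged Java sketch of that output-side guard, combining XSSAPI filtering with basic validation before persisting to the JCR; the property name, length limit, and class structure are illustrative.

```java
import org.apache.sling.api.resource.ModifiableValueMap;
import org.apache.sling.api.resource.PersistenceException;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.xss.XSSAPI;

/** Illustrative guard: sanitize and validate AI output before it is persisted. */
public class AiOutputWriter {

    private static final int MAX_LENGTH = 10000; // assumed cap on generated text

    private final XSSAPI xssAPI;

    public AiOutputWriter(XSSAPI xssAPI) {
        this.xssAPI = xssAPI;
    }

    public void storeGeneratedText(ResourceResolver resolver, String path, String aiOutput)
            throws PersistenceException {
        // Strip scripts and dangerous markup from the model's output
        String safeHtml = xssAPI.filterHTML(aiOutput);

        // Only persist output that passes basic validation rules
        if (safeHtml == null || safeHtml.isEmpty() || safeHtml.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("AI output failed validation, not persisting");
        }

        Resource target = resolver.getResource(path);
        if (target == null) {
            throw new IllegalArgumentException("Target node does not exist: " + path);
        }
        ModifiableValueMap props = target.adaptTo(ModifiableValueMap.class);
        props.put("generatedText", safeHtml);   // property name is illustrative
        resolver.commit();
    }
}
```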

The third one is HTML encoding. Here we use the scriptToken context. AEM already supports this context-based rendering, so we can use it to apply the right context dynamically. We don't need to do anything special; AEM does it all for us.

The fourth one is the CSRF token. AEM's Granite CSRF Filter ensures AI-generated interactions or API requests aren't exploited by unauthorized actors.

So this validates every request, making it harder for attackers to manipulate the system.

Finally, dispatcher filtering rules. This one acts as a security gate, filtering every request before it reaches your system.

A deny-first approach, strict query parameter filtering, and rate limiting help block unauthorized access and prevent data leaks.

So by implementing all these layers, we ensure that AI-generated content remains secure, properly validated, and free from vulnerabilities before it reaches our systems.

The next one is excessive agency. So this occurs when AI is granted too much autonomy, allowing it to generate, modify, or publish content beyond its intended scope.

So this can lead to unauthorized content generation where AI creates and alters information without oversight, potentially causing compliance violations if sensitive or regulated content is mishandled.

Additionally, unchecked AI actions can result in brand reputation damage, especially if misleading or harmful content is published.

From a system integrity perspective, excessive permissions could allow AI to alter configurations, impacting the overall stability of the system. Finally, data privacy concerns arise if AI gains access to, or even exposes, confidential company information due to inadequate access controls.

So how do we mitigate this? To mitigate this risk, we need strong safeguards in place. Let's go through them. To mitigate excessive agency, we should start by restricting the LLM tool permissions. We can do this with AEM service user mappings. This ensures that AI has access only to specific paths and services, preventing unauthorized access.
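
A minimal Java sketch of how that looks in code, assuming a service user mapping already exists for an illustrative subservice name; the mapping string in the comment is an example, not the exact configuration from the slide.

```java
import java.util.Collections;
import java.util.Map;
import org.apache.sling.api.resource.LoginException;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ResourceResolverFactory;

/** Illustrative use of a service user so AI code only sees the paths it is mapped to. */
public class AiServiceResolver {

    // Subservice name is illustrative; it must match a service user mapping such as
    // "my.app.core:ai-content-service=ai-content-service-user" with limited ACLs.
    private static final Map<String, Object> AUTH_INFO =
        Collections.singletonMap(ResourceResolverFactory.SUBSERVICE, "ai-content-service");

    private final ResourceResolverFactory resolverFactory;

    public AiServiceResolver(ResourceResolverFactory resolverFactory) {
        this.resolverFactory = resolverFactory;
    }

    public ResourceResolver openRestrictedResolver() throws LoginException {
        // The returned resolver can only read/write what the service user's ACLs allow
        return resolverFactory.getServiceResourceResolver(AUTH_INFO);
    }
}
```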

Next, the Sling Referrer Filter. This helps control which domains can send requests to your AEM instance, blocking unauthorized external sources from ever influencing your AI-driven processes.

Now let's talk about sensitive information disclosure.

So what is this? So this occurs when AI unintentionally exposes confidential data.

That can be personal information, credentials, your company IP. So this can happen due to improper handling of the data or the security gaps within your system.

So that can lead to data exposure, compliance risks, unauthorized access risks, and brand reputational damage based on the use case.

Proper safeguards are essential to prevent leaks, protect sensitive information, and maintain the trust of users and stakeholders.

To address sensitive information disclosure, we begin with input and output sanitization. As you've seen in the last few slides, sanitization of input and output is paramount everywhere.

Everything the user inputs and everything the AI model outputs should be sanitized and validated, ensuring these are carefully checked to prevent any unintended leaks.

This is complemented by enforcing strict data access controls using AEM service user mapping to limit AI access to only the necessary content repositories, reducing the risk of unauthorized access.

Finally, we can encrypt sensitive data using AEM's CryptoSupport service. This ensures the credentials and PII processed by AI models are securely encrypted, further protecting against any potential exposure.
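
For illustration, here's a hedged Java sketch using the Granite CryptoSupport service; the wrapper class and method names are assumptions.

```java
import com.adobe.granite.crypto.CryptoException;
import com.adobe.granite.crypto.CryptoSupport;

/** Illustrative use of CryptoSupport to keep AI-related secrets encrypted at rest. */
public class AiSecretStore {

    private final CryptoSupport cryptoSupport;

    public AiSecretStore(CryptoSupport cryptoSupport) {
        this.cryptoSupport = cryptoSupport;
    }

    // Encrypt an API key or PII value before persisting it to the repository
    public String encrypt(String plainValue) throws CryptoException {
        return cryptoSupport.protect(plainValue);
    }

    // Decrypt only at the moment the value is actually needed, e.g. when calling the model
    public String decrypt(String protectedValue) throws CryptoException {
        return cryptoSupport.unprotect(protectedValue);
    }
}
```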

The next one is unbounded consumption. This occurs when AI models consume excessive resources. It can be due to unregulated inputs coming from users, whether malicious or unintentional, or inefficient configurations that are not gating your users, leading to system strain.

This can cause content delays and performance degradation. In some severe cases, it can also cause denial-of-service conditions where the system becomes unavailable for legitimate users.

It can also lead to service disruptions and significantly increase your operational costs, as more resources are consumed than necessary to handle the workload efficiently.

To prevent this, we need to adopt strategic measures and limit resource usage.

To address unbounded consumption, we can implement a few key strategies to help control resource usage.

First, API rate limiting. This can be done at the dispatcher level, which helps prevent flooding by resource-intensive queries by enforcing per-call cost boundaries, ensuring that no single request overloads your entire system.

We can also add input validation filters to neutralize variable-length input floods and infinite loops, protecting system resources from being drained.

Next, resource quota limits. We can set resource quota limits via OSGi configuration, ensuring hard limits on the number of processing threads used by AI models. This helps prevent overconsumption as well.

Lastly, output token limitations. If you used ChatGPT in its early days, you might have seen that the amount of content it generates is quite limited. It cannot generate your entire website because of the number of tokens it uses. That is one of the ways you can control how much AI content is produced. Every response it gives costs you something, so limiting the number of tokens the AI model can generate helps you cap the output to manageable levels, ensuring your system doesn't get overrun with a flood of prompts and responses.
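
To illustrate the token-cap idea, here's a hedged Java sketch; the client interface, parameter names, and limit are hypothetical stand-ins for whatever LLM SDK you actually use.

```java
/** Illustrative cap on how many tokens the model may generate per response. */
public class TokenLimitedCompletionService {

    private static final int MAX_OUTPUT_TOKENS = 512; // assumed organizational limit

    // Hypothetical client interface; real SDKs expose an equivalent max-token setting
    public interface LlmClient {
        String complete(String prompt, int maxTokens);
    }

    private final LlmClient client;

    public TokenLimitedCompletionService(LlmClient client) {
        this.client = client;
    }

    public String generate(String prompt, int requestedTokens) {
        // Never let a caller request more output than the configured ceiling
        int cappedTokens = Math.min(requestedTokens, MAX_OUTPUT_TOKENS);
        return client.complete(prompt, cappedTokens);
    }
}
```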

That's not the entire exhaustive list of vulnerabilities and mitigation strategies out there, but these are some of the key ones you can actually use, understanding what they are and how to secure against them when you are using Generative AI applications in AEM.

So now let's talk about some best practices. So even before talking about best practices, we need to acknowledge something.

No system is immune to attacks. Security gaps can expose sensitive data, disrupt operations, and also compromise user trust. That's why a layered approach is essential. So let's break it down step by step here.

So everything starts with data security. So if an attacker gets your sensitive credentials, it's game over. So the first step is encrypting stored secrets using AEM CryptoSupport, ensuring that even if data is accessed, it remains unreadable.

We should also enforce HTTPS to prevent interception during transmission. Securing data at rest and in transit is very fundamental.

Now, encryption is useless if the wrong people have access to it.

That's why we need strict role-based access controls to limit permissions. Not every user, system, or AI model should have full access.

We should enforce OAuth and API tokens, ensuring only authorized systems can interact with AEM.

And we rotate these tokens regularly to reduce risk exposure.

Even with strong access controls, attackers will still try to probe weak points. This is where we need to minimize the attack surface. For example, limiting JSON rendering results prevents denial-of-service attacks, while X-Frame-Options headers prevent clickjacking. These small changes make a big difference in reducing your exposure to attacks.
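
These headers are normally set in the dispatcher or OSGi configuration; just to illustrate the clickjacking protection in code, here's a hedged Java sketch of a servlet filter that adds the X-Frame-Options header.

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

/** Illustrative filter that adds a clickjacking protection header to every response. */
public class ClickjackingProtectionFilter implements Filter {

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        // SAMEORIGIN blocks the page from being framed by other sites
        ((HttpServletResponse) res).setHeader("X-Frame-Options", "SAMEORIGIN");
        chain.doFilter(req, res);
    }

    @Override public void init(FilterConfig filterConfig) { }
    @Override public void destroy() { }
}
```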

Of course, prevention is not enough. We need visibility. Real-time monitoring with AEM's Log Message Analyzer helps us spot anomalies as they happen.

And AEM's Operations Dashboard gives a system-wide view, allowing your team to react to security threats before they cause more damage.

And security is not a one-time fix; it's an ongoing journey. If you are not updating AEM regularly, you are leaving the door open to existing known vulnerabilities. Applying service packs and hotfixes promptly ensures we are always protected against the latest threats.

Beyond updates, we must secure the AEM infrastructure. That means blocking unnecessary HTTP methods in the dispatcher and restricting access to sensitive URLs like system consoles.

Why is this important? Attackers often look for misconfigured environments, because those are easier entry points for them, and if there are any security gaps, they try to creep in through those. So we always need to have our systems secured before doing anything else. Locking down these entry points is crucial.

So even with the best of security, breaches and failures can still happen.

That's why we need a solid backup and recovery strategy.

Scheduled automated backups and tested restore processes ensure that if something goes wrong, we can always recover quickly.

Finally, we cannot just assume we are secure.

The best way to validate our defenses is penetration testing. We can use tools like OWASP ZAP and Burp Suite to identify weaknesses before attackers do. If your organization has an InfoSec team, they will be able to help you with this.

And regular security audits keep your access controls, API security, and dispatcher rules in check. So these are some of the best practices you can leverage to keep your AEM environment secure.

Yeah. We covered a lot of ground today, exploring the risks and vulnerabilities of GenAI in AEM, and the strategies to mitigate them.

But like I said, security is a journey, not a one-time fix. So to help you stay ahead, I've compiled some key resources that provide deeper insights, best practices, and actionable guidance.

These include security checklists, frameworks, and real-world examples that you can leverage to strengthen your AEM security for now and for the future. Let's take a look.

The OWASP Top 10 for LLM Applications is the guide for secure AI development, helping you understand and address common security risks in Generative AI applications.

The next two you're probably more familiar with: the AEM and Dispatcher security checklists from Adobe. These are essential when setting up and maintaining secure AEM environments.

The Zero Trust framework from Palo Alto Networks helps you understand how to implement zero trust security in your organization.

The next one is a fun tool. I played with it when I was starting with prompt injection in Generative AI. This is Gandalf, from Lakera. It's an interesting tool to experiment with, and it helps you understand what prompt injection vulnerabilities you can run into when using Generative AI.

And the last two links, these are very insightful resources for exploring real-world examples of Jailbreaking in Generative AI.

As we wrap up today's session, let's focus on a few actionable strategies that you can immediately implement to fortify your AEM environment against evolving security challenges, especially in the age of Generative AI.

These key takeaways will not only help you address current vulnerabilities, but also equip you with the tools and knowledge to stay ahead of future threats.

So the first step in securing AEM systems is understanding where you stand. A comprehensive security audit will help you pinpoint vulnerabilities in areas like content handling and access controls. This proactive approach ensures that you are aware of your security posture and can take informed steps to mitigate any risks in the future.

Security patches. These are your first line of defense against known vulnerabilities. Regularly updating your AEM systems and enforcing best practices like secure API endpoints and dispatcher rules help you protect against current and emerging threats.

You need to stay up to date on all of this, because Adobe provides many service packs and security patches which address the latest security threats and vulnerabilities.

And the third one is restricting access based on roles. This is a critical security measure. As I explained earlier for one of the vulnerabilities, role-based access control is very important. By applying the principle of least privilege and managing access carefully, you ensure that only the right users, systems, and AI tools can interact with your sensitive data. Not everybody gets every access.

AI systems are only as secure as the inputs they receive. Implementing structured, parameterized prompts and deny-first filtering ensures that only safe and valid data is processed. This is crucial for preventing malicious actors from exploiting input vulnerabilities such as prompt injection. If you've noticed, the pattern is always the same: you need to sanitize both the input coming in and the output generated by your AI models.

With the rise of AI-generated content, it's important to keep a close eye on AI activity within your system. Tools like AEM's Log Message Analyzer allow you to detect unusual behavior and anomalies, while regular penetration testing keeps you prepared for new vulnerabilities. Continuous monitoring is essential for staying ahead of emerging security risks.

These strategies are not just about reacting to current threats, but about proactively securing your AEM environment for the future.

By implementing these best practices, we'll ensure that the system remains resilient in the face of evolving challenges.

And with that, we've finished our presentation. Thank you all for joining today. I hope this session provided you with valuable insights into securing AEM in the age of Generative AI.

Before we move to Q&A, I'd like to take a moment to thank Will from Adobe and Michelle and my colleagues here from Palo Alto Networks who are here today. Your support means a lot. I'm happy to take any questions now. Thank you.

[Music]

About the Session

As organizations adopt gen AI applications within Adobe Experience Manager, security becomes a paramount concern. Get the knowledge and tools necessary to safeguard your Experience Manager applications in the era of generative AI. Explore various entry points in Experience Manager that require heightened security measures when using gen AI applications.
Key takeaways:

  • Strategies to fortify potential vulnerabilities
  • Secure implementation of specific gen AI applications within Experience Manager, including step-by-step guidance on validating their security protocols
  • Best practices for maintaining a robust security posture in Experience Manager while harnessing the power of gen AI

Technical Level: Intermediate to Advanced

Track: Content Management, Generative AI

Presentation Style: Tips and Tricks

Audience: Developer, Data Practitioner
