Contextual AI: The Next Frontier of Artificial Intelligence

These four principles form the building blocks of a successful relationship between humans and AI.

Artificial intelligence (AI) is powering more and more services and devices that we use on a daily basis, such as personal voice assistants, movie recommendation services, or driving assistance systems. And while AI has become a lot more sophisticated, we all know the situations where we wonder: “Why did I get this weird recommendation?” or “Why did the assistant do this?” Often, after a restart and some trial and error, we get our AI systems back on track, but we never completely and blindly trust our AI-powered future.

One of the reasons for this distrust is that most current AI systems operate as a ‘black box’, with limited ability to interact with humans, understand human context, or explain their decisions. These limitations have inspired the call for a new phase of AI, which will create a more collaborative partnership between humans and machines. Dubbed Contextual AI, this new technology is already attracting multibillion-dollar investments. Contextual AI is technology that is embedded in human context, understands it, and is capable of interacting with humans. In this article, I’ll explore how contextual AI works, how it compares to previous phases of AI, the challenges we need to overcome, and the progress we’re making at Adobe.

Contextual Artificial Intelligence: The building blocks of a successful relationship between humans and AI

Contextual AI does not refer to a specific algorithm or machine learning method – instead, it takes a human-centric view and approach to AI. The core is the definition of a set of requirements that enable a symbiotic relationship between AI and humans. Contextual AI needs to be intelligible, adaptive, customizable and controllable, and context-aware. Here’s what that looks like in the real world:

While true Contextual AI doesn’t exist yet, we are getting closer to it. Self-driving cars are a good example: they are a first attempt to understand more of the human context (in this case the road, the state of passengers, or dangerous situations). However, the current understanding is still very limited and narrow. In the 1980s TV series Knight Rider, for example, the car (KITT) demonstrates the principles of true contextual AI: it interacts seamlessly with the driver, understands everything that is going on (and even beyond), and helps in dangerous situations. Obviously, it was far-fetched and fictional, but the essence is that contextual AI needs to have a deeper understanding of a human’s situation and be able to interact and explain itself.

What differentiates Contextual AI from previous phases of AI?

Contextual AI addresses many of the shortcomings of previous AI developments or phases. Historically, AI started as handcrafted knowledge. This rule-based AI had no learning capability and was mostly designed by engineers. Think of chess computers (remember when Deep Blue beat Garry Kasparov?) or expert systems. They had their first successful applications from the 1980s to the early 2000s. However, because a machine doesn’t perceive the world the way a human does, rule-based AI fell short wherever the rules couldn’t be clearly specified, in particular for sensor signal input (audio and video).

Statistical learning, particularly deep learning, addressed some of these shortcomings by inferring statistical patterns (that a human might not see or know) from very large datasets and raw signals. This led to the recent success of AI in image recognition, voice, conversational interfaces and many more applications. However, large-scale statistical training has downsides as well. For one, statistical models such as deep learning models can be easily attacked or confused. Adversarial examples can be generated and tuned to fool a production-grade machine learning system. Minor changes to the pixels in the input image, barely visible to the human eye, can yield very different recognition results. You can even generate your own adversarial examples to fool the algorithm. Additionally, as most AI approaches rely on large-scale data, unconscious bias can creep into AI algorithms based on the (positive and negative) examples with which they’ve been trained.
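To make the pixel-level attack concrete, here is a minimal sketch in Python. It is a toy illustration only, not a real production model or anything used at Adobe: the 16-pixel "image", the hand-picked linear classifier, and every name in the code (`predict`, `adversarial`, `weights`, `epsilon`) are assumptions made up for this example. The idea is similar in spirit to gradient-sign attacks: nudge every pixel by a tiny amount in the direction that hurts the classifier the most.

```python
def sign(value: float) -> int:
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def predict(weights, bias, pixels) -> int:
    """Toy linear classifier: class 1 if the weighted pixel sum is positive, else class 0."""
    score = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1 if score > 0 else 0


def adversarial(weights, pixels, label, epsilon):
    """Move each pixel by at most epsilon so the score drifts away from `label`.

    For a linear model, the gradient of the score with respect to each pixel
    is simply that pixel's weight, so the sign of the weight tells us which
    way to push. Pixels are kept inside the valid range [0, 1].
    """
    direction = -1 if label == 1 else 1
    return [
        min(1.0, max(0.0, p + direction * epsilon * sign(w)))
        for w, p in zip(weights, pixels)
    ]


# Hypothetical 16-pixel grayscale image and hand-picked weights (assumptions).
weights = [1.0, -1.0] * 8
bias = 0.0
image = [0.52, 0.50] * 8

original_label = predict(weights, bias, image)
fooled_image = adversarial(weights, image, original_label, epsilon=0.02)
fooled_label = predict(weights, bias, fooled_image)
largest_change = max(abs(a - b) for a, b in zip(image, fooled_image))

print(original_label, fooled_label, round(largest_change, 3))
```

In this toy setup, no pixel moves by more than 0.02 on a 0-to-1 brightness scale, yet the predicted class flips. Real attacks on deep networks are more involved, but they exploit the same weakness.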

While the hype around AI is still powered by statistical learning, leading researchers have started questioning the real “intelligence” of the industry’s current AI approaches. While statistical algorithms helped with the context-awareness and adaptivity that is needed for a Contextual AI system, they do fall short on the requirements for humans to understand what is going on, and to customize and control it. A ‘black box’ algorithm cannot be trusted in critical situations. It’s unclear what structures the statistical AI algorithms really learn and whether the algorithms just separate data examples or have a true understanding of the content.

As an AI architect at Adobe, I’m working on initiatives that will bring Contextual AI to our customer experiences. Here are some of the things we’re working on:

Innovating AI with Adobe Sensei

One of the focus areas for Adobe Sensei, our AI and machine learning technology, is Creative Intelligence, defined as the augmentation of creators’ skills and capabilities using AI. Here, the creative human will interact and form a team with the AI, which needs to have a deep understanding of the creative’s intent, background, behavior and needs, and even be able to explain to the human what it does and why. Creative Intelligence is the application of Contextual AI to the creative world.

As stated above, intelligibility and explanation are important aspects of Creative Intelligence as well, which means the AI needs to be able to represent and explain (in layman’s terms) what it has learned. Technically, it needs to rely much more on knowledge representations and ontologies that represent what is learned. Here are some examples of projects in development at Adobe:

1. Deep learning content understanding

Adobe Sensei’s deep learning technology for content understanding goes beyond just image tagging, and instead aligns with how a human would perceive an image. Looking at the example below, simple image tagging would just recognize three faces, the ocean and the beach in this image. However, a richer taxonomy and representation enables Adobe Sensei to capture human-level concepts such as “Entertainment” and “Family Life” that aren’t as explicit.

Image courtesy of Samarth Gulati.

Emotions such as “Happiness” that used to be entirely in the human realm are partially decoded by the AI algorithms. This makes the retrieval of specific images much more understandable and customizable by the human. It also enables a richer, more contextual customer experience around image content and search on Adobe Stock, the company’s collection of millions of royalty-free images. As a result, the image search yields stronger results in less time.

2. Image search using voice commands

Another project in development at Adobe goes beyond faceted search and illustrates the natural language refinement of image search using voice commands. In this example below, leveraging images from Adobe Stock, the user is casually interacting with the “search algorithm,” contextually adding and removing search criteria as well as referring to broad human-level concepts such as “authenticity” and “diversity.”

Image courtesy of Brett Butterfield.

The voice assistant tracks the current state of the search and allows various human-understandable refinements, including backtracking to earlier search results. Adobe Sensei understands the context, specifically what the user refers to and might be looking for, and evolves the search accordingly.
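For readers who like to see ideas in code, the refinement loop described above can be sketched as a small piece of session state. This is a simplified illustration under stated assumptions, not how Adobe Sensei or Adobe Stock is implemented: the `SearchSession` class, the tiny `catalog`, and the file names and tags are all hypothetical, and speech recognition and language understanding are left out entirely. The sketch only shows how adding criteria, removing criteria, and backtracking could fit together.

```python
from dataclasses import dataclass, field


@dataclass
class SearchSession:
    """Tracks the active search criteria plus a history for backtracking."""

    criteria: frozenset = frozenset()
    history: list = field(default_factory=list)

    def _update(self, new_criteria) -> None:
        # Remember the previous state so the user can step back to it later.
        self.history.append(self.criteria)
        self.criteria = frozenset(new_criteria)

    def add(self, term: str) -> None:
        self._update(self.criteria | {term})

    def remove(self, term: str) -> None:
        self._update(self.criteria - {term})

    def back(self) -> None:
        """Undo the most recent refinement, if there is one."""
        if self.history:
            self.criteria = self.history.pop()

    def results(self, catalog: dict) -> list:
        """Return the images whose tags include every active criterion."""
        return [name for name, tags in catalog.items() if self.criteria <= tags]


# Hypothetical catalog: image name -> set of concept tags (assumption).
catalog = {
    "beach_family.jpg": {"family", "beach", "authenticity"},
    "office_team.jpg": {"office", "diversity", "authenticity"},
    "studio_portrait.jpg": {"portrait", "diversity"},
}

session = SearchSession()
session.add("authenticity")
step_one = session.results(catalog)
session.add("diversity")
step_two = session.results(catalog)
session.remove("authenticity")
step_three = session.results(catalog)
session.back()  # the user changes their mind and returns to the previous step
step_four = session.results(catalog)

print(step_one, step_two, step_three, step_four, sep="\n")
```

Keeping each earlier set of criteria on a history stack is one simple way to support requests like "go back"; a real assistant would also need to resolve what the user is referring to, which is the hard part.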

Achieving a deeper understanding of human-machine interactions

We’ve come a long way in the journey towards true Contextual AI. We now understand human-level concepts in images, and AI can more naturally interact with the human using these concepts. However, we still require a deeper understanding of language as well as new human-computer interaction paradigms. How should an AI system and humans interact in the future, for example? Through voice, gestures, or even implants?

More importantly, the representation and recognition of what humans think and do is still very limited. For example, millions of creatives use Adobe’s products every day and while we’re familiar with how they’re using our tools as part of their work, we are still working toward fully representing “creative intent.” What does the creative user want to do? What are the steps in the process? And what may he or she require for success? And how could a user even teach an AI system what his or her creative intent is?

Some future technical directions that are currently being explored are explainable AI models and common sense reasoning. How could we teach the common sense of a five-year-old to an AI system? And how could we further make it explain itself and make it fully contextual? At Adobe, we believe that AI enhances, rather than replaces, human creativity and intelligence when it comes to designing, optimizing and delivering digital experiences. Therefore, it is important to harness the power of Contextual AI to help move the industry forward and to keep innovating.

These are some of the challenges we’re tackling at Adobe, and to give you a sneak peek of what we are working on, here is a proof-of-concept demo of our contextual intelligent assistant — powered by Adobe Sensei — that enables natural interaction using voice and gestures. Pretty cool, right?

For more on Adobe Sensei, check out our website and read Redefining Visual Search in Adobe Stock by Creating Innovative Image Similarity Technology.

Portions of this article were originally published on Digiday.