Behind the scenes: How Adobe grounds enterprise AI with entity linking
Siddhartha Sahai and Zifan Liu
06-16-2026
What is entity linking?
Imagine a marketer asking their AI assistant: "Show me audiences with loyalty data." The request is perfectly natural but also deeply ambiguous: "loyalty data" could refer to a loyalty program tier, loyalty points, or any number of related fields buried somewhere in a complex customer schema. The system needs to understand the customer's context as well as their data. In this case, entity linking resolves "loyalty" to specific schema fields, loyalty.tier and loyalty.points, so that the assistant returns the right audiences.
Getting these questions right isn't just about language understanding. It requires connecting the words a user speaks to the actual entities (objects, records, schemas) that exist in their data. Large language models (LLMs) are probabilistic systems; they predict likely responses based on patterns in training data. But enterprise users don't want "likely". They want exact. That gap, between a model's probabilistic nature and the deterministic precision enterprise work demands, is where entity linking lives.
Entity linking (EL) is the task of identifying mentions of entities in text and mapping them to their canonical representations in a knowledge base or data store. It's a well-studied problem in natural language processing (NLP): research systems like TagMe and BLINK demonstrated how text spans can be matched to entries in structured knowledge bases using a combination of search and neural models. In industry, systems like Google Knowledge Graph and Microsoft's Bing use entity linking to power search and question answering at scale.
In the context of AI assistants, entity linking serves as the semantic grounding layer — anchoring what users say to what they mean in terms of real objects in their data. EL is more than just a useful NLP technique — it is foundational infrastructure required to transform responses delivered by AI assistants from unreliable to trustworthy.
Why does it matter?
LLMs are remarkably good at understanding natural language, but they cannot reliably resolve ambiguous references on their own. An abbreviation like "ca" could mean California or Canada. A term like "specialData" might refer to a deeply nested field in a customer schema: customer.specificFormat.specialData. Without EL, an LLM might guess, hallucinate, or simply fail to retrieve the right information.
At Adobe, we've encountered this challenge across multiple products.
- In AI Assistant powered by Adobe Experience Platform Agent Orchestrator, users query their customer data using natural language — asking about audiences, schemas, datasets, and campaigns. For example, when a user asks "which xdm fields use dob," entity linking maps the abbreviation "dob" to person.birthDate, person.birthYear, and person.birthDayAndMonth — ranked by relevance.
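To make the "dob" example concrete, here is a minimal sketch of what a linked result could look like. The alias table, the LinkedEntity type, and the relevance scores are hypothetical and chosen only for illustration; in the real service, candidates come from an entity store and a ranking stage rather than a hard-coded dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedEntity:
    field_path: str
    score: float  # hypothetical relevance score in [0, 1]


# Hypothetical alias table: abbreviation -> (field path, illustrative score).
ALIASES = {
    "dob": [
        ("person.birthYear", 0.71),
        ("person.birthDate", 0.93),
        ("person.birthDayAndMonth", 0.64),
    ],
}


def link_mention(mention: str) -> list[LinkedEntity]:
    """Map a mention to its candidate fields, highest score first."""
    candidates = ALIASES.get(mention.strip().lower(), [])
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [LinkedEntity(path, score) for path, score in ranked]


if __name__ == "__main__":
    for entity in link_mention("dob"):
        print(entity.field_path, entity.score)
```

The point of the sketch is the shape of the output: one mention, several canonical fields, ordered by relevance, so the assistant can work with the full ranked set instead of guessing a single field.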
Without EL, around a quarter of all user interactions in AI Assistant — the ones that contain specific entity mentions — would be at risk of returning incomplete or irrelevant answers.
Why is it hard?
While entity linking is often considered a solved problem for public knowledge bases like Wikipedia, where entities are stable, well-documented, and globally understood, enterprise environments still pose significant challenges:
- Customer-specific schemas: Every organization has its own data model. Field names, dataset names, and audience labels vary significantly and are not part of any pre-trained model's vocabulary.
- Evolving taxonomies: Unlike Wikipedia, enterprise data changes frequently. New campaigns, products, and datasets are added continuously.
- Ambiguity at multiple levels: Users may abbreviate, misspell, use synonyms, or refer to concepts that only make sense within their organizational context.
- Data governance constraints: Users should only be able to resolve entities they are authorized to access. Entity linking must respect role-based access controls (RBAC) without leaking information across organizational boundaries.
- Strict latency requirements: A single user query can trigger multiple agents, each of which calls entity linking independently. The service must resolve mentions in milliseconds, not seconds, to keep the overall response time acceptable.
These constraints make a naive lookup approach unworkable and require a more principled, multi-stage architecture.
How we addressed the problems
Entity linking in our systems is implemented as a three-stage pipeline, with flexibility at each stage so that different downstream applications can choose the approach best suited to their query complexity and latency budget.
Stage 1: Mention extraction identifies spans in the input text that are likely entity references. This can be done through n-gram enumeration (exhaustive, so recall is high but many spans are spurious), NER-based span detection (faster, model-guided), or LLM-based prompting (highest precision for complex inputs).
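As an illustration of the simplest option, n-gram enumeration, the sketch below lists every contiguous token span up to a maximum length. The tokenizer regex and the enumerate_ngrams name are assumptions made for this example, not the production implementation.

```python
import re


def enumerate_ngrams(text: str, max_n: int = 3) -> list[tuple[int, int, str]]:
    """Return (start_token, end_token, span_text) for every n-gram up to max_n tokens."""
    # Keep dots inside tokens so field paths such as loyalty.tier stay whole,
    # then strip dots left over from sentence punctuation.
    raw_tokens = re.findall(r"[A-Za-z0-9_.]+", text)
    tokens = [t.strip(".") for t in raw_tokens if t.strip(".")]

    spans = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            end = start + n
            spans.append((start, end, " ".join(tokens[start:end])))
    return spans


if __name__ == "__main__":
    for span in enumerate_ngrams("which xdm fields use dob", max_n=2):
        print(span)
```

Every span becomes a candidate mention, which is why this approach misses little but hands a lot of noise to the later stages. The NER-based and LLM-based options trade some of that coverage for fewer, better spans.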
Stage 2 — Candidate retrieval pulls potential matches from an entity store. Three modes are available:
- Lexical search prioritizes surface-form matches and handles typos and formatting variations well.
- Semantic search uses embedding similarity to recover paraphrases and contextual matches when strings don't overlap.
- Hybrid search combines both for the best balance of precision and recall before downstream reranking.
Pre-filtering at this stage also narrows the candidate set based on metadata — including the user's permissions — so governance constraints are enforced before any ranking occurs.
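The retrieval stage can be sketched as follows, under several assumptions: entities carry a precomputed embedding and a set of allowed roles, lexical similarity is approximated with the standard library's difflib.SequenceMatcher, and the hybrid score is a simple weighted sum controlled by a hypothetical alpha parameter. Real systems typically use a search index and a trained embedding model; the toy vectors here only stand in for them.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Entity:
    name: str
    embedding: tuple[float, ...]   # assumed to be produced offline by an embedding model
    allowed_roles: frozenset[str]  # assumed RBAC metadata stored with the entity


def lexical_score(mention: str, name: str) -> float:
    """Surface-form similarity in [0, 1]; tolerant of typos and case differences."""
    return SequenceMatcher(None, mention.lower(), name.lower()).ratio()


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(mention, mention_embedding, entities, user_roles, alpha=0.5, top_k=5):
    """Hybrid retrieval: permission pre-filter first, then a weighted lexical + semantic score."""
    # Governance is enforced before scoring, so hidden entities never reach the ranker.
    visible = [e for e in entities if e.allowed_roles & user_roles]
    scored = [
        (alpha * lexical_score(mention, e.name)
         + (1 - alpha) * cosine(mention_embedding, e.embedding), e)
        for e in visible
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(e.name, round(score, 3)) for score, e in scored[:top_k]]
```

Setting alpha to 1.0 reduces this to lexical search and 0.0 to semantic search, which mirrors the three modes described above. Because the filter runs before scoring, an entity the user cannot access has no way to appear in the ranked list.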
Stage 3 — Reranking scores and orders candidates using heuristics, a contextual in-house model, or LLM prompting, depending on what the query requires.
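For the heuristic option, a reranker can be as small as a few score adjustments on top of the retrieval score. The sketch below is one assumed set of rules (bonus values and the rerank name are hypothetical): it boosts candidates whose leaf field name matches the mention exactly, and gives a smaller boost for partial matches.

```python
def rerank(mention: str, candidates: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Reorder (field_path, retrieval_score) pairs using simple leaf-name heuristics."""
    target = mention.lower()

    def final_score(item: tuple[str, float]) -> float:
        path, score = item
        leaf = path.rsplit(".", 1)[-1].lower()  # last segment of the field path
        if leaf == target:
            return score + 0.3  # hypothetical bonus: exact leaf match
        if target in leaf:
            return score + 0.1  # hypothetical bonus: partial leaf match
        return score

    return sorted(candidates, key=final_score, reverse=True)


if __name__ == "__main__":
    retrieved = [
        ("customer.specialOffers", 0.62),
        ("customer.specificFormat.specialData", 0.58),
    ]
    print(rerank("specialData", retrieved))
```

Here the deeply nested customer.specificFormat.specialData field moves ahead of a shallower field with a higher retrieval score, because its leaf name matches the mention exactly. A contextual model or an LLM prompt would replace final_score when the query needs more than string evidence.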
The underlying entity store uses a partitioned database with a primary–replica architecture. Each entity namespace is stored in a separate partition, enforcing data isolation and RBAC rules. The database is hydrated by daily ingestion jobs that pull from various APIs, transform the data, and generate vector embeddings for semantic search. As new entity types are onboarded, their catalogs are registered through these same ingestion jobs.
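The isolation property of a partitioned store can be illustrated with an in-memory stand-in. This is only a sketch: the PartitionedEntityStore class, its method names, and the namespace labels are hypothetical, and a real deployment would rely on the database's own partitioning and replication.

```python
from collections import defaultdict
from typing import Callable


class PartitionedEntityStore:
    """Toy store that keeps each namespace in its own partition."""

    def __init__(self) -> None:
        self._partitions: dict[str, dict[str, dict]] = defaultdict(dict)

    def upsert(self, namespace: str, entity_id: str, record: dict) -> None:
        # An ingestion job would call this after transforming source data.
        self._partitions[namespace][entity_id] = record

    def search(
        self,
        namespace: str,
        allowed_namespaces: set[str],
        predicate: Callable[[dict], bool],
    ) -> list[dict]:
        # Access is checked before any partition is read.
        if namespace not in allowed_namespaces:
            raise PermissionError(f"no access to namespace {namespace!r}")
        partition = self._partitions.get(namespace, {})
        return [record for record in partition.values() if predicate(record)]
```

Because a query names exactly one partition and is checked against the caller's allowed namespaces first, records from another namespace cannot leak into the result, even when they would match the predicate.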
The service itself is stateless and runs on Kubernetes, scaling horizontally to meet demand. Real-time dashboards track latency, error rates, and memory utilization. Alerts notify on-call engineers when service-level agreements are at risk, and hourly sanity tests catch issues proactively. When a dependency fails, the service degrades gracefully, returning a best-effort answer rather than an exception.
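Graceful degradation can be sketched as a fallback wrapper. The example assumes that the failing dependency is the semantic search backend and that lexical search is still available; both callables and the "degraded" flag are hypothetical names used for illustration.

```python
from typing import Callable


def retrieve_with_fallback(
    mention: str,
    semantic_search: Callable[[str], list[str]],
    lexical_search: Callable[[str], list[str]],
) -> dict:
    """Try the richer retrieval path first; fall back instead of raising."""
    try:
        return {"results": semantic_search(mention), "degraded": False}
    except Exception:
        # Broad catch is deliberate in this sketch: any dependency failure
        # should yield a best-effort answer, flagged so callers and
        # dashboards can tell it apart from a normal response.
        return {"results": lexical_search(mention), "degraded": True}


if __name__ == "__main__":
    def broken_semantic(mention: str) -> list[str]:
        raise TimeoutError("embedding service unavailable")

    def simple_lexical(mention: str) -> list[str]:
        return [f for f in ["loyalty.tier", "loyalty.points"] if mention in f]

    print(retrieve_with_fallback("loyalty", broken_semantic, simple_lexical))
```

Returning a flag alongside the results keeps the degradation visible: the caller still gets an answer, and monitoring can count how often the fallback path is taken.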
Product impact: What users experience
The most direct measure of the value of entity linking is what users don't have to do: They don't have to remember exact field names, match the system's capitalization, or know how their organization's schema is structured. They can ask naturally, and the system figures out what they mean.
Here are two concrete illustrations of what that looks like in practice.