Entity Linking

Entity linking is the process of recognizing when multiple references in content point to the same real-world entity, then resolving them to a single canonical identity in memory. For example, "Alice Johnson," "Alice J.," "ajohnson@company.com," and "@alice" all refer to the same person—entity linking ensures they're treated as one entity in the knowledge graph.

Without entity linking, memory fragments: every variation creates a separate entity, breaking relationship queries and reasoning. With entity linking, agents understand that all mentions refer to the same person, enabling accurate queries like "What tasks does Alice own?" and consistent relationship traversal.

The outcome is unified, queryable memory where entities have canonical identities, relationships are preserved, and reasoning works reliably.

Why it matters

Prevents memory fragmentation: Without linking, "Project Alpha" and "Alpha Project" become separate entities—breaking queries and synthesis.
Enables accurate relationship queries: "Who owns tasks in Alpha?" only works if all Alpha references link to one canonical entity.
Improves search precision: Agents can find all mentions of "Alice" across documents, even when referred to differently.
Supports cross-system integration: Alice's Slack handle, email, and Jira username are unified—memory spans tools seamlessly.
Reduces hallucinations: Agents reason over consistent entities, not scattered mentions—grounding improves accuracy.
Facilitates deduplication: When ingesting new content, entity linking detects existing entities, preventing duplicates.

How it works

Entity linking operates through extraction, matching, and resolution:

Ingestion → Content enters the system (documents, messages, events).
Entity Extraction → NLP models identify mentions: "Alice Johnson," "ajohnson@acme.com," "Project Alpha."
Candidate Generation → For each mention, the system generates candidate entities: "Alice Johnson" might match existing entity "Alice J."
Similarity Scoring → Candidates are scored on name similarity, shared attributes (email, role), and context (for example, being mentioned in the same document).
Resolution Decision → High-confidence matches are linked to existing entities; ambiguous matches are flagged for review; low-confidence mentions create new entities.
Canonical Update → The canonical entity's profile is updated with new attributes or aliases (e.g., add "@alice" as an alias).

This cycle ensures entities remain unified as new mentions are discovered.
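
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular product's implementation: the `Entity` class, the `link_mention` function, the rule that a shared email is decisive, and the 0.85 threshold are all assumptions made for the example. Name similarity uses the standard library's `difflib.SequenceMatcher`, and the review tier is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Entity:
    """Hypothetical canonical entity with known aliases and emails."""
    entity_id: str
    name: str
    aliases: set = field(default_factory=set)
    emails: set = field(default_factory=set)


def name_similarity(a: str, b: str) -> float:
    # Character-level similarity in [0, 1], case-insensitive.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(mention: str, email, entity: Entity) -> float:
    # Assumption: a shared email is decisive. Otherwise use the best
    # name similarity across the canonical name and stored aliases.
    if email and email in entity.emails:
        return 1.0
    names = {entity.name} | entity.aliases
    return max(name_similarity(mention, n) for n in names)


def link_mention(mention, email, entities, threshold=0.85):
    """Return the entity a mention resolves to, creating one if needed."""
    # Candidate generation and scoring: here every entity is a candidate.
    best = max(entities, key=lambda e: score(mention, email, e), default=None)
    if best is not None and score(mention, email, best) >= threshold:
        # Canonical update: remember the new surface form and attribute.
        if mention != best.name:
            best.aliases.add(mention)
        if email:
            best.emails.add(email)
        return best
    # No confident match: create a new canonical entity.
    new = Entity(entity_id=f"e{len(entities) + 1}", name=mention,
                 emails={email} if email else set())
    entities.append(new)
    return new


entities = [Entity("e1", "Alice Johnson", emails={"ajohnson@acme.com"})]
linked = link_mention("Alice J.", "ajohnson@acme.com", entities)
other = link_mention("Bob Smith", None, entities)
```

In this run, "Alice J." resolves to the existing entity through the shared email and is stored as an alias, while "Bob Smith" has no close candidate and becomes a new entity.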

Comparison & confusion to avoid

Term	What it is	What it isn't	When to use
Entity Linking	Resolving multiple mentions to canonical identities	Entity extraction—linking happens after extraction	When building knowledge graphs with consistent entities
Entity Extraction	Identifying mentions of entities in text	Resolving those mentions to canonical IDs—extraction comes first	Finding entities in content—not deduplicating them
Deduplication	Removing duplicate records	Entity linking across varied mentions—linking is semantic, not exact	Cleaning up exact duplicates in structured data
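Entity Resolution	A closely related term, often used interchangeably with entity linking	Always a strict synonym: some sources reserve it for matching records to each other rather than mapping mentions to a canonical entity	Same use cases as entity linking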

Examples & uses

Person entity linking across systems
Slack mentions "@alice," Jira shows "ajohnson," email is "alice.johnson@acme.com," docs say "Alice J." Entity linking resolves all to canonical entity "Alice Johnson (Person)" with aliases stored. Queries for "Alice's tasks" work across all systems.
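
One way to picture the stored aliases is as a lookup table from each normalized surface form to a canonical ID. The sketch below assumes a hypothetical in-memory `ALIAS_INDEX`, a `resolve` helper, and made-up task titles; a real system would persist the index alongside the knowledge graph.

```python
# Hypothetical alias index: every known surface form maps to one canonical ID.
ALIAS_INDEX = {
    "@alice": "person:alice-johnson",
    "ajohnson": "person:alice-johnson",
    "alice.johnson@acme.com": "person:alice-johnson",
    "alice j.": "person:alice-johnson",
    "alice johnson": "person:alice-johnson",
}


def resolve(mention: str):
    """Return the canonical ID for a mention, or None if it is unknown."""
    return ALIAS_INDEX.get(mention.strip().lower())


# Illustrative tasks, keyed by whatever identifier each source system used.
tasks = [
    ("Fix login bug", "ajohnson"),                      # Jira username
    ("Review design doc", "@alice"),                    # Slack handle
    ("Send status report", "Alice.Johnson@acme.com"),   # email, mixed case
    ("Update roadmap", "bob"),                          # unknown mention
]

# "Alice's tasks" across all systems: resolve each owner, then filter.
alice_tasks = [title for title, owner in tasks
               if resolve(owner) == "person:alice-johnson"]
```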

Project name variations
Documents mention "Project Alpha," "Alpha Initiative," "Alpha," and "Proj. Alpha." Entity linking resolves all to one canonical "Project Alpha" entity. Queries like "What's blocking Alpha?" retrieve all relevant information.
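
A simple normalization pass can show how these variants collapse to one key. The sketch below is an assumption-laden illustration: `GENERIC_TOKENS` and `normalize_project_name` are hypothetical names, and the list of generic words would need tuning for real data.

```python
import re

# Assumed list of generic words that carry no identifying information.
GENERIC_TOKENS = {"project", "proj", "initiative"}


def normalize_project_name(mention: str) -> str:
    """Lowercase, strip punctuation, and drop generic tokens."""
    tokens = re.findall(r"[a-z0-9]+", mention.lower())
    return " ".join(t for t in tokens if t not in GENERIC_TOKENS)


mentions = ["Project Alpha", "Alpha Initiative", "Alpha", "Proj. Alpha"]

# All four variants reduce to the same normalized key.
keys = {normalize_project_name(m) for m in mentions}
```

Normalization like this is best treated as candidate generation rather than a final decision, since two genuinely different projects could share a normalized key.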

Company consolidation after acquisition
Before acquisition: "Acme Corp" and "BetaCo" are separate entities. After acquisition: "BetaCo is now part of Acme Corp" is processed. Entity linking merges or relates the entities, preserving history and relationships.
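
Relating the entities rather than deleting one is a way to keep history intact. The following sketch assumes a hypothetical `Org` class with a `parent_id` link and a `history` log; the acquisition date is invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Org:
    """Hypothetical organization entity."""
    entity_id: str
    name: str
    parent_id: Optional[str] = None          # set when acquired
    history: list = field(default_factory=list)


def record_acquisition(orgs, acquired_id, acquirer_id, date):
    """Relate the acquired org to its acquirer without deleting it."""
    acquired = orgs[acquired_id]
    acquired.parent_id = acquirer_id
    acquired.history.append((date, f"acquired by {orgs[acquirer_id].name}"))


def root_org(orgs, entity_id):
    """Follow parent links up to the top-level organization."""
    current = orgs[entity_id]
    while current.parent_id is not None:
        current = orgs[current.parent_id]
    return current


orgs = {
    "org:acme": Org("org:acme", "Acme Corp"),
    "org:betaco": Org("org:betaco", "BetaCo"),
}
# The date below is a placeholder, not taken from any real event.
record_acquisition(orgs, "org:betaco", "org:acme", "2024-01-15")
```

After the call, "BetaCo" still exists with its own relationships, while queries that roll up to the parent resolve to "Acme Corp".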

Best practices

Use multiple signals for matching: Name similarity + context + attributes (email, role) improves linking accuracy over name alone.
Implement confidence thresholds: High-confidence matches auto-link; medium-confidence matches are flagged for human review; low-confidence mentions create new entities.
Store aliases and variations: When linking "Alice J." to "Alice Johnson," store "Alice J." as an alias—future mentions auto-resolve.
Support manual corrections: Allow users to merge entities or split incorrectly linked ones, and record those corrections so they can inform future linking decisions.
Track provenance: Record where each mention was found—helps debug linking errors and assess confidence.
Handle temporal changes: "Alice worked at Acme 2020-2023, now at BetaCo"—entities have time-bound attributes.
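
Several of these practices map onto small data structures. The sketch below shows a three-way threshold decision, a mention record that carries provenance, and time-bound employment attributes. The cutoff values, class names, and exact start and end days are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed cutoffs; in practice these would be tuned on labeled examples.
AUTO_LINK = 0.90
REVIEW = 0.60


def decide(score: float) -> str:
    """Map a match score to an action: link, review, or create."""
    if score >= AUTO_LINK:
        return "link"
    if score >= REVIEW:
        return "review"
    return "create"


@dataclass
class Mention:
    text: str
    source: str    # provenance: where the mention was found
    score: float   # confidence of its best candidate match


@dataclass
class Employment:
    employer: str
    start: date
    end: Optional[date] = None   # None means the role is still current

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


def employer_on(history, day):
    """Return the employer active on a given day, or None."""
    for job in history:
        if job.active_on(day):
            return job.employer
    return None


# "Alice worked at Acme 2020-2023, now at BetaCo" (exact days assumed).
alice_history = [
    Employment("Acme", date(2020, 1, 1), date(2023, 12, 31)),
    Employment("BetaCo", date(2024, 1, 1)),
]
mention = Mention("Alice J.", source="slack:#alpha-team", score=0.72)
action = decide(mention.score)
```

Here the mention's score falls between the two cutoffs, so it would be routed to review, and its `source` field records where to look when debugging the decision.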

Common pitfalls

Over-linking: Linking "Alice Johnson (engineer)" and "Alice Johnson (CEO)" because of name similarity—context matters.
Under-linking: Treating "Project Alpha" and "Alpha" as separate entities when they're the same, so relationship queries miss results.
No human-in-the-loop: Automated linking has errors—implement review workflows for ambiguous cases.
Static entity profiles: Entities evolve—Alice changes companies, projects get renamed—support updates and versioning.
Ignoring context: "Apple (fruit)" vs. "Apple (company)"—name alone is insufficient; context disambiguates.
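
To make the role of context concrete, here is a small sketch that picks between same-named candidates by word overlap with the surrounding text. The `CANDIDATES` profiles and the `disambiguate` function are hypothetical, and Jaccard overlap on raw words is a deliberately simple stand-in for whatever context model a real system would use.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical context profiles for two entities sharing the name "Apple".
CANDIDATES = {
    "Apple (company)": {"iphone", "stock", "ceo", "software", "earnings"},
    "Apple (fruit)": {"orchard", "pie", "tree", "juice", "harvest"},
}


def disambiguate(context: str):
    """Return the best-matching candidate, or None if nothing overlaps."""
    words = set(context.lower().split())
    best = max(CANDIDATES, key=lambda name: jaccard(words, CANDIDATES[name]))
    if jaccard(words, CANDIDATES[best]) == 0.0:
        return None   # no contextual evidence: leave for review
    return best


company = disambiguate("apple reported record iphone earnings")
fruit = disambiguate("we baked an apple pie after the harvest")
unknown = disambiguate("apple")
```

Returning `None` when there is no overlap avoids forcing a link on name alone, which is the over-linking failure described above.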
