What is a Knowledge Graph?
A knowledge graph is a structured representation of entities (people, products, places, concepts) and the relationships between them. It is typically stored in a graph database as nodes and edges with typed properties, which enables queries that traverse relationships, infer new connections, and give AI systems structured context for grounded reasoning.
Overview
Knowledge graphs predate modern LLMs: Google's Knowledge Graph has powered search results since 2012, and structured-data systems like Wikidata, YAGO, and DBpedia have served as infrastructure for well over a decade. The renewed interest comes from LLM applications: knowledge graphs address real limitations of vector-only RAG (poor handling of relationships and multi-hop reasoning) and provide structured context that complements unstructured text retrieval. Microsoft's GraphRAG (2024), Neo4j's LLM integrations, and the rise of graph-native vector search have made hybrid knowledge-graph and LLM architectures practical for production. At BearPlex, we use knowledge graphs when relationships and multi-hop reasoning matter. Most agent and RAG systems don't need them, but for the cases that do, they're transformative.
Knowledge graph vs vector retrieval
Vector retrieval finds documents similar to a query. It is great for unstructured text but weak for structured relationships. Knowledge graphs answer relationship-based queries: 'who reports to X?', 'what products use component Y?', 'which patients are on drug A and have condition B?'. These queries are hard or impossible with vector retrieval alone because the answer depends on traversing typed relationships across many entities. Production AI systems increasingly use both: vector retrieval for semantic similarity and a knowledge graph for structured relationship queries, with the LLM combining both context sources to answer.
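The relationship queries above can be sketched with a tiny in-memory triple store. This is a minimal illustration, not a graph database: the `TinyGraph` class, the relation names (`REPORTS_TO`, `USES_COMPONENT`, `TAKES`, `HAS_CONDITION`), and all the sample entities are hypothetical.

```python
from collections import defaultdict


class TinyGraph:
    """In-memory store of (subject, relation, object) triples."""

    def __init__(self):
        # Reverse index: (object, relation) -> set of subjects.
        self._incoming = defaultdict(set)

    def add(self, subject, relation, obj):
        self._incoming[(obj, relation)].add(subject)

    def subjects(self, relation, obj):
        """All subjects that have `relation` pointing at `obj`."""
        return set(self._incoming.get((obj, relation), ()))


def reports_to_transitively(graph, manager):
    """Everyone who reports to `manager`, directly or through a chain."""
    found, frontier = set(), [manager]
    while frontier:
        current = frontier.pop()
        for person in graph.subjects("REPORTS_TO", current):
            if person not in found:  # guards against cycles
                found.add(person)
                frontier.append(person)
    return found


graph = TinyGraph()
graph.add("bob", "REPORTS_TO", "alice")
graph.add("carol", "REPORTS_TO", "bob")
graph.add("dave", "REPORTS_TO", "carol")
graph.add("widget", "USES_COMPONENT", "bolt-7")
graph.add("gadget", "USES_COMPONENT", "bolt-7")
graph.add("p1", "TAKES", "drug-a")
graph.add("p2", "TAKES", "drug-a")
graph.add("p2", "HAS_CONDITION", "condition-b")

# 'Who reports to alice?' (one hop, then any number of hops)
direct_reports = graph.subjects("REPORTS_TO", "alice")
all_reports = reports_to_transitively(graph, "alice")

# 'What products use component bolt-7?'
products_using_bolt = graph.subjects("USES_COMPONENT", "bolt-7")

# 'Which patients are on drug A and have condition B?' (set intersection)
matching_patients = graph.subjects("TAKES", "drug-a") & graph.subjects(
    "HAS_CONDITION", "condition-b"
)
```

The multi-hop query is the part an embedding index cannot express directly: the answer for `alice` depends on following `REPORTS_TO` edges through `bob` and `carol`, not on any text resembling the question.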
Production knowledge graph patterns
(1) GraphRAG (Microsoft, 2024): automatically build a knowledge graph from a document corpus, using LLMs to extract entities and relationships, then use the graph for hierarchical and multi-hop retrieval. Strong on global summarization questions; expensive to build.
(2) Hybrid vector + graph: store both vector embeddings and graph relationships in a hybrid database (for example Neo4j with embeddings, or Weaviate with cross-references); query each independently and fuse the results.
(3) Domain-specific knowledge graphs: for verticals with rich existing taxonomies (medical via UMLS/SNOMED, finance via FIBO, legal via citation networks), use existing knowledge graphs alongside LLM retrieval.
(4) Internal entity graphs: for B2B SaaS, build a graph of customers, products, contracts, and support tickets for grounded enterprise AI.
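For the hybrid pattern, "query each independently and fuse results" leaves the fusion method open. One common choice is reciprocal rank fusion (RRF), sketched below. RRF is an assumption here, not something the patterns above prescribe, and the document IDs and the constant `k=60` are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list.

    Each ID scores 1 / (k + rank) per list it appears in (rank starts at 1);
    scores are summed across lists and IDs are returned best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers, best match first.
vector_hits = ["doc-3", "doc-1", "doc-7"]  # semantic similarity
graph_hits = ["doc-1", "doc-9", "doc-3"]   # reached via graph traversal

fused = reciprocal_rank_fusion([vector_hits, graph_hits])
```

Documents that both retrievers return (`doc-1`, `doc-3`) rise to the top, and the fused list is what would be passed to the LLM as context. RRF only needs ranks, so it avoids having to calibrate vector similarity scores against graph-derived scores.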
Why knowledge graphs are hard
Building and maintaining a knowledge graph is a significant engineering investment. Schema design (deciding which entities and relationships matter) requires deep domain understanding. Entity resolution (deciding that 'IBM' and 'International Business Machines' are the same entity) is non-trivial. Keeping the graph in sync with changing source data requires ongoing pipelines. Query patterns differ from SQL or document retrieval, so teams need new skills. The ROI is high when the application requires multi-hop reasoning or relationship-based queries, but for simple Q&A over documents, vector retrieval alone is much simpler and often sufficient. We recommend knowledge graphs only when there's a clear case that the application actually needs the relationship layer.
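To make the entity resolution problem concrete, here is a minimal sketch that combines string normalization with a curated alias table. The alias table, the suffix list, and the `org:ibm` identifier are all hypothetical; real systems typically add fuzzy matching, context, and human review on top of something like this.

```python
import re

# Corporate suffixes dropped during normalization (illustrative, not complete).
CORPORATE_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc"}

# Hypothetical curated alias table: normalized surface form -> canonical ID.
ALIASES = {
    "ibm": "org:ibm",
    "international business machines": "org:ibm",
}


def normalize(name):
    """Lowercase, replace punctuation with spaces, drop trailing suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in CORPORATE_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def resolve(name):
    """Return the canonical entity ID for `name`, or None if unknown."""
    return ALIASES.get(normalize(name))


same_entity = resolve("IBM") == resolve("International Business Machines Corp.")
missed = resolve("I.B.M.")  # normalizes to "i b m", which is not in the table
```

Normalization alone cannot connect an acronym to its expansion, which is why the alias table is needed, and even then a variant such as `I.B.M.` slips through. Each such gap needs another rule or another alias, which is a large part of why this work tends to take longer than expected.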
Use cases
- Multi-hop question answering (questions whose answer requires combining facts from multiple sources)
- Enterprise AI grounded in customer/product/contract relationships
- Healthcare AI using medical knowledge graphs (UMLS, SNOMED CT) for clinical reasoning
- Legal AI traversing citation networks to find related case law
- Recommendation engines that consider product relationships, not just similarity
Examples in production
Microsoft Research
GraphRAG (2024): automatically builds knowledge graphs from document corpora and uses graph hierarchies for retrieval; outperforms vector RAG on global summarization questions.
Neo4j
Leading graph database with extensive LLM integrations: production knowledge graphs at major banks, pharma companies, and retailers power grounded AI applications.
Google Knowledge Graph
Has powered Google's structured search results since 2012: billions of entities and relationships providing the backbone for entity-aware search.
Knowledge Graph compared to alternatives
| Alternative | Choose Knowledge Graph when | Choose alternative when |
|---|---|---|
| Vector retrieval (RAG): find similar documents via embeddings | Use knowledge graph for relationship-based queries and multi-hop reasoning | Use vector retrieval for semantic document search; combine with graph for hybrid |
| Relational database (SQL): tables with foreign keys and joins | Use knowledge graph for highly-connected data with many relationship types | Use SQL for transactional data with stable schemas and well-defined queries |
Common pitfalls
- Building a knowledge graph for an application that only needs document retrieval: over-engineering
- Underestimating entity resolution complexity: 'looks easy, takes months'
- No ongoing maintenance pipeline: knowledge graph drifts from source-of-truth data
- Schema design without domain expert input: graph that doesn't match how users think about the domain
- Trying to do everything in the knowledge graph instead of using graph + vector hybrid
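The maintenance pitfall above can be made concrete with a small sketch of a sync step: compare the triples derived from the source of truth with those currently in the graph, and compute what to add and what to remove. The triple format and all the sample data are hypothetical, and a real pipeline would also handle property changes and batching.

```python
def diff_triples(source_triples, graph_triples):
    """Return the edges to add to and remove from the graph.

    Both inputs are iterables of (subject, relation, object) tuples.
    """
    source, graph = set(source_triples), set(graph_triples)
    return {
        "add": sorted(source - graph),     # in the source, missing from graph
        "remove": sorted(graph - source),  # stale edges still in the graph
    }


# Hypothetical snapshots: what the source system says vs. what the graph holds.
source_of_truth = [
    ("acme", "HAS_CONTRACT", "c-101"),
    ("acme", "OWNS_PRODUCT", "widget"),
]
graph_snapshot = [
    ("acme", "HAS_CONTRACT", "c-101"),
    ("acme", "HAS_CONTRACT", "c-099"),  # contract no longer in the source
]

changes = diff_triples(source_of_truth, graph_snapshot)
```

Running a diff like this on a schedule, and alerting when the `add` or `remove` lists grow unexpectedly, is one simple way to notice drift before users do.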
Questions about Knowledge Graph.
Use existing graphs when domain-specific ones exist with good coverage (medical via UMLS, biology via Gene Ontology, etc.). Build custom for proprietary entities (your customer/product/contract data). Most enterprise applications combine both: public knowledge graphs for general entities + custom graph for proprietary data.
GraphRAG is Microsoft's open-source pattern (2024) for automatically building knowledge graphs from document corpora, using LLMs to extract entities and relationships, and then using the resulting graph for hierarchical retrieval. It is particularly effective on global questions ('what are the main themes in this corpus?') that vector RAG handles poorly. Building the graph carries a significant compute cost, so it is most useful for static or slow-changing corpora.
Neo4j is the leading dedicated graph database with strong LLM integrations. ArangoDB and TigerGraph are alternatives. For smaller-scale work, RDF triplestores (Apache Jena, Stardog) work well. Increasingly, the two worlds overlap: Neo4j offers vector indexes, and vector databases such as Weaviate support cross-references between objects, which can simplify the stack for hybrid RAG + graph applications.