Should we build the knowledge graph or use an existing one?

Use an existing graph when a domain-specific one exists with good coverage (medicine via UMLS, biology via Gene Ontology, and so on). Build a custom graph for proprietary entities (your customer, product, and contract data). Most enterprise applications combine both: public knowledge graphs for general entities plus a custom graph for proprietary data.

GraphRAG is Microsoft's open-source pattern (2024) for automatically building knowledge graphs from document corpora: LLMs extract entities and relationships, and the resulting graph is then used for hierarchical retrieval. It is particularly effective on global questions ('what are the main themes in this corpus?') that vector RAG handles poorly. Building the graph carries a significant compute cost, so the approach is most useful for static or slow-changing corpora.
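To make the extract-then-group idea concrete, here is a minimal sketch of a GraphRAG-style build step. It is not Microsoft's implementation: `extract_triples` is a hypothetical stand-in for an LLM extraction call (it returns canned output here), and plain connected components stand in for the community detection a real pipeline would use. All names and sample documents are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for an LLM extraction call. A real pipeline would
# prompt a model to return (subject, relation, object) triples per document.
def extract_triples(doc: str) -> list[tuple[str, str, str]]:
    canned = {
        "Acme acquired Bolt. Bolt supplies Cogs Ltd.": [
            ("Acme", "ACQUIRED", "Bolt"),
            ("Bolt", "SUPPLIES", "Cogs Ltd"),
        ],
        "Dana founded Eon.": [("Dana", "FOUNDED", "Eon")],
    }
    return canned.get(doc, [])

def build_graph(docs):
    """Collect triples and an undirected adjacency map over all entities."""
    triples = []
    adjacency = defaultdict(set)
    for doc in docs:
        for subj, rel, obj in extract_triples(doc):
            triples.append((subj, rel, obj))
            adjacency[subj].add(obj)
            adjacency[obj].add(subj)
    return triples, adjacency

def connected_components(adjacency):
    """Group entities that are linked by any path (a simple stand-in
    for community detection)."""
    seen = set()
    groups = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(sorted(group))
    return groups

docs = ["Acme acquired Bolt. Bolt supplies Cogs Ltd.", "Dana founded Eon."]
triples, adjacency = build_graph(docs)
groups = connected_components(adjacency)
```

In a fuller pipeline, each group would be summarized by an LLM, and global questions would be answered over those summaries rather than over individual passages. That is where most of the build cost comes from.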

What database should we use for a knowledge graph?

Neo4j is the leading dedicated graph database with strong LLM integrations. ArangoDB and TigerGraph are alternatives. For smaller-scale work, RDF triplestores (Apache Jena, Stardog) work well. Increasingly, graph and vector capabilities are converging: Neo4j supports vector indexes, Weaviate supports cross-references between objects, and vector databases such as Qdrant are commonly paired with a graph store for hybrid RAG + graph applications.

AI engineering glossary

What is a Knowledge Graph?

A knowledge graph is a structured representation of entities (people, products, places, concepts) and the relationships between them, typically stored in a graph database as nodes and edges with typed properties. It enables queries that traverse relationships, infer new connections, and provide structured context to AI systems for grounded reasoning.

Last updated 2026-04-28 | BearPlex AI Engineering Team

Overview

Knowledge graphs predate modern LLMs: Google's Knowledge Graph has powered search results since 2012, and structured-data projects like Wikidata, YAGO, and DBpedia have served as shared infrastructure for well over a decade. The renewed interest comes from LLM applications: knowledge graphs address real limitations of vector-only RAG (poor handling of relationships and multi-hop reasoning) and provide structured context that complements unstructured text retrieval. Microsoft's GraphRAG (2024), Neo4j's LLM integrations, and the rise of graph-native vector search have made knowledge-graph + LLM hybrid architectures practical for production. At BearPlex, we use knowledge graphs when relationships and multi-hop reasoning matter. Most agent and RAG systems don't need them, but for the cases that do, they're transformative.

Knowledge graph vs vector retrieval

Vector retrieval finds documents similar to a query: great for unstructured text, weak for structured relationships. Knowledge graphs answer relationship-based queries: 'who reports to X?', 'what products use component Y?', 'which patients are on drug A and have condition B?'. These queries are hard or impossible with vector retrieval alone because the answer depends on traversing typed relationships across many entities. Production AI systems increasingly use both: vector retrieval for semantic similarity and knowledge graph for structured relationship queries, with the LLM combining both context sources to answer.
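The relationship queries quoted above can be sketched with nothing more than typed triples. This is an illustrative in-memory example, not a graph database: the relation names (`ON_DRUG`, `HAS_CONDITION`, `REPORTS_TO`), the helper functions, and all of the data are made up for the sketch.

```python
# Typed edges stored as (subject, relation, object) triples. All data is invented.
TRIPLES = [
    ("alice", "ON_DRUG", "drug_a"),
    ("alice", "HAS_CONDITION", "condition_b"),
    ("bob", "ON_DRUG", "drug_a"),
    ("carol", "HAS_CONDITION", "condition_b"),
    ("dave", "REPORTS_TO", "erin"),
    ("erin", "REPORTS_TO", "frank"),
]

def subjects(relation, obj):
    """All subjects that have the given typed edge pointing at obj."""
    return {s for s, r, o in TRIPLES if r == relation and o == obj}

def patients_on_drug_with_condition(drug, condition):
    """'Which patients are on drug A and have condition B?'"""
    return subjects("ON_DRUG", drug) & subjects("HAS_CONDITION", condition)

def reporting_chain(manager):
    """'Who reports to X?', following REPORTS_TO edges transitively."""
    found = set()
    frontier = [manager]
    while frontier:
        current = frontier.pop()
        for person in subjects("REPORTS_TO", current):
            if person not in found:
                found.add(person)
                frontier.append(person)
    return found

both = patients_on_drug_with_condition("drug_a", "condition_b")
chain = reporting_chain("frank")
```

Neither answer is recoverable from text similarity alone: `dave` never appears next to `frank` in any single fact, and the result only emerges by following two typed edges.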

Production knowledge graph patterns

(1) GraphRAG (Microsoft, 2024): automatically build a knowledge graph from a document corpus using LLMs to extract entities and relationships, then use the graph for hierarchical and multi-hop retrieval. Strong on global summarization questions; expensive to build.
(2) Hybrid vector + graph: store both vector embeddings and graph relationships in one system (for example, Neo4j with vector indexes or Weaviate with cross-references); query each independently and fuse the results.
(3) Domain-specific knowledge graphs: for verticals with rich existing taxonomies (medical via UMLS/SNOMED, finance via FIBO, legal via citation networks), use existing knowledge graphs alongside LLM retrieval.
(4) Internal entity graphs: for B2B SaaS, build a graph of customers, products, contracts, and support tickets for grounded enterprise AI.
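For the hybrid pattern, "fuse results" can be done in several ways. One common choice, used here as an assumption rather than something this glossary prescribes, is reciprocal rank fusion (RRF): each retriever contributes 1 / (k + rank) per document, and the totals decide the final order. The document IDs and the constant k = 60 are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each list contributes 1 / (k + rank) to a document's score, where
    rank starts at 1. Ties are broken alphabetically by ID so the output
    is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results from the two independent retrievers.
vector_hits = ["doc3", "doc1", "doc7"]  # ranked by embedding similarity
graph_hits = ["doc1", "doc9", "doc3"]   # ranked by graph traversal

fused = reciprocal_rank_fusion([vector_hits, graph_hits])
```

Documents that both retrievers surface rise to the top, which is usually the behavior wanted when the LLM's context window is limited. RRF uses ranks only, so the two retrievers' raw scores never need to be on a comparable scale.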

Why knowledge graphs are hard

Building and maintaining a knowledge graph is a significant engineering investment. Schema design (deciding which entities and relationships matter) requires deep domain understanding. Entity resolution (deciding that 'IBM' and 'International Business Machines' are the same entity) is non-trivial. Keeping the graph in sync with changing source data requires ongoing pipelines. Query patterns differ from SQL or document retrieval, so teams need new skills. The ROI is high when the application requires multi-hop reasoning or relationship-based queries, but for simple Q&A over documents, vector retrieval alone is much simpler and often sufficient. We recommend knowledge graphs only when there's a clear case that the application actually needs the relationship layer.
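A toy version of entity resolution shows both how it starts and why it gets hard. The sketch below assumes a hand-curated alias table and a small list of legal suffixes, both hypothetical. Real systems typically add fuzzy matching, context, and human review on top.

```python
import re

# Hypothetical, hand-curated alias table: normalized name -> canonical ID.
ALIASES = {"international business machines": "ibm"}

# Legal suffixes stripped from the end of a name before matching.
SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "co"}

def normalize(name):
    """Map a raw mention to a canonical key."""
    text = name.lower().replace(".", "")    # 'I.B.M.' -> 'ibm'
    text = re.sub(r"[^a-z0-9 ]", " ", text)  # other punctuation -> spaces
    tokens = text.split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    key = " ".join(tokens)
    return ALIASES.get(key, key)

def resolve(mentions):
    """Group raw mentions by the entity they normalize to."""
    clusters = {}
    for mention in mentions:
        clusters.setdefault(normalize(mention), []).append(mention)
    return clusters

mentions = ["IBM", "International Business Machines Corp.", "I.B.M.", "Neo4j, Inc."]
clusters = resolve(mentions)
```

The alias table is where the effort goes: every abbreviation, rebrand, subsidiary, and misspelling that normalization cannot catch has to be added by hand or learned, and a wrong merge silently corrupts every query that touches the merged node.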

Use cases

Multi-hop question answering (questions whose answer requires combining facts from multiple sources)
Enterprise AI grounded in customer/product/contract relationships
Healthcare AI using medical knowledge graphs (UMLS, SNOMED CT) for clinical reasoning
Legal AI traversing citation networks to find related case law
Recommendation engines that consider product relationships, not just similarity

Examples in production

Microsoft Research

GraphRAG (2024): automatically builds knowledge graphs from document corpora and uses graph hierarchies for retrieval; outperforms vector RAG on global summarization questions.

Neo4j

Leading graph database with extensive LLM integrations. Production knowledge graphs built on it at major banks, pharma companies, and retailers power grounded AI applications.

Google Knowledge Graph

Has powered Google's structured search results since 2012: billions of entities and relationships provide the backbone for entity-aware search.

Knowledge Graph compared to alternatives

Alternative	Choose Knowledge Graph when	Choose alternative when
Vector retrieval (RAG): finds similar documents via embeddings	You need relationship-based queries and multi-hop reasoning	You need semantic document search; combine with a graph for hybrid retrieval
Relational database (SQL): tables with foreign keys and joins	Your data is highly connected with many relationship types	Your data is transactional, with stable schemas and well-defined queries

Common pitfalls

Building a knowledge graph for an application that only needs document retrieval: over-engineering
Underestimating entity resolution complexity: 'looks easy, takes months'
No ongoing maintenance pipeline: knowledge graph drifts from source-of-truth data
Schema design without domain expert input: graph that doesn't match how users think about the domain
Trying to do everything in the knowledge graph instead of using graph + vector hybrid

Related terms

RAG, Semantic Search, Embedding, AI Agent

Related BearPlex services

RAG & Knowledge Systems, Data Pipelines & MLOps

FAQ

Questions about Knowledge Graph.

Use a knowledge graph when (1) queries require multi-hop reasoning across entities, (2) the domain has explicit important relationships (citation networks, organizational hierarchies, supply chains), or (3) you need to surface 'related X' that aren't textually similar but are structurally connected. For pure Q&A over documents where the answer is in one passage, vector RAG is much simpler and usually sufficient.

Need help implementing Knowledge Graph?

BearPlex builds production AI systems that use knowledge graphs for Fortune 500s and high-growth scale-ups. Outcome-based pricing. 90-day embedded sprints.
