Building a 100% Local GraphRAG with Ollama, Neo4j, Qdrant & LangExtract

Most RAG systems retrieve chunks. You split a document, embed the pieces, and pull back the passages nearest your query. It works until the answer lives in the relationships between facts rather than in any single passage. So I built a GraphRAG pipeline that runs 100% locally — no cloud, no API keys — using Ollama for the models, LangExtract for extraction, Neo4j for the graph, and Qdrant for vector search.

Instead of storing text, it turns raw text into a knowledge graph, retrieves the most relevant entities by vector similarity, then expands outward through the graph to gather connected context. The model answers from structured facts, not fragmented prose. Here is the full architecture and the code that makes it run.

Architecture at a glance

The system has two flows. Ingestion converts text into a graph and a vector index. Query finds entry-point entities by similarity, traverses the graph around them, and hands the resulting subgraph to the LLM.

Figure: two flows over one shared store. Ingestion builds the graph and vector index; a query retrieves entities, expands the subgraph, and answers from it.

Why a graph instead of chunks

Chunk retrieval throws away structure. "Lisinopril 10mg daily for hypertension" becomes a passage; the fact that Lisinopril → dosage → 10mg and Lisinopril → condition → hypertension is implicit. GraphRAG makes those edges explicit. At query time the system can do multi-hop traversal — start at one entity, walk to its dosage, frequency, and condition — and assemble context that no single chunk contained. For domains where relationships matter more than passages — healthcare, finance, legal, enterprise knowledge — this is the difference between a plausible answer and a correct one.
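
To make the contrast concrete, here is a minimal sketch of the example sentence held as explicit edges and walked breadth-first. The in-memory adjacency dictionary and the `build_adjacency` and `neighborhood` helpers are illustrative assumptions; the pipeline itself keeps the graph in Neo4j.

```python
from collections import deque

# Toy triples taken from the example sentence above: (source, relation, target).
TRIPLES = [
    ("Lisinopril", "dosage", "10mg"),
    ("Lisinopril", "frequency", "daily"),
    ("Lisinopril", "condition", "hypertension"),
]


def build_adjacency(triples):
    """Index triples in both directions so a walk can start from any entity."""
    adjacency = {}
    for source, relation, target in triples:
        adjacency.setdefault(source, []).append((relation, target))
        adjacency.setdefault(target, []).append((relation, source))
    return adjacency


def neighborhood(adjacency, start, max_hops=2):
    """Return every entity reachable from `start` within `max_hops` edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for _relation, neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen - {start}


adjacency = build_adjacency(TRIPLES)
one_hop = neighborhood(adjacency, "10mg", max_hops=1)
two_hop = neighborhood(adjacency, "10mg", max_hops=2)
```

Starting from "10mg", one hop reaches only the medication; the second hop is what brings in its frequency and condition, which is context a match on "10mg" alone would not have surfaced.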

Setup and configuration

The constructor wires up every dependency: Neo4j and Qdrant clients, the Ollama client for embeddings, and the model/dimension knobs. Everything sensitive comes from environment variables, and as long as QDRANT_URL, NEO4J_URI, and the Ollama host point at local services, nothing leaves the machine.

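import os
import textwrap
import uuid

import langextract as lx
import ollama
from dotenv import load_dotenv
from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import QdrantNeo4jRetriever
from qdrant_client import QdrantClient
# lx_ollama, used during extraction, must also be imported; its module path is not shown here.

def __init__(self, env_path: str = ".env", ollama_model_extract: str = "gemma3:latest",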
        ollama_model_answer: str = "gemma3:latest", ollama_embedding_model: str = "embeddinggemma:latest",
        ollama_host: str | None = None, vector_dimension: int = 768):
    load_dotenv(env_path)
    self.qdrant_key = os.getenv("QDRANT_KEY")
    self.qdrant_url = os.getenv("QDRANT_URL")
    self.neo4j_uri = os.getenv("NEO4J_URI")
    self.neo4j_username = os.getenv("NEO4J_USERNAME")
    self.neo4j_password = os.getenv("NEO4J_PASSWORD")

    self.neo4j_driver = GraphDatabase.driver(
        self.neo4j_uri, auth=(self.neo4j_username, self.neo4j_password))
    self.qdrant_client = QdrantClient(url=self.qdrant_url, api_key=self.qdrant_key)
    self.ollama_client = ollama.Client(host=ollama_host) if ollama_host else ollama.Client()
    self.ollama_url = ollama_host or os.environ.get("OLLAMA_HOST", "http://localhost:11434")
    self.ollama_model_extract = ollama_model_extract
    self.ollama_model_answer = ollama_model_answer
    self.ollama_embedding_model = ollama_embedding_model
    self.vector_dimension = vector_dimension
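
Because `os.getenv` returns `None` for anything unset, a missing variable only surfaces later as a confusing driver error. A small pre-flight check is one way to fail early. This is a sketch: `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical names, treating these four variables as mandatory is my assumption, and QDRANT_KEY is left out on the assumption that a local Qdrant runs without an API key. The URLs are example values using the default local ports.

```python
import os

# Variable names match the constructor above; treating these four as mandatory is an assumption.
REQUIRED_ENV_VARS = ("QDRANT_URL", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_env_vars(env=None, required=REQUIRED_ENV_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"QDRANT_URL": "http://localhost:6333", "NEO4J_URI": "bolt://localhost:7687"}
print(missing_env_vars(example_env))  # ['NEO4J_USERNAME', 'NEO4J_PASSWORD']
```

Calling it right after `load_dotenv` and raising when the list is non-empty would keep the failure close to its cause.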

Extraction: text to entities with LangExtract

LangExtract is the piece that makes this practical. Rather than asking a model to emit free-form JSON and praying it parses, you give it a prompt description plus few-shot examples, and it returns typed extractions. The trick I lean on is the medication_group attribute: every detail (dosage, frequency, condition) carries the group of the medication it belongs to, which is what lets me reconstruct relationships later.

def extract_graph_components(self, raw_data: str):
    prompt_description = textwrap.dedent("""
        Extract medications with their details, using attributes to group related information:
        1. Extract entities in the order they appear in the text
        2. Each entity must have a 'medication_group' attribute linking it to its medication
        3. All details about a medication should share the same medication_group value
        """).strip()

    examples = [
        lx.data.ExampleData(
            text="Patient takes Aspirin 100mg daily for heart health and Simvastatin 20mg at bedtime.",
            extractions=[
                lx.data.Extraction(extraction_class="medication", extraction_text="Aspirin",
                    attributes={"medication_group": "Aspirin"}),
                lx.data.Extraction(extraction_class="dosage", extraction_text="100mg",
                    attributes={"medication_group": "Aspirin"}),
                lx.data.Extraction(extraction_class="frequency", extraction_text="daily",
                    attributes={"medication_group": "Aspirin"}),
                lx.data.Extraction(extraction_class="condition", extraction_text="heart health",
                    attributes={"medication_group": "Aspirin"}),
                lx.data.Extraction(extraction_class="medication", extraction_text="Simvastatin",
                    attributes={"medication_group": "Simvastatin"}),
            ],
        )
    ]

    result = lx.extract(
        text_or_documents=raw_data, prompt_description=prompt_description, examples=examples,
        model_id=self.ollama_model_extract, model_url=self.ollama_url,
        resolver_params={"format_handler": lx_ollama.OLLAMA_FORMAT_HANDLER},
        max_char_buffer=4000, show_progress=True,
    )
    return self._convert_extractions_to_graph(result.extractions)

From flat extractions to nodes and edges

LangExtract returns a flat list. This step groups by medication_group, picks the medication as the anchor node, mints a UUID per unique entity, and emits an edge from the anchor to each detail typed by its extraction class. Those UUIDs are the shared key that ties Neo4j and Qdrant together.

def _convert_extractions_to_graph(self, extractions: list):
    groups: dict[str, list] = {}
    for ext in extractions:
        if not ext.attributes or "medication_group" not in ext.attributes:
            continue
        groups.setdefault(ext.attributes["medication_group"], []).append(ext)

    nodes: dict[str, str] = {}
    relationships: list[dict] = []
    for group_name, group_extractions in groups.items():
        anchor_ext = next((e for e in group_extractions if e.extraction_class == "medication"), None)
        anchor_text = anchor_ext.extraction_text if anchor_ext else group_name
        if anchor_text not in nodes:
            nodes[anchor_text] = str(uuid.uuid4())
        for ext in group_extractions:
            if ext is anchor_ext:
                continue
            target_text = ext.extraction_text
            if target_text not in nodes:
                nodes[target_text] = str(uuid.uuid4())
            relationships.append({"source": nodes[anchor_text],
                                  "target": nodes[target_text], "type": ext.extraction_class})
    return nodes, relationships
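
To see the grouping logic without running a model, here is a stripped-down sketch that works on stand-in objects and returns edges by name instead of by UUID. `FakeExtraction` and `edges_by_name` are hypothetical helpers for illustration; they only mimic the three fields the converter reads.

```python
from dataclasses import dataclass, field


@dataclass
class FakeExtraction:
    """Hypothetical stand-in exposing the three fields the converter reads."""
    extraction_class: str
    extraction_text: str
    attributes: dict = field(default_factory=dict)


def edges_by_name(extractions):
    """Group by medication_group and return (anchor, type, detail) triples."""
    groups = {}
    for ext in extractions:
        group = (ext.attributes or {}).get("medication_group")
        if group is not None:
            groups.setdefault(group, []).append(ext)

    triples = []
    for group_name, members in groups.items():
        # The medication extraction anchors the group; fall back to the group name.
        anchor = next((e for e in members if e.extraction_class == "medication"), None)
        anchor_text = anchor.extraction_text if anchor else group_name
        for ext in members:
            if ext is not anchor:
                triples.append((anchor_text, ext.extraction_class, ext.extraction_text))
    return triples


sample = [
    FakeExtraction("medication", "Aspirin", {"medication_group": "Aspirin"}),
    FakeExtraction("dosage", "100mg", {"medication_group": "Aspirin"}),
    FakeExtraction("frequency", "daily", {"medication_group": "Aspirin"}),
    FakeExtraction("dosage", "20mg", {"medication_group": "Simvastatin"}),
    FakeExtraction("note", "ungrouped"),  # no medication_group, so it is skipped
]
triples = edges_by_name(sample)
```

Two behaviours are worth noticing: an extraction without a medication_group is dropped silently, and a group with no medication extraction still gets an anchor named after the group.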

Persisting the graph in Neo4j

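Each entity becomes an :Entity node keyed by its UUID; each relationship becomes a native edge whose relationship type carries the semantics (DOSAGE, FREQUENCY, CONDITION) rather than a generic "RELATED". That makes later Cypher both readable and selective. The original extraction class is also stored on the edge as a type property, which the formatter reads later.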

def ingest_to_neo4j(self, nodes: dict, relationships: list):
    with self.neo4j_driver.session() as session:
        for name, node_id in nodes.items():
            session.run("CREATE (n:Entity {id: $id, name: $name})", id=node_id, name=name)
        for relationship in relationships:
            rel_type = self._sanitize_relationship_type(relationship["type"])
            session.run(
                "MATCH (a:Entity {id: $source_id}), (b:Entity {id: $target_id}) "
                f"CREATE (a)-[:{rel_type} {{type: $type}}]->(b)",
                source_id=relationship["source"], target_id=relationship["target"],
                type=relationship["type"])
    return nodes

One sharp edge here: Cypher relationship types cannot be parameterized, so they have to be interpolated into the query string. Since the type came from an LLM, that is an injection risk. The sanitizer forces it to a safe UPPER_SNAKE_CASE identifier before it ever touches the query.

@staticmethod
def _sanitize_relationship_type(raw_type: str) -> str:
    safe = "".join(ch if ch.isalnum() else "_" for ch in raw_type.strip())
    safe = safe.upper().strip("_") or "RELATIONSHIP"
    if safe[0].isdigit():
        safe = f"REL_{safe}"
    return safe

Any time LLM output reaches a query string, treat it as hostile. Sanitize to a strict character class before interpolation.
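
One caveat: `str.isalnum()` is also true for non-ASCII letters and digits, so the sanitizer above lets those characters through. If the goal is a strictly ASCII identifier, a regex-based variant is one option. This is a sketch of a stricter alternative, not the method used above; `sanitize_relationship_type_ascii` is a hypothetical name, and unlike the original it collapses each run of disallowed characters into a single underscore.

```python
import re


def sanitize_relationship_type_ascii(raw_type: str) -> str:
    """Reduce arbitrary text to an ASCII UPPER_SNAKE_CASE identifier."""
    # Replace every run of characters outside [A-Za-z0-9] with one underscore.
    safe = re.sub(r"[^A-Za-z0-9]+", "_", raw_type.strip()).strip("_").upper()
    if not safe:
        return "RELATIONSHIP"  # nothing usable survived
    if safe[0].isdigit():
        safe = f"REL_{safe}"  # avoid an identifier that starts with a digit
    return safe


for raw in ["dosage", "side effect", "x]->(m) DETACH DELETE m //", "2nd line", "!!!"]:
    print(repr(raw), "->", sanitize_relationship_type_ascii(raw))
```

The third input shows the point of the exercise: the Cypher punctuation is gone, and what remains can only ever be read as a single relationship type.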

Embedding entities, not documents

This is the design choice that defines the system: I embed entity names, not text chunks. Each vector represents a concept. The payload stored in Qdrant carries the Neo4j node id, so a vector hit maps straight back to a graph node.

def ollama_embeddings(self, text: str) -> list[float]:
    response = self.ollama_client.embeddings(model=self.ollama_embedding_model, prompt=text)
    return response["embedding"]

def ingest_to_qdrant(self, collection_name: str, raw_data: str, node_id_mapping: dict):
    names = list(node_id_mapping.keys())
    embeddings = [self.ollama_embeddings(name) for name in names]
    self.qdrant_client.upsert(
        collection_name=collection_name,
        points=[{"id": str(uuid.uuid4()), "vector": embedding,
                 "payload": {"id": node_id_mapping[name], "name": name}}
                for name, embedding in zip(names, embeddings)])

Retrieval: similarity then traversal

At query time the question is embedded and searched in Qdrant, which returns graph node ids — concept entry points, not passages. Those ids drive a Neo4j traversal that pulls each matched entity plus its one- and two-hop neighborhood, so the context includes facts vector search never directly matched.

def retriever_search(self, collection_name: str, query: str, top_k: int = 5):
    retriever = QdrantNeo4jRetriever(
        driver=self.neo4j_driver, client=self.qdrant_client,
        collection_name=collection_name,
        id_property_external="id", id_property_neo4j="id")
    return retriever.search(query_vector=self.ollama_embeddings(query), top_k=top_k)

def fetch_related_graph(self, entity_ids: list):
    query = """
    MATCH (e:Entity)-[r1]-(n1)-[r2]-(n2)
    WHERE e.id IN $entity_ids
    RETURN e, r1 as r, n1 as related, r2, n2
    UNION
    MATCH (e:Entity)-[r]-(related)
    WHERE e.id IN $entity_ids
    RETURN e, r, related, null as r2, null as n2
    """
    with self.neo4j_driver.session() as session:
        result = session.run(query, entity_ids=entity_ids)
        subgraph = []
        for record in result:
            subgraph.append({"entity": record["e"], "relationship": record["r"],
                             "related_node": record["related"]})
            if record["r2"] and record["n2"]:
                subgraph.append({"entity": record["related"], "relationship": record["r2"],
                                 "related_node": record["n2"]})
    return subgraph

Formatting the subgraph for the LLM

A graph is not a prompt. This step flattens the subgraph into a node list and readable triples like Lisinopril dosage 10mg — explicit statements the model can reason over directly.

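def format_graph_context(self, subgraph: list):
    # Dicts preserve insertion order, so they deduplicate while keeping a stable order.
    nodes: dict[str, None] = {}
    edges: dict[str, None] = {}
    for entry in subgraph:
        entity = entry["entity"]
        related = entry["related_node"]
        relationship = entry["relationship"]
        nodes[entity["name"]] = None
        nodes[related["name"]] = None
        # The same first-hop edge appears once per two-hop row, so duplicates are expected.
        edges[f"{entity['name']} {relationship['type']} {related['name']}"] = None
    return {"nodes": list(nodes), "edges": list(edges)}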

def graphRAG_run(self, graph_context: dict, user_query: str):
    nodes_str = ", ".join(graph_context["nodes"])
    edges_str = "; ".join(graph_context["edges"])
    prompt = f"""
    You are an intelligent assistant with access to the following knowledge graph:
    Nodes: {nodes_str}
    Edges: {edges_str}
    Using this graph, answer the following question:
    User Query: "{user_query}"
    """
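    # Use the configured client so a custom ollama_host is honored for generation too.
    response = self.ollama_client.chat(model=self.ollama_model_answer, messages=[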
        {"role": "system", "content": "Provide the answer for the following question:"},
        {"role": "user", "content": prompt}])
    return response.message.content

The full pipeline

Ingestion runs once; queries run against the prebuilt graph + index. run_pipeline ties retrieval, traversal, formatting, and generation together.

def run_pipeline(self, raw_data: str, query: str, collection_name: str = "medicationGraphRAGstore"):
    # self.create_and_ingest(raw_data, query, collection_name)  # first run only
    retriever_result = self.retriever_search(collection_name, query)
    entity_ids = [item.content.split("'id': '")[1].split("'")[0]
                  for item in retriever_result.items]
    subgraph = self.fetch_related_graph(entity_ids)
    graph_context = self.format_graph_context(subgraph)
    return self.graphRAG_run(graph_context, query)
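
The id extraction in run_pipeline splits on a literal substring, so any result whose content lacks that exact text raises an IndexError. A regex with an explicit no-match path is a slightly more forgiving option. This is a sketch under assumptions: `extract_entity_ids` is a hypothetical helper, and the sample strings only imitate the shape that the split above expects, not the retriever's exact output.

```python
import re

# Assumes node properties are rendered as a Python-style dict, e.g. {'id': '...', 'name': '...'}.
_ID_PATTERN = re.compile(r"'id':\s*'([^']+)'")


def extract_entity_ids(contents):
    """Pull the first 'id' value out of each content string, skipping non-matches."""
    ids = []
    for content in contents:
        match = _ID_PATTERN.search(content)
        if match:
            ids.append(match.group(1))
    return ids


# Hypothetical content strings shaped like the split() in run_pipeline expects.
sample_contents = [
    "<Node properties={'id': 'a1b2', 'name': 'Lisinopril'}>",
    "<Node properties={'name': 'no id here'}>",
]
print(extract_entity_ids(sample_contents))  # ['a1b2']
```

In run_pipeline this would take `[item.content for item in retriever_result.items]` as input; whether to skip or raise on a non-match is a design choice that depends on how strict the pipeline should be.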

Does it actually work?

I fed it a paragraph about a 62-year-old on six medications — Lisinopril, Metformin, Atorvastatin, Aspirin, Levothyroxine, Sertraline — with dosages, frequencies, and conditions tangled through the prose. Three representative queries:

"List all medications related to cardiovascular conditions and their dosages."

Aspirin — 81mg, daily (heart disease prevention)
Atorvastatin — 40mg, at bedtime (high cholesterol)
Lisinopril — 10mg, daily (hypertension)

"What medication is prescribed for hypothyroidism, and at what dose?" → Levothyroxine 75mcg, every morning.

"Which medications does the patient take once daily versus twice daily?" → Once daily: Aspirin, Sertraline, Levothyroxine. Twice daily: Metformin. The model got the split right because the graph encoded each frequency edge explicitly — there was nothing to infer.

What I would tune next

Entity resolution. Names are nodes, so "heart health" and "heart disease prevention" become distinct entities. A normalization pass would merge synonyms.
Traversal depth. Two hops is a sensible default; deeper walks add context but also noise. Cap it per query class.
Hybrid scoring. Combine vector similarity with graph centrality so well-connected entities rank higher.
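
As a starting point for the entity resolution item, names could be canonicalized before UUIDs are minted. The sketch below is a minimal version under assumptions: the `ALIASES` table and both helper names are hypothetical, and a real system would need a curated or learned synonym source for its domain.

```python
import re

# Hypothetical alias table; a real one would be curated or learned for the domain.
ALIASES = {
    "heart health": "heart disease prevention",
}


def normalize_entity(name, aliases=ALIASES):
    """Lowercase, collapse whitespace, then map known synonyms to one canonical name."""
    key = re.sub(r"\s+", " ", name).strip().casefold()
    return aliases.get(key, key)


def merge_entities(names):
    """Return canonical names in first-seen order with duplicates removed."""
    return list(dict.fromkeys(normalize_entity(n) for n in names))


merged = merge_entities(["Heart  health", "heart disease prevention", "Hypertension "])
```

Note that this lowercases every name, so keeping the original spelling as a separate display property would be worth considering if the names are shown to users.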

Conclusion

GraphRAG trades chunk retrieval for relationship retrieval, and running it entirely on Ollama, Neo4j, and Qdrant means the data, the models, and the infrastructure never leave your machine. LangExtract does the heavy lifting of turning prose into a typed graph; the rest is plumbing two databases together around shared ids. The payoff is answers grounded in explicit, traceable facts rather than fragments of text.

If you want the self-correcting, web-scraping take on retrieval, see Self Adaptive Context RAG; for the vector-search internals underneath all of this, see FAISS HNSW for multilingual dedup.