
CogneeRetriever

This will help you get started with the Cognee retriever. For detailed documentation of all CogneeRetriever features and configurations, head to the API reference.

Integration details

Bring-your-own data (i.e., index and search a custom corpus of documents):

| Retriever | Self-host | Cloud offering | Package |
| :--- | :---: | :---: | :---: |
| CogneeRetriever | | | langchain-cognee |

Setup

For the default cognee setup, the only thing you need is your OpenAI API key.
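For example, you can set the key as an environment variable before instantiating the retriever (a placeholder value is shown here; substitute your real key or read it interactively with `getpass`):

```python
import os

# Set the OpenAI API key if it is not already present in the environment.
# "sk-..." is a placeholder -- replace it with your actual key.
os.environ.setdefault("OPENAI_API_KEY", "sk-...")
```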

If you want automated tracing of individual queries, you can also set your LangSmith API key by uncommenting the lines below:

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Installation

This retriever lives in the langchain-cognee package:

%pip install -qU langchain-cognee
import nest_asyncio

# cognee runs asynchronous operations internally; nest_asyncio allows
# nested event loops, which is needed in notebook environments.
nest_asyncio.apply()

Instantiation

Now we can instantiate our retriever:

from langchain_cognee import CogneeRetriever

retriever = CogneeRetriever(
    llm_api_key="sk-",  # OpenAI API key
    dataset_name="my_dataset",
    k=3,
)

Usage

Add some documents, process them, and then run queries. Cognee retrieves knowledge relevant to your queries and generates final answers.

# Example of adding and processing documents
from langchain_core.documents import Document

docs = [
    Document(page_content="Elon Musk is the CEO of SpaceX."),
    Document(page_content="SpaceX focuses on rockets and space travel."),
]

retriever.add_documents(docs)
retriever.process_data()

# Now let's query the retriever
query = "Tell me about Elon Musk"
results = retriever.invoke(query)

for idx, doc in enumerate(results, start=1):
    print(f"Doc {idx}: {doc.page_content}")
API Reference: Document

Use within a chain

Like other retrievers, CogneeRetriever can be incorporated into LLM applications via chains.

We will need an LLM or chat model:

pip install -qU "langchain[groq]"
import getpass
import os

if not os.environ.get("GROQ_API_KEY"):
    os.environ["GROQ_API_KEY"] = getpass.getpass("Enter API key for Groq: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("llama3-8b-8192", model_provider="groq")
# Alternatively, use an OpenAI chat model:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
API Reference: ChatOpenAI
from langchain_cognee import CogneeRetriever
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Instantiate the retriever with your Cognee config
retriever = CogneeRetriever(llm_api_key="sk-", dataset_name="my_dataset", k=3)

# Optionally, prune/reset the dataset for a clean slate
retriever.prune()

# Add some documents
docs = [
    Document(page_content="Elon Musk is the CEO of SpaceX."),
    Document(page_content="SpaceX focuses on space travel."),
]
retriever.add_documents(docs)
retriever.process_data()


prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided.

Context: {context}

Question: {question}"""
)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

answer = chain.invoke("What companies does Elon Musk own?")

print("\nFinal chain answer:\n", answer)

API reference

TODO: add link to API reference.

