This project is a FastAPI chatbot built with LangGraph, LangChain, and SQLite-backed persistent memory. It supports provider-based LLM configuration and is set up to use Groq by default. Each request reloads the stored conversation for a session_id, builds a bounded prompt context (sliding window or summary + window), invokes the model, persists the new turn, and returns the updated history.
## Requirements

- Python 3.12
- A Groq API key for the default setup
## Setup

```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
cp .env.example .env
```

Update `.env` and set a valid `GROQ_API_KEY`.
The application validates required configuration during startup. If the API key for the selected provider is missing, startup exits with a clear error message instead of waiting until the first chat request.
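A fail-fast check of this kind can be sketched as below. The names `validate_config` and `REQUIRED_KEYS` are illustrative, not the project's actual code (which presumably lives in `app/config.py`):

```python
import os
import sys

# Illustrative provider -> required env var mapping (an assumption, not the real config)
REQUIRED_KEYS = {"groq": "GROQ_API_KEY"}

def validate_config(provider: str = "groq") -> None:
    """Exit at startup with a clear message if the selected provider's key is missing."""
    key_name = REQUIRED_KEYS.get(provider)
    if key_name and not os.environ.get(key_name):
        sys.exit(f"Missing {key_name}: set it in .env before starting the server")
```

Exiting before the server binds a port surfaces misconfiguration immediately, rather than on the first `/chat` request.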
## Running

```bash
. .venv/bin/activate
uvicorn main:app --reload
```

You can also run:

```bash
. .venv/bin/activate
python main.py
```

Health check:

```bash
curl http://127.0.0.1:8000/health
```

Expected response:
```json
{"status":"ok"}
```

## Testing

```bash
. .venv/bin/activate
pytest
```

## Configuration

- `MEMORY_STRATEGY` controls context assembly (`sliding_window` or `summary_window`).
- `MEMORY_WINDOW_SIZE` controls how many recent message pairs are kept verbatim.
- `MAX_CONTEXT_TOKENS` sets a rough context cap using `chars // 4` token estimation.
- The system prompt is always placed at index `0` in the model input.
- For `summary_window`, rolling summary state is persisted in SQLite (`chat_summaries`) so it survives restarts.
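The sliding-window strategy with the `chars // 4` estimate can be sketched roughly like this. Function and field names are illustrative; the project's actual implementation presumably lives in `app/context_window.py`:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic from the configuration above: ~4 characters per token.
    return len(text) // 4

def bound_context(messages: list[dict], max_tokens: int, window_pairs: int) -> list[dict]:
    """Sliding-window sketch: keep the system prompt at index 0, then as many of
    the most recent user/assistant pairs as fit under the token cap."""
    system, rest = messages[0], messages[1:]
    recent = rest[-2 * window_pairs:]        # last N pairs kept verbatim
    kept: list[dict] = []
    budget = max_tokens - estimate_tokens(system["content"])
    for msg in reversed(recent):             # walk newest-first, drop oldest when over budget
        cost = estimate_tokens(msg["content"])
        if budget - cost < 0:
            break
        budget -= cost
        kept.append(msg)
    return [system] + list(reversed(kept))   # system prompt always at index 0
```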
## Example usage

```bash
curl -X POST http://127.0.0.1:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "user-123",
    "message": "Hi, my name is Hasib."
  }'
```

Second turn with the same session:
```bash
curl -X POST http://127.0.0.1:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "user-123",
    "message": "What is my name?"
  }'
```

Because the same `session_id` is reused, the application loads the earlier conversation from SQLite before calling the model again.
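That load step can be sketched with `sqlite3` as below. The table name `chat_messages` and the helper name `load_history` are assumptions for illustration; the real query lives somewhere in `app/memory.py`:

```python
import sqlite3

def load_history(db_path: str, session_id: str) -> list[tuple[str, str]]:
    """Illustrative sketch: fetch (role, content) rows for one session in insertion order."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT role, content FROM chat_messages "
            "WHERE session_id = ? ORDER BY id",
            (session_id,),
        ).fetchall()
    return rows
```

Ordering by the autoincrementing row id reproduces insertion order, which matches how the application replays a session's history.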
## Validation

- `session_id` must be a non-empty string and is limited to 255 characters.
- `message` must be non-empty and is limited to 8000 characters.
- Invalid requests return a consistent error shape:
```json
{
  "error": {
    "code": "validation_error",
    "message": "session_id: Value error, must not be blank"
  }
}
```

## Project structure

```text
app/
  __init__.py
  config.py
  context_window.py
  graph.py
  llm.py
  main.py
  memory.py
  schemas.py
  state.py
tests/
  test_api.py
  test_graph.py
  test_memory.py
main.py
README.md
.env.example
requirements.txt
```
## Request flow

1. The client sends `session_id` and `message` to `POST /chat`.
2. FastAPI receives the request in `app/main.py`.
3. The LangGraph `StateGraph` invokes `process_message`.
4. `process_message` loads persistent history and summary state from SQLite.
5. A bounded model context is assembled using the configured memory strategy and token cap.
6. The system prompt is always placed at position `0`, then recent/summary context and the new user message are appended.
7. The configured LLM provider receives this bounded context.
8. The assistant reply is appended and written back to SQLite.
9. The API returns the assistant reply and the updated history.
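The flow above can be condensed into a minimal in-memory sketch, with a dict standing in for SQLite and a plain callable standing in for the LLM. All names here are illustrative, not the project's actual code:

```python
def handle_chat(store: dict, session_id: str, message: str, model) -> dict:
    """Minimal per-request flow: load history, build context, invoke, persist, return."""
    history = store.setdefault(session_id, [])           # load persisted history
    history.append({"role": "human", "content": message})
    # system prompt at index 0, then history including the new user message
    context = [{"role": "system", "content": "You are a helpful assistant."}] + history
    reply = model(context)                               # invoke the configured LLM
    history.append({"role": "ai", "content": reply})     # persist the new turn
    return {"reply": reply, "history": list(history)}    # return updated history
```

In the real application this per-request logic runs inside the LangGraph `process_message` node, with SQLite providing the persistence and the context additionally bounded by the memory strategy.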
## Memory

- Chat history is stored in SQLite at `data/chat_memory.db` by default.
- Each stored message contains `session_id`, `role`, `content`, and a timestamp.
- Messages are loaded in insertion order for each session.
- System, human, and AI messages are persisted so later turns can reuse context.
- Summary mode persists rolling summaries in `chat_summaries` with `summarized_upto_message_id`.
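Given the fields above, the schema might look roughly like this. The table name `chat_messages`, the `id` and `created_at` columns, and the exact column types are assumptions; only the fields named in this section come from the project:

```python
import sqlite3

# Hypothetical schema matching the fields described above
SCHEMA = """
CREATE TABLE IF NOT EXISTS chat_messages (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    role       TEXT NOT NULL,                          -- system / human / ai
    content    TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS chat_summaries (
    session_id                 TEXT PRIMARY KEY,
    summary                    TEXT NOT NULL,
    summarized_upto_message_id INTEGER NOT NULL
);
"""

def init_db(path: str) -> None:
    """Create both tables if they do not exist yet."""
    with sqlite3.connect(path) as conn:
        conn.executescript(SCHEMA)
```

Keying `chat_summaries` by `session_id` with a `summarized_upto_message_id` watermark lets summary mode extend the rolling summary incrementally, covering only messages inserted since the last summarization.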
