Case Study
LLM/RAG Assistant with Tool Calling
Problem
An enterprise client needed a document assistant that could retrieve relevant knowledge from ingested PDFs and websites, answer questions with source grounding, and collect structured information via tool calling. Because it served real-time, client-facing healthcare workflows, the system had to be reliable and respond with low latency.
Approach
Built a RAG pipeline using LlamaIndex for document ingestion and retrieval, Qdrant for vector search, and LangChain for orchestration. Implemented tool calling for structured data collection, including appointment booking and information intake. Populated the knowledge base from client documents and PDFs, plus client websites scraped with Playwright.
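To make the retrieval step concrete, here is a minimal sketch of filtered similarity search followed by a source-grounded prompt. It is written in plain Python as a stand-in for what the vector database does; it is not the Qdrant, LlamaIndex, or LangChain API, and the names Chunk, search, build_prompt, the toy three-dimensional vectors, and the example.com URLs are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One retrievable passage plus the metadata used for filtering and citation."""
    text: str
    vector: list[float]
    payload: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks, query_vector, must=None, limit=3):
    """Return the top `limit` chunks by similarity whose payload matches every key in `must`."""
    must = must or {}
    candidates = [c for c in chunks if all(c.payload.get(k) == v for k, v in must.items())]
    ranked = sorted(candidates, key=lambda c: cosine(c.vector, query_vector), reverse=True)
    return ranked[:limit]


def build_prompt(question, hits):
    """Number each retrieved passage and attach its source so the answer can cite it."""
    context = "\n".join(
        f"[{i}] ({h.payload.get('source', 'unknown')}) {h.text}"
        for i, h in enumerate(hits, start=1)
    )
    return f"Answer using only the numbered sources and cite them.\n\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base: one PDF chunk and two scraped web chunks.
CHUNKS = [
    Chunk("Clinic hours are 8am to 5pm on weekdays.", [0.9, 0.1, 0.0],
          {"source": "hours.pdf", "kind": "pdf"}),
    Chunk("Book appointments through the patient portal.", [0.1, 0.9, 0.0],
          {"source": "https://example.com/booking", "kind": "web"}),
    Chunk("Weekend hours vary by location.", [0.8, 0.2, 0.1],
          {"source": "https://example.com/hours", "kind": "web"}),
]

# Restrict the search to web-sourced chunks, then build the grounded prompt.
hits = search(CHUNKS, query_vector=[1.0, 0.0, 0.0], must={"kind": "web"}, limit=1)
prompt = build_prompt("When are you open on weekends?", hits)
print(prompt)
```

In a real deployment the query vector would come from an embedding model and the filter would run inside the vector store, so only matching chunks are scored.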
Key Decisions
- Chose Qdrant over FAISS for production-ready vector search with filtering
- Used Playwright for web scraping to handle JS-rendered client pages
- Implemented structured output schemas for tool calling to ensure reliable data collection
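As an illustration of the structured-schema decision, the sketch below defines a tool in the OpenAI function-calling format and checks the model's arguments before they are used. The tool name book_appointment, its fields, and the helper parse_tool_arguments are hypothetical, not taken from the client system, and the validation covers only required fields, unknown fields, and string types.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling format.
BOOK_APPOINTMENT_TOOL = {
    "type": "function",
    "function": {
        "name": "book_appointment",
        "description": "Collect the details needed to request an appointment.",
        "parameters": {
            "type": "object",
            "properties": {
                "patient_name": {"type": "string"},
                "date": {"type": "string", "description": "ISO 8601 date, e.g. 2025-01-31"},
                "reason": {"type": "string"},
            },
            "required": ["patient_name", "date"],
            "additionalProperties": False,
        },
    },
}


def parse_tool_arguments(tool: dict, raw_arguments: str) -> dict:
    """Parse the JSON argument string from a tool call and check it against the tool schema.

    Raises ValueError on malformed JSON, missing required fields,
    unexpected fields, or non-string values for string fields.
    """
    schema = tool["function"]["parameters"]
    args = json.loads(raw_arguments)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    missing = [k for k in schema.get("required", []) if k not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    unknown = [k for k in args if k not in schema["properties"]]
    if unknown:
        raise ValueError(f"unexpected fields: {unknown}")
    for key, value in args.items():
        if schema["properties"][key]["type"] == "string" and not isinstance(value, str):
            raise ValueError(f"field {key!r} must be a string")
    return args


# The model returns tool arguments as a JSON string; validate before acting on them.
args = parse_tool_arguments(
    BOOK_APPOINTMENT_TOOL,
    '{"patient_name": "A. Patient", "date": "2025-01-31"}',
)
print(args)
```

Rejecting incomplete or malformed arguments at this boundary lets the assistant ask a follow-up question instead of passing bad data downstream.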
Tech Stack
Python, FastAPI, Qdrant, LangChain, LlamaIndex, OpenAI API, Playwright, tool calling.
Outcome
Delivered a production system that handles customer-facing healthcare workflows where latency, reliability, and maintainability matter.
Role
Sole engineer on the RAG and tool-calling components.
Contact
Interested in building something similar? Contact me.