Case Study

LLM/RAG Assistant with Tool Calling

Problem

An enterprise client needed a document assistant that could retrieve relevant knowledge from ingested PDFs and websites, answer questions grounded in cited sources, and collect structured information via tool calling. Because it served real-time, client-facing healthcare workflows, the system had to be reliable and low-latency.

Approach

Built a RAG pipeline using LlamaIndex for document ingestion and retrieval, Qdrant for vector search, and LangChain for orchestration. Implemented tool calling for structured data collection, including appointment booking and information intake. Populated the knowledge base from client documents and PDFs, and from client websites scraped with Playwright.
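The retrieve-then-ground flow described above can be illustrated with a minimal sketch. This is not the production code: it uses only the Python standard library, and the bag-of-words `embed` function is a toy stand-in for a real embedding model and for the Qdrant/LlamaIndex retrieval layer. The `Chunk` type, the function names, and the sample sources in the usage note are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # where the chunk came from, e.g. a PDF name or page URL


def embed(text: str) -> Counter:
    # Toy bag-of-words vector (assumption: stands in for a model embedding).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[Chunk], top_k: int = 2) -> list[Chunk]:
    # Rank every chunk against the query and keep the best top_k.
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c.text)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, retrieved: list[Chunk]) -> str:
    # Number each chunk and attach its source so the answer can cite it.
    context = "\n".join(
        f"[{i}] ({chunk.source}) {chunk.text}" for i, chunk in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the numbered context and cite sources like [1].\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

For example, calling `retrieve("When is the clinic open?", chunks)` on a small list of `Chunk` objects and passing the result to `build_prompt` yields a prompt whose context lines carry their source labels, which is what lets the model's answer point back to a specific document.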

Key Decisions

Chose Qdrant over FAISS for production vector search with metadata (payload) filtering
Used Playwright for web scraping to handle JS-rendered client pages
Implemented structured output schemas for tool calling to ensure reliable data collection
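As an illustration of the structured-schema decision, here is a minimal sketch of a tool definition and a validator for the arguments a model returns. It assumes the OpenAI-style `tools` format, in which a function's parameters are described with JSON Schema and the arguments arrive as a JSON string. The `book_appointment` tool, its fields, and `parse_tool_arguments` are hypothetical examples, not the schemas used in the project.

```python
import json
import re
from datetime import datetime

# Hypothetical tool definition in the OpenAI-style "tools" format.
BOOK_APPOINTMENT_TOOL = {
    "type": "function",
    "function": {
        "name": "book_appointment",
        "description": "Book an appointment for a patient.",
        "parameters": {
            "type": "object",
            "properties": {
                "patient_name": {"type": "string"},
                "date": {"type": "string", "description": "Date as YYYY-MM-DD"},
                "reason": {"type": "string"},
            },
            "required": ["patient_name", "date"],
            "additionalProperties": False,
        },
    },
}


def parse_tool_arguments(raw_arguments: str, tool: dict) -> dict:
    """Parse and validate model-supplied arguments before acting on them."""
    schema = tool["function"]["parameters"]
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    missing = [key for key in schema["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    unknown = [key for key in args if key not in schema["properties"]]
    if unknown:
        raise ValueError(f"unexpected fields: {unknown}")

    # Every property in this example schema is a string.
    for key, value in args.items():
        if not isinstance(value, str):
            raise ValueError(f"field {key!r} must be a string")

    # Require the YYYY-MM-DD shape and a date that exists on the calendar.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", args["date"]):
        raise ValueError("date must be formatted as YYYY-MM-DD")
    try:
        datetime.strptime(args["date"], "%Y-%m-%d")
    except ValueError as exc:
        raise ValueError("date is not a valid calendar date") from exc
    return args
```

Validating at this boundary means a malformed or incomplete tool call is rejected with a clear error, which the application can return to the model for a retry, instead of reaching the booking logic.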

Tech Stack

Python, FastAPI, Qdrant, LangChain, LlamaIndex, OpenAI API, Playwright, tool calling.

Outcome

Delivered a production system that handles customer-facing healthcare workflows, where latency, reliability, and maintainability were the priorities.

Role

Sole engineer on the RAG and tool-calling components.

Contact

Interested in building something similar? Contact me.