Multimodal RAG + Document Retrieval
Built a document question-answering pipeline for PDFs using OCR, embeddings, and vector search, with tenant-aware filtering so each tenant's documents stay isolated.
RAG, Qdrant, OCR, Embeddings, LLMs
Role
ML / Software Engineer (Project)
Timeline
2025
Stack
—
Input
PDFs → page chunks
Retrieval
Vector search (Qdrant)
Safety
Tenant-aware filtering
What I did
- Parsed PDFs into page-level text and generated embeddings for semantic retrieval.
- Integrated OCR for scanned pages and routed extracted text into the same indexing pipeline.
- Queried Qdrant for top-k pages and added tenant-aware filtering to prevent cross-tenant leakage.
- Used an LLM to synthesize answers grounded in retrieved pages (RAG pattern).
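
The retrieval and grounding steps above can be sketched in a few lines. This is a minimal, in-memory illustration and not the project's actual code: the `PageChunk` fields, the toy bag-of-words `embed` function, and the `search` and `build_prompt` helpers are all hypothetical stand-ins. In a real deployment the tenant restriction would presumably be expressed as a Qdrant payload filter on a tenant field, and `embed` would call an embedding model. The point it shows is that the tenant filter is applied before ranking, so pages from other tenants can never reach the prompt.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class PageChunk:
    tenant_id: str  # hypothetical payload field used for isolation
    doc_id: str
    page: int
    text: str


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index: list[PageChunk], query: str, tenant_id: str, k: int = 3) -> list[PageChunk]:
    """Return the top-k pages for one tenant; filtering happens before ranking."""
    q = embed(query)
    candidates = [c for c in index if c.tenant_id == tenant_id]
    ranked = sorted(candidates, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, pages: list[PageChunk]) -> str:
    """Assemble a grounded prompt that cites each retrieved page."""
    context = "\n".join(f"[{p.doc_id} p.{p.page}] {p.text}" for p in pages)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"


# Made-up sample data: two tenants with similar-looking pages.
index = [
    PageChunk("acme", "invoice.pdf", 1, "Invoice total is 420 USD due in March"),
    PageChunk("acme", "invoice.pdf", 2, "Payment terms and late fees"),
    PageChunk("globex", "contract.pdf", 1, "Invoice total is 999 USD due in April"),
]

question = "What is the invoice total?"
hits = search(index, question, tenant_id="acme", k=2)
prompt = build_prompt(question, hits)  # this string would be sent to the LLM
```

Filtering before scoring, instead of after taking the top-k, avoids the case where another tenant's pages crowd out the caller's own results and then get dropped, leaving too few pages to ground the answer.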