Our Projects

Production AI systems we've designed and built, from architecture to deployment.

Featured Project

BookStack RAG System

A production-grade Retrieval-Augmented Generation (RAG) system that turns a BookStack knowledge base into a searchable question-answering service, returning AI-generated answers with source citations.

Built as a fully external service on Google Cloud Platform, the system combines hybrid vector search with semantic reranking to deliver accurate, context-aware responses. It integrates with Claude Desktop and Claude Code via an MCP server, and provides agentic search through a Slack bot.

All infrastructure is managed as code with Terraform, CI/CD runs on GitHub Actions, and the codebase is covered by 871+ tests at 90%+ code coverage.

Project Stats

  • 11 weeks to production
  • 194 commits
  • 871+ tests
  • 90%+ code coverage

Architecture Highlights

Hybrid Vector Search

Dense and sparse embedding retrieval run in parallel, and their ranked results are merged with Reciprocal Rank Fusion (RRF) so that a document ranked highly by either path can surface in the final list.

  • Dense embeddings via Vertex AI text-embedding-005
  • Sparse embeddings via FastEmbed SPLADE
  • RRF fusion combining both retrieval paths
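
To illustrate the fusion step, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of document IDs. The function name, the sample page IDs, and the constant k=60 (a commonly used default) are assumptions for illustration, not details taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the dense and sparse retrieval paths.
dense_hits = ["page-12", "page-7", "page-3"]
sparse_hits = ["page-7", "page-44", "page-12"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Because RRF uses only rank positions, the dense and sparse scores never need to be normalized against each other, which is the usual reason for choosing it in hybrid search.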

Semantic Reranking & RAG

AI-powered answer generation with hierarchical source citations from reranked search results.

  • Vertex AI Ranking API for semantic reranking
  • Gemini 2.5 Flash for answer generation
  • Hierarchical citations (Shelf > Book > Chapter > Page)
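
A hierarchical citation of this kind could be rendered as in the sketch below. The `SourceLocation` type and `format_citation` helper are hypothetical names; the optional shelf and chapter fields reflect that a BookStack page can sit directly in a book, and a book need not be on a shelf.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceLocation:
    """Where a cited chunk lives in the BookStack hierarchy (hypothetical type)."""
    book: str
    page: str
    shelf: Optional[str] = None
    chapter: Optional[str] = None


def format_citation(loc: SourceLocation) -> str:
    """Render 'Shelf > Book > Chapter > Page', skipping missing levels."""
    parts = [loc.shelf, loc.book, loc.chapter, loc.page]
    return " > ".join(part for part in parts if part)


full = SourceLocation(shelf="Engineering", book="Ops Handbook",
                      chapter="Onboarding", page="Agent Install")
no_chapter = SourceLocation(book="Ops Handbook", page="Glossary")

print(format_citation(full))
print(format_citation(no_chapter))
```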

MCP Server

Model Context Protocol server exposing the knowledge base as a tool for AI agents.

  • Streamable HTTP transport
  • Claude Desktop and Claude Code integration
  • Direct search pipeline access for AI agents
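
For readers unfamiliar with MCP, the sketch below shows the JSON-RPC shape of a `tools/call` exchange using only the Python standard library. It is not the project's server: a real deployment would use an MCP SDK and the Streamable HTTP transport, and the tool name `search_knowledge_base` and its stub implementation are assumptions.

```python
import json


def search_knowledge_base(query: str) -> str:
    # Hypothetical stand-in for the real search pipeline.
    return f"(results for {query!r} would appear here)"


# Registry of tools this sketch exposes to an agent.
TOOLS = {"search_knowledge_base": search_knowledge_base}


def handle_tools_call(raw_request: str) -> str:
    """Answer a single MCP 'tools/call' JSON-RPC request."""
    request = json.loads(raw_request)
    params = request["params"]
    tool = TOOLS.get(params["name"])
    if tool is None:
        # Unknown tools are reported as a JSON-RPC invalid-params error.
        response = {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32602,
                      "message": f"Unknown tool: {params['name']}"},
        }
    else:
        text = tool(**params.get("arguments", {}))
        response = {
            "jsonrpc": "2.0",
            "id": request["id"],
            "result": {"content": [{"type": "text", "text": text}],
                       "isError": False},
        }
    return json.dumps(response)


request = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "search_knowledge_base",
               "arguments": {"query": "vpn setup"}},
})
reply = json.loads(handle_tools_call(request))
print(reply["result"]["content"][0]["text"])
```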

Integration Points

Agentic Search Slack Bot

Conversational AI agent in Slack that searches the knowledge base, synthesizes answers, and presents results in an interactive Block Kit UI.

  • Claude Haiku via Vertex AI Model Garden
  • Slack Socket Mode with Block Kit UI
  • Multi-turn conversational context

Webhook Ingestion Pipeline

Real-time content synchronization from BookStack CMS through a rate-controlled, idempotent processing pipeline.

  • Cloud Tasks for rate-limited, reliable processing
  • Markdown chunking with hierarchical metadata
  • Idempotency checking to prevent duplicates
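
The chunking and idempotency steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: it splits at Markdown headings, derives an idempotency key from the page ID plus a SHA-256 content hash, and uses an in-memory set as a stand-in for whatever persistent store the real pipeline uses. All names and sample values are hypothetical.

```python
import hashlib
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(markdown, page_meta):
    """Split Markdown at headings, tagging each chunk with its heading path."""
    chunks, heading_path, buffer = [], [], []

    def flush():
        text = "\n".join(buffer).strip()
        buffer.clear()
        if text:
            chunks.append({
                **page_meta,
                "headings": [title for _, title in heading_path],
                "text": text,
                "content_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
            })

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the chunk that belongs to the previous heading
            level = len(match.group(1))
            # Drop headings at the same or a deeper level before descending.
            while heading_path and heading_path[-1][0] >= level:
                heading_path.pop()
            heading_path.append((level, match.group(2).strip()))
        else:
            buffer.append(line)
    flush()
    return chunks


class IdempotentIngestor:
    """Skips chunks already processed (in-memory stand-in for a real store)."""

    def __init__(self):
        self._seen = set()

    def ingest(self, chunk):
        key = f"{chunk['page_id']}:{chunk['content_hash']}"
        if key in self._seen:
            return False  # duplicate webhook delivery or unchanged content
        self._seen.add(key)
        return True


doc = "# Setup\nInstall the agent.\n## Linux\nUse the package manager.\n"
meta = {"page_id": 42, "book": "Ops Handbook", "page": "Agent Install"}

chunks = chunk_markdown(doc, meta)
ingestor = IdempotentIngestor()
first_pass = [ingestor.ingest(c) for c in chunks]
second_pass = [ingestor.ingest(c) for c in chunks]  # simulated redelivery
print(first_pass, second_pass)
```

Keying on a content hash means a redelivered webhook for an unchanged page is a no-op, while an edited page produces new keys and is processed again.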

Technology Stack

AI / ML

Vertex AI, Gemini 2.5 Flash, FastEmbed SPLADE, Vertex AI Ranking API, Claude Haiku

Infrastructure

GCP, Cloud Run, Cloud Tasks, Firestore, Terraform

Development

Python, FastAPI, Pydantic AI, pytest, GitHub Actions

Integration

MCP, Slack Socket Mode, BookStack API, Block Kit UI

Want to build something similar?

Let's discuss how we can build production AI systems for your organization, from RAG pipelines to agentic search products.