Overview
FastRAG is a production-ready RAG (Retrieval-Augmented Generation) starter kit built with Next.js, LangChain, Pinecone, and OpenAI. It eliminates 40+ hours of boilerplate — vector ingestion pipelines, streaming responses, context window management, and a mobile-ready chat UI — so you can focus on building your actual product.
Prerequisites
Make sure you have accounts and API keys ready. Pinecone and Browserless.io offer free tiers; OpenAI requires pre-paid credits.
OPENAI_API_KEY (required): Used for embeddings and chat completions. Requires at least $5 in pre-paid credits; a new API key alone isn't enough.
PINECONE_API_KEY (required): Access to Pinecone, the vector database that stores and queries embeddings. The free Starter plan is sufficient.
Browserless.io token (optional): Headless Chrome for URL scraping. Only needed if you use the web scraping feature.
Installation
Clone or unzip the project
If you have GitHub repo access (included with all purchases):
git clone fastrag.git
cd fastrag
Install dependencies
npm install
If npm reports peer-dependency conflicts, run npm install --legacy-peer-deps instead. These conflicts are common because of LangChain's rapid release cadence.
Environment Setup
Rename .env.example to .env.local and fill in your keys:
# OpenAI — platform.openai.com/api-keys
OPENAI_API_KEY=sk-proj-...
# Pinecone — app.pinecone.io
PINECONE_API_KEY=pc-sk-...
# Must match the index name you create in Pinecone
PINECONE_INDEX=fast-rag
# Optional: only needed for URL scraping
BROWSERLESS_TOKEN=your-token-here
OPENAI_API_KEY: Powers text-embedding-3-small (ingestion and query embedding) and GPT-4o (chat).
PINECONE_API_KEY: Used to upsert to and query your vector index.
PINECONE_INDEX: Must exactly match the index name in Pinecone, including case: "fast-rag" ≠ "Fast-RAG".
BROWSERLESS_TOKEN: Powers headless Chrome for scraping JavaScript-rendered sites. Skip it if you don't use URL ingestion.
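A missing or mistyped variable usually shows up only later, as a runtime error from OpenAI or Pinecone. The sketch below is a hypothetical pre-flight check that validates the variables listed above. checkEnv is an illustrative name and is not part of FastRAG. It also applies Pinecone's index naming rule (lowercase letters, digits, and hyphens), so a value like "Fast-RAG" can never match a real index.

```javascript
// Hypothetical pre-flight check (not shipped with FastRAG).
// Required variables follow the reference list above.
const REQUIRED = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"];

function checkEnv(env) {
  const problems = [];
  for (const name of REQUIRED) {
    if (!env[name] || env[name].trim() === "") {
      problems.push(`${name} is missing`);
    }
  }
  // Pinecone index names allow only lowercase letters, digits and hyphens,
  // so mixed-case values can never match an existing index.
  const index = env.PINECONE_INDEX;
  if (index && !/^[a-z0-9-]+$/.test(index)) {
    problems.push(`PINECONE_INDEX "${index}" contains characters Pinecone does not allow`);
  }
  // Optional: only URL ingestion needs it, so warn instead of failing.
  if (!env.BROWSERLESS_TOKEN) {
    console.warn("BROWSERLESS_TOKEN not set; URL ingestion will be unavailable");
  }
  return problems;
}
```

Next.js loads .env.local into process.env automatically. An API route could therefore call checkEnv(process.env) and return the list of problems as an error response when it is non-empty.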
Pinecone Setup
Go to app.pinecone.io and sign in
Click "Create Index"
Use these settings:
Index name: fast-rag (must match PINECONE_INDEX)
Dimensions: 1024
Metric: cosine
Click Create — wait ~30 seconds for the index to initialise
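FastRAG stores 1024-dimensional vectors rather than the 1536 dimensions that text-embedding-3-small returns by default. The storage saving is plain arithmetic, sketched below on the assumption that each dimension is a 4-byte float32. Real Pinecone usage also includes IDs, metadata, and index overhead, so treat these figures as raw-vector estimates only. The 100,000-chunk corpus is a hypothetical example.

```javascript
// Assumes 4-byte float32 values per dimension; counts raw vector bytes only
// (no IDs, metadata, or index overhead).
const BYTES_PER_FLOAT32 = 4;

function rawVectorBytes(vectorCount, dimensions) {
  return vectorCount * dimensions * BYTES_PER_FLOAT32;
}

function savingsFraction(fullDims, reducedDims) {
  return 1 - reducedDims / fullDims;
}

// Hypothetical corpus of 100,000 chunks.
const CHUNK_COUNT = 100_000;
const fullMB = rawVectorBytes(CHUNK_COUNT, 1536) / 1e6;
const reducedMB = rawVectorBytes(CHUNK_COUNT, 1024) / 1e6;

console.log(`1536 dims: ${fullMB.toFixed(1)} MB`);    // 1536 dims: 614.4 MB
console.log(`1024 dims: ${reducedMB.toFixed(1)} MB`); // 1024 dims: 409.6 MB
console.log(`saving: ${(savingsFraction(1536, 1024) * 100).toFixed(1)}%`); // saving: 33.3%
```

The saving is independent of corpus size: it is always 1 − 1024/1536 = 1/3 of the raw vector payload.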
FastRAG configures text-embedding-3-small to output 1024-dimensional vectors instead of its default 1536, which is why the index dimension is 1024. This cuts Pinecone vector storage by ~33% with negligible quality loss.
Running Locally
npm run dev
Open http://localhost:3000 in your browser.
Architecture
FastRAG is a standard two-phase RAG pipeline. Ingestion happens once per document; retrieval happens on every chat message.
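Ingestion (once per document): upload a PDF or submit a URL → extract text → split into chunks → embed each chunk → upsert vectors to Pinecone
Retrieval (every message): embed the question → query Pinecone for the top 4 chunks → build the prompt → stream the GPT-4o answer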
Three Next.js API routes handle everything:
pages/api/ingest.js: PDF uploads, recursive chunking, and vector upsert
pages/api/ingest-url.js: Puppeteer scraping, text extraction, and vectorization
pages/api/chat.js: similarity search, prompt construction, and GPT-4o streaming
PDF Ingestion
Handled by pages/api/ingest.js. Supports multiple files uploaded simultaneously via drag-and-drop.
Form parsing: formidable handles the multipart upload and exposes the uploaded file paths on the server filesystem.
Loading: LangChain's PDFLoader extracts raw text from each file page by page, preserving order.
Splitting: RecursiveCharacterTextSplitter cuts the text into 1,000-character chunks with a 200-character overlap. The overlap preserves sentence context across chunk boundaries.
Embedding: text-embedding-3-small converts each chunk into a 1,024-dimensional vector via the OpenAI Embeddings API.
Storage: vectors are upserted to Pinecone under a 'global' namespace, so all documents are searched together in a single query.
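The chunk size and overlap can be illustrated with a simplified splitter. This is not LangChain's RecursiveCharacterTextSplitter. That splitter first tries to break on paragraph, line, and word separators, whereas this sketch cuts at fixed character offsets. It only shows how a 1,000-character window that advances by 800 characters produces a 200-character overlap between neighbouring chunks. splitWithOverlap is an illustrative name.

```javascript
// Simplified fixed-window splitter illustrating chunkSize / chunkOverlap.
// Unlike RecursiveCharacterTextSplitter, it ignores paragraph, line and word
// boundaries and cuts at exact character offsets.
function splitWithOverlap(text, chunkSize = 1000, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap; // how far each window advances
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break; // this window already reached the end
  }
  return chunks;
}

// 2,600 characters -> windows [0,1000), [800,1800), [1600,2600)
const sample = Array.from({ length: 2600 }, (_, i) => String(i % 10)).join("");
const chunks = splitWithOverlap(sample);
console.log(chunks.map((c) => c.length)); // [ 1000, 1000, 1000 ]
```

In LangChain, the equivalent settings are passed as new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 200 }).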
URL Ingestion
Handled by pages/api/ingest-url.js. Paste any URL to scrape, clean, and vectorize it in seconds.
Headless browser: puppeteer-core connects to Browserless.io, a remote Chromium instance that fully renders JavaScript before scraping. Client-rendered React, Next.js, and other SPA sites are therefore captured correctly.
Extraction: once the page reaches network idle (waitUntil: 'networkidle2', meaning no more than two open network connections for at least 500 ms), the route pulls the full body text, then strips navigation, footers, and boilerplate HTML.
Metadata: each vector is tagged with its source URL so the model can cite the exact page in its responses.
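Each chunk ends up as a Pinecone record of the form { id, values, metadata }. The sketch below shows one way to attach the source URL. The hypothetical toRecords helper, the metadata keys (source, text, chunkIndex), and the hash-based ID scheme are assumptions, not necessarily what ingest-url.js does. Because the IDs are deterministic, re-ingesting the same URL overwrites its earlier vectors on upsert instead of duplicating them.

```javascript
const { createHash } = require("node:crypto");

// Builds Pinecone-style records for one scraped page.
// Metadata keys and ID scheme are illustrative assumptions.
function toRecords(sourceUrl, chunks, vectors) {
  if (chunks.length !== vectors.length) {
    throw new Error("each chunk needs exactly one embedding vector");
  }
  return chunks.map((text, i) => ({
    // Same URL + chunk position always yields the same ID.
    id: createHash("sha256").update(`${sourceUrl}#${i}`).digest("hex").slice(0, 32),
    values: vectors[i],
    metadata: { source: sourceUrl, text, chunkIndex: i },
  }));
}

const records = toRecords(
  "https://example.com/docs",
  ["first chunk", "second chunk"],
  [[0.1, 0.2], [0.3, 0.4]], // stand-ins for 1024-dim embeddings
);
console.log(records[0].metadata.source); // https://example.com/docs
```

One caveat with this scheme: if a page gets shorter between scrapes, vectors for the old, higher chunk indexes remain in the index until they are deleted explicitly.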
Chat & Retrieval
Handled by pages/api/chat.js. Every user message triggers a full retrieval cycle before GPT is called.
Embed question: the user's message is converted to a 1,024-dimensional vector with the same model used at ingestion. Query and document vectors must come from the same model and dimension for cosine similarity scores to be meaningful.
Pinecone query: the top 4 matching chunks are retrieved by similarity search. The topK value is configurable in chat.js.
Prompt construction: the retrieved chunks are injected into a system prompt that instructs the model to answer only from the provided context and to cite the source URL or filename.
Streaming: GPT-4o streams the response token by token through the Vercel AI SDK's LangChainAdapter. Users see the answer appear in real time instead of waiting behind a loading spinner.
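Pinecone performs the similarity search server-side, but the scoring behind a cosine-metric top-K query is easy to reproduce in memory. The sketch below ranks stored records against a query vector and then assembles a system prompt from the matches. It assumes each record carries metadata.source and metadata.text. Those key names, the helper names, and the prompt wording are illustrative assumptions, not FastRAG's actual chat.js code.

```javascript
// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every record and keeps the k best (k = 4 mirrors the top-4 retrieval above).
function topK(queryVector, records, k = 4) {
  return records
    .map((record) => ({ ...record, score: cosineSimilarity(queryVector, record.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Illustrative prompt wording; the real system prompt may differ.
function buildSystemPrompt(matches) {
  const context = matches
    .map((m, i) => `[${i + 1}] Source: ${m.metadata.source}\n${m.metadata.text}`)
    .join("\n\n");
  return [
    "Answer using only the context below.",
    "If the answer is not in the context, say you don't know.",
    "Cite the source URL or filename for each fact you use.",
    "",
    "Context:",
    context,
  ].join("\n");
}
```

In the real pipeline, the query vector comes from text-embedding-3-small and the matches come back from Pinecone already scored. Only the prompt assembly step runs in chat.js itself.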
Frontend
The frontend lives entirely in pages/index.js: a single-page chat interface with two ingestion modes, PDF upload and URL scraping.
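On the client, a streamed answer is consumed by reading the response body incrementally instead of awaiting the whole payload. The sketch below assumes the chat route returns plain UTF-8 text. If the Vercel AI SDK's data-stream framing is in use, the SDK's client-side hooks (such as useChat) decode that format, and this helper would need adapting. readTextStream is a hypothetical name, and the request body in the commented usage is a guess at the route's input shape.

```javascript
// Reads a streamed body chunk by chunk and reports text as it arrives.
// Assumes plain UTF-8 text; AI SDK data-stream framing is NOT parsed here.
async function readTextStream(stream, onText) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let full = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact.
    const text = decoder.decode(value, { stream: true });
    full += text;
    if (text) onText(text);
  }
  const tail = decoder.decode(); // flush any bytes still buffered
  if (tail) {
    full += tail;
    onText(tail);
  }
  return full;
}

// Browser usage (hypothetical request shape):
// const res = await fetch("/api/chat", { method: "POST", body: JSON.stringify({ messages }) });
// await readTextStream(res.body, (t) => setAnswer((prev) => prev + t));
```

Updating React state once per chunk is what makes the answer appear progressively in the chat window.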
Deploy to Vercel
FastRAG is optimised for Vercel. Deployment takes about 5 minutes from a fresh clone.
Push your code to GitHub
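git init && git add .
git commit -m "initial"
git branch -M main
git remote add origin https://github.com/you/fastrag.git
git push -u origin main
If you cloned the repo rather than unzipping it, origin already exists. In that case use git remote set-url origin https://github.com/you/fastrag.git instead of git remote add.
Import to Vercel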
Go to vercel.com/new, import your GitHub repo, and select Next.js as the framework preset. No further configuration needed.
Add environment variables
In Vercel project → Settings → Environment Variables, add all keys from your .env.local:
OPENAI_API_KEY
PINECONE_API_KEY
PINECONE_INDEX
BROWSERLESS_TOKEN (only if you use URL ingestion)
Click Deploy: the app is live in about 2 minutes
Troubleshooting