Documentation

Getting Started

Overview

FastRAG is a production-ready RAG (Retrieval-Augmented Generation) starter kit built with Next.js, LangChain, Pinecone, and OpenAI. It eliminates 40+ hours of boilerplate — vector ingestion pipelines, streaming responses, context window management, and a mobile-ready chat UI — so you can focus on building your actual product.

Stack: Next.js 16+, LangChain (latest), Pinecone, OpenAI GPT-4o

What you get: Multi-file PDF ingestion, URL scraping with Puppeteer, streaming chat with source citations, mobile-responsive chat UI, and full source code you actually understand, with no black boxes.

Prerequisites

Make sure you have accounts and API keys ready. All services have free tiers.

Node.js v18+ (Required)

Required to run Next.js locally.

OPENAI_API_KEY (Required)

Used for embeddings and chat completions. Requires at least $5 in pre-paid credits — a new API key alone isn't enough.

PINECONE_API_KEY (Required)

Vector database for storing and querying embeddings. The free Starter plan is sufficient.

Browserless.io Token (Optional)

Headless Chrome for URL scraping. Only needed if you use the web scraping feature.

Installation

Clone or unzip the project

If you have GitHub repo access (included with all purchases):

bash

git clone fastrag.git
cd fastrag

Or unzip the downloaded ZIP file and open the folder in your terminal — both options are included with your purchase.

Install dependencies

bash

npm install

Seeing peer dependency warnings or ERESOLVE errors? Run npm install --legacy-peer-deps. These conflicts are common because of LangChain's rapid release cadence.

Environment Setup

Rename .env.example to .env.local and fill in your keys:

.env.local

# OpenAI — platform.openai.com/api-keys
OPENAI_API_KEY=sk-proj-...

# Pinecone — app.pinecone.io
PINECONE_API_KEY=pc-sk-...

# Must match the index name you create in Pinecone
PINECONE_INDEX=fast-rag

# Optional: only needed for URL scraping
BROWSERLESS_TOKEN=your-token-here

OPENAI_API_KEY (Required): Powers text-embedding-3-small (ingestion) and GPT-4o (chat).

PINECONE_API_KEY (Required): Used to upsert and query your vector index.

PINECONE_INDEX (Required): Must exactly match the index name in Pinecone, including case. "fast-rag" ≠ "Fast-RAG".

BROWSERLESS_TOKEN (Optional): Powers headless Chrome for scraping JS-rendered sites. Skip if not using URL ingestion.
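
A startup check along the following lines can catch a missing key before the first request fails. This is a minimal sketch, not code from the FastRAG source: the helper name `findMissingEnv` is hypothetical, and the variable names are taken from the .env.local listing above.

```javascript
// Hypothetical helper (not part of FastRAG): list required variables
// that are unset or blank in the given environment object.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"];

function findMissingEnv(env) {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Example with a plain object; in the app you would pass process.env.
const missing = findMissingEnv({
  OPENAI_API_KEY: "sk-proj-example",
  PINECONE_API_KEY: "",
  // PINECONE_INDEX left out on purpose
});
console.log(missing); // [ 'PINECONE_API_KEY', 'PINECONE_INDEX' ]
```

BROWSERLESS_TOKEN is deliberately not in the required list, since it is only needed for URL ingestion.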

Pinecone Setup

Critical: A wrong dimension setting is the most common setup mistake, and it will crash the app immediately on first ingestion.

Go to app.pinecone.io and sign in

Click "Create Index"

Use these exact settings:

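| Setting    | Value         | Note                                    |
|------------|---------------|-----------------------------------------|
| Name       | fast-rag      | Must match PINECONE_INDEX in .env.local |
| Dimensions | 1024          | ⚠ Do NOT use the default 1536           |
| Metric     | Cosine        | Required for semantic similarity        |
| Cloud      | AWS us-east-1 | Recommended for lowest latency          |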

Click Create — wait ~30 seconds for the index to initialise

Why 1024 dimensions? FastRAG uses text-embedding-3-small forced to 1024 dims instead of the default 1536. This cuts Pinecone storage costs by ~33% with negligible quality loss.
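
The ~33% figure follows from vector size alone. The sketch below works through the arithmetic and shows a request body that asks the OpenAI Embeddings API for shortened vectors through its `dimensions` parameter. It assumes 4-byte float32 components and ignores metadata and index overhead; `bytesPerVector`, `storageSaving`, and `buildEmbeddingRequest` are illustrative names, not functions from the FastRAG source.

```javascript
// Assumption: each vector component is stored as a 4-byte float32.
const BYTES_PER_COMPONENT = 4;

function bytesPerVector(dimensions) {
  return dimensions * BYTES_PER_COMPONENT;
}

// Fraction of raw vector storage saved by shrinking from `from` to `to` dims.
function storageSaving(from, to) {
  return 1 - bytesPerVector(to) / bytesPerVector(from);
}

// JSON body for POST https://api.openai.com/v1/embeddings.
// `dimensions` requests shortened vectors from text-embedding-3-small.
function buildEmbeddingRequest(texts, dimensions = 1024) {
  return { model: "text-embedding-3-small", input: texts, dimensions };
}

console.log(bytesPerVector(1536)); // 6144
console.log(bytesPerVector(1024)); // 4096
console.log((storageSaving(1536, 1024) * 100).toFixed(1) + "%"); // 33.3%
console.log(buildEmbeddingRequest(["hello world"]));
```

Whatever value is sent as `dimensions` has to equal the dimension configured on the Pinecone index, which is why the index must be created with 1024.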

Running Locally

bash

npm run dev

Open http://localhost:3000 in your browser.

Quick test: Upload a small PDF (<1 MB), wait for the ingestion confirmation toast, then ask a question about its contents. If you get a cited answer, everything is wired up correctly.

How It Works

Architecture

FastRAG is a standard two-phase RAG pipeline. Ingestion happens once per document; retrieval happens on every chat message.

Ingestion (once per document): Source (PDF / URL) → Parse (extract text) → Chunk (1000 chars) → Embed (1024 dims) → Store (Pinecone)

Retrieval (every message): Question (user input) → Embed (same model) → Query (top-4 chunks) → Prompt (inject context) → Stream (GPT-4o)

Three Next.js API routes handle everything:

pages/api/ingest.js: PDF uploads, recursive chunking, and vector upsert

pages/api/ingest-url.js: Puppeteer scraping, text extraction, vectorization

pages/api/chat.js: Similarity search, prompt construction, GPT-4o stream

PDF Ingestion

Handled by pages/api/ingest.js. Supports multiple files uploaded simultaneously via drag-and-drop.

Form Parsing: formidable handles the multipart upload and exposes file paths on the server filesystem.

Loading: LangChain's PDFLoader extracts raw text from each file, page by page, preserving order.

Splitting: RecursiveCharacterTextSplitter cuts text into 1000-char chunks with 200-char overlap. The overlap preserves sentence context across chunk boundaries.

Embedding: text-embedding-3-small converts each chunk into a 1024-dimensional vector via the OpenAI Embeddings API.

Storage: Vectors are upserted to Pinecone under a 'global' namespace so all documents are searched together in a single query.
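
To make the splitting step concrete, here is a simplified fixed-window chunker using the same 1000/200 settings. It is a stand-in for illustration only: LangChain's RecursiveCharacterTextSplitter prefers to break on separators such as paragraphs and sentences, which this sketch does not attempt. The function name `chunkText` is hypothetical.

```javascript
// Simplified sliding-window splitter (NOT RecursiveCharacterTextSplitter):
// each chunk starts (chunkSize - overlap) characters after the previous one.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be >= 0 and smaller than chunkSize");
  }
  const step = chunkSize - overlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once this chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// 2500 characters of repeating a-z, so the overlap is easy to verify.
const sample = Array.from({ length: 2500 }, (_, i) =>
  String.fromCharCode(97 + (i % 26))
).join("");

const chunks = chunkText(sample);
console.log(chunks.map((c) => c.length)); // [ 1000, 1000, 900 ]
console.log(chunks[0].slice(800) === chunks[1].slice(0, 200)); // true
```

The last 200 characters of one chunk reappear as the first 200 of the next, which is what lets a sentence cut at a boundary still appear whole in at least one chunk.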

URL Ingestion

Handled by pages/api/ingest-url.js. Paste any URL to scrape, clean, and vectorize it in seconds.

Headless Browser: puppeteer-core connects to Browserless.io, a remote Chromium instance that renders JavaScript before scraping, so it handles React, Next.js, and other SPA sites.

Extraction: Pulls the full body text once the page is network-idle (waitUntil: 'networkidle2'), then strips navigation, footers, and boilerplate HTML.

Metadata: Tags each vector with the source URL so the AI can cite the exact page in responses.

Puppeteer scrapes a single page, not an entire site. It will not follow links or crawl multiple pages automatically. For multi-page ingestion, call the endpoint once per URL.
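
The metadata step can be pictured as follows. This is a sketch under assumptions: the record shape (`id` plus a `metadata` object with `source` and `text`) and the helper name `tagChunks` are illustrative, and the actual field names in ingest-url.js may differ. A real Pinecone record would also carry a `values` array holding the embedding.

```javascript
// Hypothetical record builder: attach the source URL to every chunk so a
// retrieved match can be traced back to the page it came from.
function tagChunks(chunks, sourceUrl) {
  const source = new URL(sourceUrl).href; // throws TypeError on malformed URLs
  return chunks.map((text, index) => ({
    id: `${source}#chunk-${index}`,
    metadata: { source, text },
  }));
}

const records = tagChunks(
  ["First chunk.", "Second chunk."],
  "https://example.com/docs"
);
console.log(records[1].id); // https://example.com/docs#chunk-1
console.log(records[1].metadata.source); // https://example.com/docs
```

Because each call handles exactly one URL, ingesting several pages means building one such batch per page.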

Chat & Retrieval

Handled by pages/api/chat.js. Every user message triggers a full retrieval cycle before GPT-4o is called.

Embed Question: The user's message is converted to a 1024-dim vector using the same model as ingestion, so cosine similarity scores are meaningful.

Pinecone Query: The top 4 matching chunks are retrieved via similarity search. The topK value is configurable in chat.js.

Prompt Construction: Retrieved chunks are injected into a system prompt that instructs the model to answer only from the provided context and to cite the source URL or filename.

Streaming: GPT-4o streams the response token by token via the Vercel AI SDK's LangChainAdapter. Users see answers appear in real time, with no loading spinner needed.
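
The retrieval and prompt-construction steps above can be sketched without any external services. In this illustration an in-memory cosine-similarity search stands in for the Pinecone query, and the prompt wording is invented for the example; the function names, record shape, and prompt text are all assumptions rather than the contents of chat.js.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory stand-in for the Pinecone query: score every record, keep the best k.
function topKMatches(queryVector, records, k = 4) {
  return records
    .map((r) => ({ ...r, score: cosineSimilarity(queryVector, r.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Inject retrieved chunks into a system prompt (wording is illustrative).
function buildSystemPrompt(matches) {
  const context = matches
    .map((m, i) => `[${i + 1}] (source: ${m.metadata.source})\n${m.metadata.text}`)
    .join("\n\n");
  return [
    "Answer only from the context below and cite the source of each fact.",
    "",
    "Context:",
    context,
  ].join("\n");
}

// Toy 2-dimensional vectors; real ones would have 1024 components.
const records = [
  { values: [1, 0], metadata: { source: "a.pdf", text: "Alpha" } },
  { values: [0, 1], metadata: { source: "b.pdf", text: "Beta" } },
  { values: [0.9, 0.1], metadata: { source: "https://example.com", text: "Gamma" } },
];
const matches = topKMatches([1, 0], records, 2);
console.log(matches.map((m) => m.metadata.source)); // [ 'a.pdf', 'https://example.com' ]
console.log(buildSystemPrompt(matches));
```

Keeping the source alongside each chunk in the prompt is what makes it possible for the model to cite a filename or URL in its answer.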

Frontend

Lives entirely in pages/index.js. A single-page chat interface with two ingestion modes.

File Upload Mode

Drag-and-drop or click to upload PDFs. Multiple files supported simultaneously. Calls /api/ingest.

URL Mode

Paste any URL to scrape and ingest it. Calls /api/ingest-url. Requires a Browserless token.

useChat Hook

ai/react's useChat handles SSE streaming, message state, loading state, and auto-scroll.

Mobile Ready

Fully responsive layout. Tested natively on iOS Safari and Android Chrome.

Deployment

Deploy to Vercel

FastRAG is optimised for Vercel. Deployment takes about 5 minutes from a fresh clone.

Push your code to GitHub

bash

git init && git add .
git commit -m "initial"
git branch -M main
git remote add origin https://github.com/you/fastrag.git
git push -u origin main

Import to Vercel

Go to vercel.com/new, import your GitHub repo, and select Next.js as the framework preset. No further configuration needed.

Add environment variables

In Vercel project → Settings → Environment Variables, add all keys from your .env.local:

OPENAI_API_KEY

PINECONE_API_KEY

PINECONE_INDEX

BROWSERLESS_TOKEN

Click Deploy — live in ~2 minutes

Vercel's free Hobby plan has a 10-second function timeout. Large PDFs or slow Puppeteer jobs may time out. Upgrade to Pro ($20/mo) for a 60-second limit.

Reference

Troubleshooting

Click any error to expand the cause and fix.
