Your Digital Gateway to Pakistan's Constitution and Legislative Knowledge - an AI-powered constitutional guide making legal knowledge accessible to all citizens.
Numainda is an interactive platform that helps citizens learn about Pakistan's constitution and rights through natural conversations in simple language. Built with Next.js 13 and powered by RAG (Retrieval-Augmented Generation), it provides accurate information about Pakistan's constitution, election laws, and parliamentary proceedings.
**AI-Powered Constitutional Guide**
- Natural language conversations about legal topics
- 24/7 constitutional guidance powered by GPT-4o-mini
- RAG-based responses with source citations
- Complex legal concepts explained in everyday terms
- Bilingual support (English and Urdu)
**Knowledge Areas**
- Constitution of Pakistan
- Election Laws
- Parliamentary Bulletins and Daily Proceedings
- Legislative Bills and Acts
**Advanced Capabilities**
- Vector similarity search using pgvector
- Real-time streaming responses
- Conversation thread persistence
- OAuth authentication via Pehchan (Pakistan's national digital identity)
- Document upload and processing with AI summarization
**Tech Stack**

- Next.js 13 (App Router)
- TypeScript
- Tailwind CSS
- Shadcn UI + Radix UI Components
- Vercel AI SDK
- next-themes (Dark mode)
- PostgreSQL with pgvector extension
- Drizzle ORM
- OpenAI API (embeddings & chat)
- LangChain Community (PDF processing)
- AWS S3 (Document storage)
- Upstash QStash (Background job processing)
- Vercel (Hosting & Analytics)
- Pehchan OAuth (Authentication)
**Prerequisites**

- Node.js 18+ and npm
- PostgreSQL 14+ with pgvector extension
- OpenAI API key
- AWS account (for S3)
- Pehchan OAuth credentials
- Upstash QStash account
```bash
# Clone the repository
git clone https://github.com/codeforpakistan/numainda-next.git
cd numainda-next

# Install dependencies
npm install
```

Create a `.env.local` file in the root directory:
```bash
# Database
DATABASE_URL="postgresql://user:password@localhost:5432/numainda"

# OpenAI
OPENAI_API_KEY="sk-..."

# AWS S3
AWS_REGION="us-east-1"
AWS_ACCESS_KEY_ID="..."
AWS_SECRET_ACCESS_KEY="..."
AWS_S3_BUCKET_NAME="numainda-documents"

# QStash (Background job processing)
QSTASH_TOKEN="..."
QSTASH_CURRENT_SIGNING_KEY="..."
QSTASH_NEXT_SIGNING_KEY="..."

# Pehchan OAuth
NEXT_PUBLIC_PEHCHAN_URL="https://pehchan.nayatel.com"
NEXT_PUBLIC_CLIENT_ID="..."
NEXT_PUBLIC_APP_URL="http://localhost:3000" # Change to production URL when deploying
```

```bash
# Generate migrations from schema
npm run db:generate

# Run migrations
npm run db:migrate

# Or for rapid development, push schema directly
npm run db:push

# Open Drizzle Studio to inspect database
npm run db:studio
```

Important: Ensure the pgvector extension is installed and the HNSW index is created:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops);
```

```bash
# Start development server (localhost:3000)
npm run dev

# Run type checking
npm run typecheck

# Run linter
npm run lint

# Fix linting issues
npm run lint:fix

# Format code with Prettier
npm run format:write

# Check formatting
npm run format:check

# Run tests
npm test

# Run tests in watch mode
npm run test:watch
```

```bash
# Development
npm run dev              # Start development server
npm run build            # Build production bundle
npm run start            # Start production server
npm run preview          # Build and start production

# Code Quality
npm run lint             # Run ESLint
npm run lint:fix         # Fix linting issues
npm run typecheck        # TypeScript check
npm run format:write     # Format with Prettier
npm run format:check     # Check formatting

# Database (Drizzle ORM)
npm run db:generate      # Generate migrations
npm run db:migrate       # Run migrations
npm run db:push          # Push schema to database
npm run db:pull          # Pull schema from database
npm run db:studio        # Open Drizzle Studio GUI
npm run db:check         # Check migration consistency

# Testing
npm test                 # Run Jest tests
npm run test:watch       # Run tests in watch mode

# Document Ingestion
npm run ingest:batch     # Process all PDFs in docs/ folder
npm run ingest <pdf> -- --title "Title" --type bill   # Process single PDF
```

Process all PDFs in a directory at once:
```bash
# Process all bills in docs/ folder
npm run ingest:batch

# Process all bills and skip existing
npm run ingest:batch -- --skip-existing

# Process parliamentary bulletins
npm run ingest:batch ./bulletins -- --type parliamentary_bulletin

# Preview what would be processed
npm run ingest:batch -- --dry-run
```

Supported document types: `bill`, `parliamentary_bulletin`, `constitution`, `election_law`
The batch script automatically:

- Scans the directory for all PDFs
- Extracts titles from filenames
- Processes each file sequentially
- Skips already-processed files (with `--skip-existing`)
- Provides a progress summary
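The title-extraction step can be pictured with a small sketch. This is a hypothetical helper for illustration, not the batch script's actual code: strip the `.pdf` extension, turn separators into spaces, and title-case the words.

```typescript
// Hypothetical illustration of deriving a document title from a PDF
// filename, as the batch script does. Not the actual implementation.
function titleFromFilename(filename: string): string {
  return filename
    .replace(/\.pdf$/i, "")      // drop the extension
    .replace(/[-_]+/g, " ")      // separators become spaces
    .trim()
    .replace(/\b\w/g, ch => ch.toUpperCase()); // title-case each word
}

// e.g. "finance-bill-2024.pdf" -> "Finance Bill 2024"
```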
Process one PDF at a time:

```bash
# Ingest a legislative bill
npm run ingest ./bill.pdf -- \
  --title "Finance Bill 2024" \
  --type bill \
  --status passed \
  --bill-number "Bill No. 45" \
  --passage-date 2024-12-15

# Ingest a parliamentary bulletin
npm run ingest ./bulletin.pdf -- \
  --title "Parliamentary Bulletin - 15 Dec 2024" \
  --type parliamentary_bulletin \
  --date 2024-12-15
```

What the scripts do:
- Extract text from PDF using LangChain
- Chunk text (1500 chars, 300 overlap)
- Generate embeddings with OpenAI (text-embedding-ada-002)
- Store in database with metadata
- Create AI summaries (GPT-4o for bills, detailed summaries for bulletins)
- Make content searchable via RAG in chat
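The chunking arithmetic above (1500-character chunks with 300-character overlap) can be sketched in plain TypeScript. This illustrates the size/overlap behaviour only; the real pipeline uses LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers paragraph and sentence boundaries.

```typescript
// Simplified sliding-window chunker: each chunk is up to `chunkSize`
// characters and repeats the last `overlap` characters of its predecessor.
function chunkText(text: string, chunkSize = 1500, overlap = 300): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // each new chunk starts 1200 chars later
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached the end
  }
  return chunks;
}
```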
See `scripts/README.md` for complete CLI documentation.
- Navigate to `/admin/upload`
- Upload the PDF through the web interface
- System automatically:
- Uploads to S3
- Queues async processing job via QStash
- Extracts text and metadata
- Generates embeddings
- Creates AI summaries
- Monitor progress in admin dashboard
When modifying the database schema:

- Update schema files in `lib/db/schema/`
- Generate a migration: `npm run db:generate`
- Review the generated SQL in `lib/db/migrations/`
- Apply the migration: `npm run db:migrate`
- For rapid iteration (skips migrations): `npm run db:push`
To improve retrieval quality:
Adjust chunk size/overlap (`lib/actions/documents.ts`):

```typescript
const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1500,   // Modify this
  chunkOverlap: 300, // Modify this
});
```

Tune similarity threshold (`lib/ai/embedding.ts`):

```typescript
const relevantContent = await findRelevantContent(query, {
  threshold: 0.75, // Modify threshold (0-1)
  limit: 6,        // Number of results
});
```

Update system prompt (`app/api/chat/route.tsx`):
- Modify the system message to change AI behavior
- Add constraints or formatting requirements
- Adjust citation style
Schema: `lib/db/schema/`

Key tables:

- `documents` - Core document storage (constitution, laws, bulletins)
- `embeddings` - Vector embeddings (1536-dim) with HNSW index
- `bills` - Legislative bills/acts with AI summaries
- `parliamentary-proceedings` - Daily proceedings with summaries
- `chat-threads` - User conversation history (JSONB messages)
- `document-uploads` - Upload tracking with async processing status
Flow (see `app/api/chat/route.tsx`):

- Extract the last user message
- `findRelevantContent()` performs a cosine similarity search (threshold > 0.75, top 6 results)
- Format context with document titles, types, and sections
- Stream response with GPT-4o-mini
- Cite sources and admit when information is unavailable
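In production this retrieval is a pgvector cosine-distance query; the threshold/top-k logic can be sketched over in-memory vectors. Function and type names here are illustrative, not the app's actual exports:

```typescript
type Chunk = { content: string; embedding: number[] };

// Cosine similarity between two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep chunks above the similarity threshold, best-first, capped at `limit`
function rankChunks(query: number[], chunks: Chunk[], threshold = 0.75, limit = 6): Chunk[] {
  return chunks
    .map(c => ({ c, score: cosineSimilarity(query, c.embedding) }))
    .filter(r => r.score > threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(r => r.c);
}
```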
Document Processing (`lib/actions/documents.ts`):
- LangChain PDFLoader extracts text with page metadata
- RecursiveCharacterTextSplitter: chunkSize=1500, chunkOverlap=300
- Section detection and timestamp extraction
- Batch embedding: 5 chunks at a time, 1s delay (rate limiting)
- Type-specific AI summarization
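The batch-embedding step above (5 chunks at a time with a 1-second pause) amounts to a generic rate-limited batching loop. `embed` below is a stand-in for the real OpenAI embeddings call:

```typescript
// Process items in fixed-size batches, pausing between batches to stay
// under the embedding API's rate limits. Defaults mirror the values
// described above (batchSize=5, delayMs=1000).
async function embedInBatches<T, R>(
  items: T[],
  embed: (batch: T[]) => Promise<R[]>,
  batchSize = 5,
  delayMs = 1000,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    results.push(...(await embed(items.slice(i, i + batchSize))));
    if (i + batchSize < items.length) {
      await new Promise(resolve => setTimeout(resolve, delayMs)); // rate-limit pause
    }
  }
  return results;
}
```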
Login Flow (`components/pehchan-button.tsx`):

- Construct the OAuth URL with client_id, redirect_uri, and scope
- User authenticates with Pehchan
- Callback handler (`app/auth/callback/page.tsx`) receives tokens
- Fetch user info and extract pehchan_id (CNIC)
- Store in localStorage and redirect to `/chat`
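The URL-construction step can be sketched with the standard `URL` API. The endpoint path and scope here are assumptions for illustration; the real values live in `components/pehchan-button.tsx`:

```typescript
// Sketch of building an OAuth authorization URL like the login button does.
function buildAuthUrl(baseUrl: string, clientId: string, redirectUri: string): string {
  const url = new URL("/oauth/authorize", baseUrl); // assumed endpoint path
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("scope", "openid profile"); // assumed scope
  return url.toString();
}
```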
Session Management:
- Client-side: localStorage stores tokens, user_info, pehchan_id
- Server-side: pehchan_id used for thread ownership verification
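The server-side ownership rule reduces to a single check: a request may read or modify a thread only if its pehchan_id matches the one stored with the thread. Field names below are illustrative, not the actual schema:

```typescript
// Sketch of the thread-ownership verification described above.
type Thread = { id: string; pehchanId: string };

function canAccessThread(thread: Thread | undefined, requestPehchanId: string | null): boolean {
  return thread !== undefined && requestPehchanId !== null && thread.pehchanId === requestPehchanId;
}
```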
Chat APIs (`app/api/chat/`):

- `POST /api/chat` - Main chat with RAG streaming
- `GET /api/chat/threads` - List the user's threads
- `POST /api/chat/threads` - Create a new thread
- `GET /api/chat/threads/[id]` - Get a thread (with auth check)
- `PATCH /api/chat/threads/[id]` - Update messages/title
- `DELETE /api/chat/threads/[id]` - Delete a thread
Admin APIs (`app/api/admin/`):

- `POST /api/admin/uploads` - Upload to S3, queue processing
- `PATCH /api/admin/uploads` - Update upload status
- `POST /api/admin/uploads/process` - QStash webhook for processing
Other APIs:

- `GET /api/bills` - Fetch all bills
- `POST /api/upload` - Simple S3 upload
Upload flow (`app/api/admin/uploads/process/route.ts`):

- Admin uploads a PDF via `/api/admin/uploads`
- File is uploaded to S3 and a record is created in `document-uploads`
- QStash job queued for processing
- Worker fetches file, parses PDF, chunks text, generates embeddings
- Creates document and bill/proceeding records with AI summaries
- Updates upload status (completed/failed)
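The upload lifecycle implied by this flow can be sketched as a tiny state machine. "completed" and "failed" come from the text above; "pending" and "processing" are assumed intermediate states, not necessarily the schema's exact values:

```typescript
// Sketch of the upload status lifecycle: pending -> processing -> completed | failed.
type UploadStatus = "pending" | "processing" | "completed" | "failed";

const transitions: Record<UploadStatus, UploadStatus[]> = {
  pending: ["processing"],             // QStash job picked up
  processing: ["completed", "failed"], // worker finished or errored
  completed: [],                       // terminal
  failed: [],                          // terminal
};

function canTransition(from: UploadStatus, to: UploadStatus): boolean {
  return transitions[from].includes(to);
}
```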
```bash
# Install Vercel CLI
npm i -g vercel

# Deploy
vercel

# Deploy to production
vercel --prod
```

Make sure to:

- Set all environment variables in the Vercel dashboard
- Connect a PostgreSQL database (e.g., Supabase, Neon, RDS)
- Update `NEXT_PUBLIC_APP_URL` to the production URL
- Configure QStash webhooks with production URLs
```bash
# Build the application
npm run build

# Start production server
npm run start
```

Requirements:
- Node.js 18+ runtime
- PostgreSQL 14+ with pgvector
- All environment variables configured
- Reverse proxy (nginx/Apache) recommended
Monitor API Usage:
- OpenAI API usage (embeddings + chat completions)
- Check rate limits and adjust batching if needed
Database Maintenance:

```sql
-- Vacuum and analyze tables
VACUUM ANALYZE embeddings;
VACUUM ANALYZE documents;

-- Reindex for performance
REINDEX INDEX embeddings_embedding_idx;
```

Document Updates:
- Use CLI script or admin interface to add new documents
- Monitor upload status in admin dashboard
- Check QStash logs for processing failures
Error Monitoring:
- Check Vercel logs for API errors
- Monitor QStash webhook failures
- Review OpenAI API error rates
Embedding Rate Limits:
If hitting OpenAI rate limits, adjust the batching in `lib/actions/documents.ts`:

```typescript
// Current: 5 chunks per batch, 1s delay
const batchSize = 5;
await new Promise(resolve => setTimeout(resolve, 1000));
```

Vector Search Performance: Ensure the HNSW index exists:

```sql
CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops);
```

Auth Token Expiry: Pehchan tokens expire, so users must log in again. Consider implementing token refresh if needed.
Upload Processing Failures:
- Check QStash dashboard for failed jobs
- Verify S3 bucket permissions
- Ensure OpenAI API key is valid
- Check document format (only PDFs supported)
```
numainda-next/
├── app/                  # Next.js 13 App Router
│   ├── api/              # API routes
│   │   ├── chat/         # Chat & thread management
│   │   ├── admin/        # Admin APIs (uploads, processing)
│   │   └── bills/        # Bills API
│   ├── chat/             # Chat interface
│   ├── bills/            # Bills listing & details
│   ├── proceedings/      # Parliamentary proceedings
│   ├── constitution/     # Constitution viewer
│   ├── admin/            # Admin dashboard
│   └── auth/             # OAuth callback
├── components/           # React components
│   ├── ui/               # Shadcn UI components
│   └── *.tsx             # Feature components
├── lib/                  # Core logic
│   ├── ai/               # AI & embedding utilities
│   ├── db/               # Database (Drizzle ORM)
│   │   ├── schema/       # Database schema
│   │   └── migrations/   # SQL migrations
│   ├── actions/          # Server actions
│   └── utils.ts          # Utilities
├── scripts/              # CLI scripts
│   ├── ingest-pdf.ts     # Single PDF ingestion
│   ├── batch-ingest.ts   # Batch PDF processing
│   └── README.md         # CLI documentation
├── public/               # Static assets
└── config/               # Configuration files
```
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes
- Run tests: `npm test`
- Run the linter: `npm run lint:fix`
- Run the type check: `npm run typecheck`
- Commit your changes: `git commit -m 'Add feature'`
- Push the branch: `git push origin feature-name`
- Submit a pull request
- All code must pass TypeScript type checking
- ESLint rules must be followed
- Prettier formatting must be applied
- Tests must pass
- Maintain existing architecture patterns
- **The Original Vision**: Started as a parliamentary monitoring system tracking attendance, voting patterns, and legislative performance metrics.
- **The Pivot**: User research revealed a deeper need for accessible constitutional knowledge, leading to its transformation into an AI-powered constitutional guide.
- **Today's Numainda**: Now an interactive platform where citizens can learn about their constitution and rights through simple conversations, powered by state-of-the-art AI and RAG technology.
Featured in "Say Hello to My New Friend" - An article about how Numainda is transforming constitutional literacy in Pakistan through AI and human-centered design.
[Add license information here]
For questions or issues:
- Open an issue on GitHub
- Contact Code for Pakistan team
- Check the documentation in `CLAUDE.md` for detailed technical guidance
- Pehchan: Pakistan's national digital identity (OAuth provider)
- OpenAI: Embeddings and chat completions
- AWS S3: Document storage
- Upstash QStash: Background job queue
- Vercel: Hosting and analytics