davearlin/agent-test-suite

Dialogflow Test Suite

A platform for testing Dialogflow CX agents with AI-powered evaluation, modern UI, and detailed analytics.

📦 Repository

  • GitHub: dialogflow-test-suite
  • Clone: git clone https://github.com/your-org/dialogflow-test-suite.git

🌟 Key Features

Core Functionality

  • ✅ Dataset Management: Create, edit, and organize test datasets with direct route access
  • ✅ Advanced Question Management: Add, edit, and bulk import questions with dedicated full-screen interface
  • ✅ Dynamic Parameter Evaluation: AI evaluation system with fully configurable parameters (Similarity Score, Empathy Level, No-Match Detection, and custom parameters)
  • ✅ Legacy-Free Evaluation: New test runs use ONLY dynamic parameter-based scoring - no more hardcoded similarity/empathy fields
  • ✅ Enhanced CSV Exports: Comprehensive parameter breakdown exports with unlimited parameters including scores, weights, and reasoning
  • ✅ Intelligent HTML Processing: Automatic detection and optional removal of HTML tags from CSV imports with user-controlled settings
  • ✅ Dynamic Metadata Editing: Key-value pair editor for question metadata (no more raw JSON!)
  • ✅ Advanced Search & Filtering: Real-time search across questions and test results with live filtering
  • ✅ Table Management: Complete sorting, pagination, and filtering for large datasets
  • ✅ Dialogflow Testing: Execute tests against your Dialogflow agents with user-specific access
  • ✅ LLM Judge Integration: AI-powered response evaluation using Google Gemini 2.0 Flash with weighted parameter scoring
  • ✅ Computed Analytics: Real-time score computation from parameter weights - no stored legacy scores, full backward compatibility
  • ✅ Project Selection: Dynamic Google Cloud project selection based on user permissions
  • ✅ Quick Test: Instantly test prompts against Dialogflow agents with flow/page selection
  • ✅ Enhanced Bulk Import: Optimized CSV upload workflow with proper column mapping, HTML detection, and file handling
  • ✅ Test Reporting: View detailed results and analytics with color-coded scoring and parameter visualization
  • ✅ Session Parameters Management: Centralized management of quick-add session parameters with full CRUD operations
  • ✅ Business Dashboard: Comprehensive analytics dashboard with performance metrics, trends, and insights for stakeholders
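
The HTML handling for CSV imports listed above could be approached roughly as follows. This is a minimal sketch using only Python's standard library, not the project's actual import code (which this README does not show); the helper names `contains_html` and `strip_html` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical helpers for illustration only; not taken from the project.
_TAG_PATTERN = re.compile(r"</?[a-zA-Z][^>]*>")


def contains_html(text: str) -> bool:
    """Return True if the text appears to contain at least one HTML tag."""
    return bool(_TAG_PATTERN.search(text))


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; before handle_data
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(text: str) -> str:
    """Remove tags, decode entities, and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flushes any text still buffered by the parser
    return " ".join("".join(parser.parts).split())
```

Detection and removal are kept separate here so that stripping can stay optional, matching the user-controlled setting described in the feature list.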

Search & Data Management

  • ✅ Questions Search: Full-text search across question text, expected answers, tags, and priority
  • ✅ Test Results Search: Comprehensive search across questions, answers, reasoning, and error messages
  • ✅ Live Filtering: Real-time search results with instant feedback and smart pagination
  • ✅ Advanced Sorting: Click-to-sort functionality for all data columns with visual indicators
  • ✅ Configurable Pagination: 10, 25, 50, 100 results per page with proper result counting
  • ✅ Empty State Handling: Contextual messages for no results vs no search matches
  • ✅ Performance Optimization: Memoized filtering and sorting for smooth interactions

Modern UI & Navigation

  • ✅ Arrow-Back Navigation: Clean, intuitive ← back buttons replacing complex breadcrumbs
  • ✅ Dark Theme Design: Professional #121212 dark theme with blue (#0066CC) accents
  • ✅ Vertical Space Optimization: Maximized content area with consolidated navigation
  • ✅ Consolidated Configuration Accordion: All test run configuration details (test config, timing, message sequence, session parameters) in a single collapsible section
  • ✅ Two-Column Responsive Layout: Efficient use of horizontal space with side-by-side configuration display that adapts to screen size
  • ✅ Horizontal Message Display: Pre/post-prompt messages shown as compact chips with wrapping instead of vertical lists
  • ✅ Full-Screen Editing: Dedicated pages for complex forms instead of cramped modals
  • ✅ Responsive Layout: Consistent spacing, padding, and mobile-friendly design
  • ✅ Smart File Handling: Proper file input reset and state management for re-uploads
  • ✅ Real-time Updates: Auto-refresh functionality for test run monitoring with live status and results
  • ✅ Enhanced Tables: Full sorting, pagination, and data display with Material-UI components
  • ✅ Intelligent Auto-Refresh: Background polling for running test runs with selective row updates
  • ✅ Agent URL Navigation: Corrected Google Cloud Console links with proper location routing

Security & Authentication

  • ✅ User Authentication: Google OAuth with individual IAM permission respect
  • ✅ Security Model: Each user accesses only agents they have permissions for
  • ✅ User Attribution: Full user tracking with creator information displayed across all test runs and dashboard activity
  • ✅ Creator Visibility: "Created By" column in test runs showing full name and email of test creator
  • ✅ Dashboard User Context: Recent activity feed shows user attribution for all test activities
  • ✅ Multi-User Support: Proper user relationship management with real-time user information display

User Preferences & Session Parameters

  • ✅ Comprehensive Preferences: Both Quick Test and Create Test Run settings automatically saved and restored
  • ✅ Dialogflow Configuration Memory: Project, agent, flow, page, and playbook selections preserved across sessions
  • ✅ Session Parameter Persistence: Custom session parameters remembered for each screen independently
  • ✅ Session Parameters Management: Centralized management interface for creating, editing, and organizing common session parameters
  • ✅ Quick Add Functionality: Pre-configured parameter chips for instant addition to test configurations (no duplicates allowed)
  • ✅ Quick Test Preferences: Project, agent, flow, page, playbook, model, and session parameters saved automatically
  • ✅ Test Run Preferences: Separate preference system for Create Test Run screen with all Dialogflow Configuration fields
  • ✅ API-Based Storage: RESTful endpoints for preference management with proper schema validation
  • ✅ Duplicate Prevention: Smart validation prevents duplicate session parameter keys in both frontend and backend
  • ✅ Generic Configuration: Flexible key-value session parameters for specialized agent behavior
  • ✅ Type Safety: Full TypeScript integration with proper schema alignment between frontend and backend
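
The duplicate-key rule for session parameters can be illustrated with a small validation helper. This is a sketch under assumptions: the function name `add_session_parameter`, the example key `userType`, and the choice to trim whitespace before comparing keys are all hypothetical and not taken from the project's code.

```python
def add_session_parameter(params: dict[str, str], key: str, value: str) -> dict[str, str]:
    """Return a new mapping with the key added; reject empty or duplicate keys."""
    normalized = key.strip()  # assumption: keys are compared after trimming whitespace
    if not normalized:
        raise ValueError("Session parameter key must not be empty")
    if normalized in params:
        raise ValueError(f"Duplicate session parameter key: {normalized!r}")
    # Build a new dict so the caller's mapping is left untouched on success or failure
    return {**params, normalized: value}
```

For example, calling `add_session_parameter({}, "userType", "employee")` returns a one-entry mapping, and adding `userType` a second time raises `ValueError`. Running the same check on both client and server, as the list above describes, means a stale UI cannot write a duplicate.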

🚀 Quick Start

📖 For New Developers

First time setting up? See the comprehensive docs/setup/developer-setup.md guide for detailed step-by-step instructions.

TL;DR Minimal Setup:

git clone https://github.com/your-org/dialogflow-test-suite.git
cd dialogflow-test-suite

# Configure Google OAuth (required for login)
# Create .env in project root (NOT in backend/ or frontend/)
cp .env.example .env
# Edit .env and add GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET
# See docs/setup/oauth-setup.md for getting OAuth credentials

docker-compose up -d
# Wait 2-3 minutes for first build
# Access: http://localhost:3000
# Login: Use your Google account (all users get admin role by default)

Prerequisites

  • Docker Desktop installed and running
  • PowerShell or Command Prompt
  • Google Cloud Platform account with Dialogflow CX access (optional - for testing real agents)

Start the Application

cd "C:\Projects\your-workspace\Dialogflow Agent Tester"
docker-compose up -d

The application will automatically:

  • ✅ Build all containers (backend, frontend, database, Redis for local caching)
  • ✅ Initialize the database with all required tables and columns
  • ✅ Run unified migration system to ensure schema consistency (column additions, complex operations, data backfills)
  • ✅ Enable hot reload for instant code updates without rebuilds (see Development Workflow below)

⚠️ IMPORTANT: This application uses Google OAuth SSO only - there is no default admin account. You must configure Google OAuth to login (see Environment Variables Setup below).

Access the Application

Authentication & Setup

  • Production: Google OAuth with individual user credentials managed via GitHub Actions
  • Infrastructure: Fully managed via Terraform with automated deployments
  • Project Access: Users see only Google Cloud projects they have access to
  • Agent Access: Users see only Dialogflow agents they have IAM permissions for
  • OAuth Configuration: Automatically configured via GitHub Actions environment variables
  • Setup Guide: See docs/setup/ for comprehensive setup documentation

Direct Route Access

Environment Variables Setup (REQUIRED for Login)

⚠️ Google OAuth is REQUIRED - The application uses Google SSO authentication only. You must configure OAuth to login.

Quick Setup:

# From project root (dialogflow-test-suite/)
cp .env.example .env

# Edit .env and add your values (see below)

Minimal Configuration (Required for Login):

# Edit: /.env (project root - NOT /backend/.env or /frontend/.env.local)
# Required for Google OAuth login - YOU MUST HAVE THESE:
GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-client-secret
GOOGLE_REDIRECT_URI=http://localhost:8000/api/v1/auth/google/callback

# Optional - for Dialogflow agent testing:
GOOGLE_CLOUD_PROJECT=your-gcp-project-id
GOOGLE_API_KEY=your-google-api-key-here
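
Because login fails without the three OAuth variables, it can help to check them before starting the stack. The following is a minimal sketch, not part of the project; the function name `missing_oauth_vars` is hypothetical, and the variable names are the ones listed in the configuration above.

```python
import os
from collections.abc import Mapping

# Names taken from the minimal configuration; the helper itself is hypothetical.
REQUIRED_OAUTH_VARS = ("GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "GOOGLE_REDIRECT_URI")


def missing_oauth_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required OAuth variables that are unset or blank."""
    return [name for name in REQUIRED_OAUTH_VARS if not env.get(name, "").strip()]
```

An empty list means all three values are present. The function only checks presence, not whether the credentials are valid.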

Important File Locations:

  • ✅ /.env (project root) - Used by docker-compose - THIS IS THE ONE YOU NEED
  • ✅ /.env.example (project root) - Template with all available variables
  • ❌ /backend/.env - Only for direct Python development (not needed for Docker)
  • ✅ /frontend/.env.local - Already configured for local Docker (no changes needed)

Detailed Setup Guides:

  • 📖 docs/setup/oauth-setup.md - START HERE - How to get OAuth credentials (REQUIRED)
  • 📖 docs/setup/developer-setup.md - Complete first-time setup walkthrough
  • 📖 docs/setup/google-auth.md - Google Cloud project setup (optional - for Dialogflow testing)
  • 📖 docs/oauth-environment-variables.md - Complete environment variable reference
  • 📖 frontend/ENVIRONMENT_CONFIG.md - Frontend-specific environment configuration

Authentication Flow:

  • ✅ All users login via Google OAuth SSO
  • ✅ First-time users are automatically created
  • ✅ Other domains get viewer role
  • ❌ No default accounts exist - OAuth setup is mandatory

Stop the Application

docker-compose down

⚠️ Important Development Note

After changing dependencies, rebuild the affected containers (a restart alone is not enough):

# ONLY NEEDED for dependency changes (requirements.txt, package.json)
# Code changes now use hot reload - no rebuild required!

# Backend dependency changes
docker-compose build backend && docker-compose up -d backend

# Frontend dependency changes
docker-compose build frontend && docker-compose up -d frontend

🔥 Development Workflow with Hot Reload

Hot reload is NOW ENABLED! Code changes appear instantly without Docker rebuilds.

How It Works

  • Backend (Python): Uvicorn watches .py files → auto-reloads in 1-3 seconds
  • Frontend (React/Vite): Vite HMR watches source files → updates browser instantly (<1 sec)
  • Volume Mounts: Your local code is mounted into containers - saves are live!

Daily Workflow

# Start containers ONCE (typically on first boot of the day)
docker-compose up -d

# Edit code in VS Code and save - changes appear automatically!
# No docker commands needed for code changes

# Check logs to see hot reload in action
docker-compose logs -f backend   # Watch Python files reload
docker-compose logs -f frontend  # Watch Vite HMR updates

When to Rebuild

You ONLY need docker-compose build when changing:

  • ✅ Python dependencies (requirements.txt)
  • ✅ npm packages (package.json)
  • ✅ Dockerfiles (system packages, environment variables)
  • ✅ docker-compose.yml configuration
  • ❌ NOT for .py, .ts, .tsx, .css file changes - hot reload handles these!

Testing Hot Reload

# Backend test: Edit any .py file, save, and check logs
docker-compose logs -f backend
# You'll see: "WatchFiles detected changes... Reloading..."

# Frontend test: Edit any React component, save, and watch browser
# Browser updates instantly without refresh!

📋 Application Features

✅ Currently Implemented

  • User Authentication: JWT-based with role management and secure access
  • Business Dashboard: Comprehensive analytics dashboard with performance metrics, trends, and stakeholder insights
  • Dataset Management: Create, edit, upload and organize test datasets with direct navigation
  • Question Management: Dedicated interface for adding, editing, and bulk importing questions
  • Test Execution: Run comprehensive tests against Dialogflow agents
  • Webhook Control: Enable/disable webhooks for both Quick Test and Test Runs with per-test configuration
  • Results Analysis: View detailed test outcomes and performance metrics
  • Project Filtering: Multi-project support with Google Cloud project-based data filtering
  • Direct Routing: Navigate directly to dataset editing and question management
  • Dark Theme UI: Modern Material-UI interface with responsive design
  • Real-time Updates: Auto-refresh functionality for test runs with background polling
  • Agent URL Navigation: Corrected Google Cloud Console agent links with proper global location
  • API Documentation: Auto-generated with FastAPI
  • Infrastructure as Code: Complete Terraform management with automated deployments via GitHub Actions
  • OAuth Management: Automated OAuth secret management and environment variable handling

🔧 Recent Enhancements & Bug Fixes (September 2025)

  • ✅ TestRunDetailPage UI Space Optimization (Latest - Sept 30, 2025): Consolidated all configuration sections into single collapsible accordion with two-column responsive layout
    • Unified Configuration accordion combines Test Config, Timing, Message Sequence, and Session Parameters
    • Two-column Grid layout (50/50 split on desktop, stacks on mobile) for optimal horizontal space usage
    • Left column: Test Configuration and Timing information
    • Right column: Message Sequence (Pre/Post-Prompt chips) and Session Parameters table
    • Accordion collapsed by default for minimal screen real estate usage (~70% reduction in vertical scrolling)
    • Maintains horizontal chip display for pre/post prompt messages from previous optimization
    • Responsive design automatically adapts to screen size
  • ✅ Preference System Bug Fixes (Sept 26, 2025): Fixed critical user preference restoration issues affecting dropdown loading and state persistence
  • ✅ Page Dropdown Loading Fix: Resolved timing dependency issues where page dropdowns failed to load based on logged-in user preferences on both QuickTest and CreateTestRun pages
  • ✅ Session ID Persistence: Fixed Session ID field not saving/loading properly on QuickTest page - now correctly saves all values including empty strings
  • ✅ Duplicate API Call Prevention: Eliminated race conditions causing duplicate page loading API calls and 404 errors by removing conflicting manual loadPages() calls
  • ✅ LLM Model Preference Restoration: Fixed LLM Model preferences not restoring properly when Playbook is selected on CreateTestRun page by implementing immediate save pattern
  • ✅ Preference Restoration Consistency: Standardized preference saving across QuickTest and CreateTestRun pages to use immediate onChange saves instead of complex useEffect logic
  • ✅ Debug Logging Cleanup: Removed all frontend debug console.log statements while preserving essential error handling for production readiness
  • ✅ Duplicate Preference API Calls: Fixed duplicate PUT calls to preferences API by removing conflicting useEffect hooks that duplicated immediate onChange saves
  • ✅ FastAPI Route Ordering Bug Fixes: Fixed critical routing issues where /export and /import endpoints were being interpreted as parameter IDs causing 422 validation errors
  • ✅ CSV Export Standardization: Created shared csv_utils.py module for consistent RFC 4180 compliant CSV escaping across all export functionality
  • ✅ Test Run CSV Export API: Added dedicated backend endpoint for comprehensive test run CSV export with multi-parameter evaluation breakdown
  • ✅ Authentication Token Standardization: Fixed frontend authentication to use access_token consistently across all export operations and API calls
  • ✅ Route Collision Prevention: Moved /export and /import routes before parameterized routes (/{parameter_id}) in all parameter management endpoints
  • ✅ Business Dashboard Implementation: Comprehensive analytics dashboard with overview metrics, performance trends, and agent breakdown
  • ✅ Dashboard Analytics API: Complete backend API with 5 key endpoints for business insights and performance monitoring
  • ✅ Project-Filtered Analytics: All dashboard components respect Google Cloud project selection for multi-project environments
  • ✅ Performance Metrics: Total tests, average scores, success rates, and trend analysis with time-based filtering
  • ✅ Agent Performance Breakdown: Individual agent scoring and test volume analytics with visual comparisons
  • ✅ Recent Activity Feed: Real-time test execution tracking with user attribution and timestamp display
  • ✅ Parameter Performance Analysis: Detailed breakdown of evaluation parameter effectiveness across test runs
  • ✅ Data Scope Indicators: Clear user context display showing personal vs system-wide data access
  • ✅ Modern Dashboard UI: Material-UI cards, charts, and responsive layout with dark theme consistency
  • ✅ User Permission Integration: Dashboard respects user roles (admin, test_manager, viewer) for appropriate data visibility
  • ✅ Webhook Control System: Implemented webhook enable/disable functionality for both Quick Test and Test Runs with default enabled state
  • ✅ Dialogflow API Integration: Added QueryParameters.disable_webhook support to DialogflowService with comprehensive backend implementation
  • ✅ UI Controls: Added Material-UI Switch components for webhook toggle in both QuickTestPage and CreateTestRunPage
  • ✅ Database Schema: Enhanced TestRun model with enable_webhook column and proper migration support
  • ✅ Pure Dynamic Evaluation System: Completely eliminated legacy evaluation fields - all scoring is computed from configurable parameters
  • ✅ Enhanced CSV Exports: Added comprehensive parameter breakdown exports with unlimited parameters including individual scores, weights, and reasoning
  • ✅ Computed Score Display: UI dynamically computes overall scores from parameter weights - backward compatible with legacy data but future-focused
  • ✅ Backend Schema Updates: Enhanced API responses with overall_score field and proper parameter data structures
  • ✅ Docker Deployment Improvements: Streamlined deployment process with full system prune and health checks
  • ✅ Auto-Refresh Fixed: Test runs page now properly auto-refreshes running/pending tests every 5 seconds
  • ✅ Agent URL Correction: Fixed agent links to use /locations/global/ instead of /locations/us-central1/
  • ✅ Background Polling: Implemented efficient Redux action for status updates without full page refresh
  • ✅ API Compatibility: Fixed backend API calls to handle single status filtering properly
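
The route ordering fix in the list above comes down to first-match-wins routing: a parameterized pattern such as /{parameter_id} also matches the literal path /export when it is registered first, and the integer validation of "export" then fails with a 422. The following is a simplified model in plain Python, not FastAPI's actual router; the `resolve` helper and the handler names are hypothetical.

```python
import re

Route = tuple[str, str]  # (path template, handler name)


def _compile(template: str) -> re.Pattern[str]:
    """Turn '/{parameter_id}' into a regex where each placeholder matches one path segment."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


def resolve(routes: list[Route], path: str) -> tuple[int, str]:
    """Return (status, handler) for the first route whose template matches the path."""
    for template, handler in routes:
        match = _compile(template).match(path)
        if match is None:
            continue
        # Model an integer path parameter: a non-numeric value fails validation.
        if any(not value.isdigit() for value in match.groupdict().values()):
            return 422, handler
        return 200, handler
    return 404, ""


# Parameterized route registered first: '/export' is captured as parameter_id.
buggy = [("/{parameter_id}", "get_parameter"), ("/export", "export_parameters")]
# Literal routes registered first: '/export' reaches its own handler.
fixed = [("/export", "export_parameters"), ("/{parameter_id}", "get_parameter")]
```

With the `buggy` ordering, `resolve(buggy, "/export")` is handled by the parameterized route and fails validation; with the `fixed` ordering the literal route wins, while numeric paths such as `/42` still reach the parameterized handler.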

🏗️ Architecture

GCP Infrastructure

graph TB
    subgraph "Internet"
        USER["👤 User Browser"]
    end

    subgraph "Google Cloud Platform"
        subgraph "Firebase Hosting"
            FH["Firebase Hosting<br/>your-app.web.app<br/>(reverse proxy to Cloud Run)"]
        end

        subgraph "Cloud Run - Public (ingress=all)"
            FE["Frontend Service<br/>nginx + React SPA<br/>Port 8080"]
        end

        subgraph "VPC Network"
            VPC_CONN["VPC Connector"]

            subgraph "Cloud Run - Internal (ingress=internal)"
                BE["Backend Service<br/>FastAPI + Python 3.11<br/>Port 8080"]
            end

            subgraph "Private Services"
                DB[("Cloud SQL PostgreSQL 15")]
            end
        end

        AR["Artifact Registry<br/>Docker Images"]
    end

    subgraph "External APIs"
        DFCX["Dialogflow CX API"]
        GEMINI["Google Gemini<br/>LLM Judge"]
    end

    USER -->|"HTTPS"| FH
    FH -->|"Cloud Run rewrite"| FE
    USER -.->|"Direct access also works"| FE
    FE -->|"nginx /api/* proxy<br/>via VPC Connector<br/>(egress=all-traffic)"| BE
    BE -->|"VPC Connector"| DB
    BE -->|"HTTPS"| DFCX
    BE -->|"HTTPS"| GEMINI
    AR -.->|"Image pull"| FE
    AR -.->|"Image pull"| BE

    style FH fill:#ff9800,color:#000
    style FE fill:#2196f3,color:#fff
    style BE fill:#9c27b0,color:#fff
    style DB fill:#4caf50,color:#fff
    style VPC_CONN fill:#607d8b,color:#fff
    style AR fill:#795548,color:#fff

Key Security Design:

  • The backend is not publicly accessible (ingress=internal); all API traffic flows through the frontend's nginx reverse proxy via the VPC connector
  • Both frontend and backend Cloud Run services use the VPC connector (egress=all-traffic) so that frontend→backend traffic is treated as "internal" by Cloud Run
  • Firebase Hosting provides a clean URL (*.web.app) and proxies all requests to the Cloud Run frontend
  • The DNS resolver inside the VPC-connected frontend container uses 169.254.169.254 (GCE metadata server) since public DNS (8.8.8.8) is unreachable through the VPC connector

Technology Stack

  • Frontend: React 18 + TypeScript + Material-UI + Redux Toolkit
  • Backend: FastAPI + Python 3.11 + SQLAlchemy + Celery
  • Database: PostgreSQL 15
  • Reverse Proxy: nginx (Cloud Run) + Firebase Hosting (proxy)
  • Session Management: In-memory sessions (production), Redis (local development)
  • Deployment: Docker + Docker Compose + GCP Cloud Run + Firebase Hosting

Services

Local Development:
  Frontend (React)     → Port 3000  (nginx proxies /api/* to backend)
  Backend (FastAPI)    → Port 8000
  Database (PostgreSQL)→ Port 5432
  Cache (Redis)        → Port 6379

Production (GCP):
  Firebase Hosting     → your-app.web.app (proxy to Cloud Run)
  Frontend (Cloud Run) → nginx + React SPA, port 8080 (public)
  Backend (Cloud Run)  → FastAPI, port 8080 (internal only, via VPC)
  Database (Cloud SQL)  → PostgreSQL 15 (VPC-connected)

🎯 Dynamic Evaluation System

Parameter-Based Scoring

The application features an evaluation architecture that eliminates hardcoded scoring fields in favor of a fully dynamic, parameter-driven system.

System Parameters

  1. Similarity Score (Default weight: 60%) - Semantic similarity between expected and actual responses
  2. Empathy Level (Default weight: 30%) - Empathetic tone evaluation for customer service contexts
  3. No-Match Detection (Default weight: 10%) - Validates appropriate "can't help" responses

Custom Parameters

  • ✅ Unlimited Parameters: Add custom evaluation criteria (accuracy, completeness, relevance, etc.)
  • ✅ Configurable Weights: Set parameter importance from 0-100%
  • ✅ Custom Prompts: Define LLM evaluation instructions for specialized parameters
  • ✅ User-Created Parameters: Each user can create organization-specific evaluation criteria

Technical Implementation

-- Legacy (deprecated, nullable)
similarity_score: INTEGER NULL  
empathy_score: INTEGER NULL
overall_score: INTEGER NULL

-- New dynamic system (primary)
TestResultParameterScore {
  parameter_id: INTEGER (FK to EvaluationParameter)
  score: INTEGER (0-100)
  weight_used: INTEGER (0-100) 
  reasoning: TEXT
}

UI Computation

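// Real-time score calculation: weighted average of the parameter scores
const totalWeight = parameterScores.reduce((total, ps) => total + ps.weight_used, 0)
const overallScore = totalWeight > 0
  ? parameterScores.reduce((total, ps) => total + ps.score * ps.weight_used, 0) / totalWeight
  : null // no parameters, or all weights are zero: nothing to average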
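
As a worked example of this weighted average, take the default weights listed under System Parameters (60, 30, 10) and made-up scores of 80, 50, and 100: (80·60 + 50·30 + 100·10) / (60 + 30 + 10) = 7300 / 100 = 73. A Python sketch of the same computation follows; the field names `score` and `weight_used` mirror the schema above, while the `ParameterScore` class and the `overall_score` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParameterScore:
    score: int        # 0-100, as returned by the LLM judge
    weight_used: int  # 0-100, weight applied when the test ran


def overall_score(scores: list[ParameterScore]) -> Optional[float]:
    """Weighted average of parameter scores; None if there is no weight to divide by."""
    total_weight = sum(ps.weight_used for ps in scores)
    if total_weight == 0:
        return None
    return sum(ps.score * ps.weight_used for ps in scores) / total_weight
```

Dividing by the sum of the weights actually used, rather than by 100, keeps the result on a 0-100 scale even when the configured weights do not add up to 100.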

📁 Project Structure

Dialogflow Agent Tester/
├── .agents/                    # AI agent context and handoff docs
├── .github/workflows/          # CI/CD pipeline configuration
├── backend/                    # FastAPI Python backend
│   ├── app/
│   │   ├── api/               # API route handlers
│   │   ├── core/              # Configuration and database
│   │   ├── models/            # SQLAlchemy models and schemas
│   │   ├── services/          # Business logic services
│   │   └── main.py           # FastAPI application entry
│   ├── sql/                   # Database scripts and migrations
│   ├── Dockerfile
│   └── requirements.txt
├── design/                     # Architecture and design documentation
├── docs/                       # User and setup documentation
│   ├── setup/                 # Setup guides (developer, GitHub, auth)
│   ├── guides/                # User guides and tutorials
│   └── README.md             # Documentation index
├── frontend/                   # React TypeScript frontend
│   ├── src/
│   │   ├── components/        # React components
│   │   ├── pages/            # Page components
│   │   ├── store/            # Redux store and slices
│   │   └── App.tsx           # Main React app
│   ├── Dockerfile
│   └── package.json
├── terraform/                  # Infrastructure as Code (GCP)
├── test-data/                  # CSV files for testing
├── docker-compose.yml          # Local development containers
├── PRODUCTION_DEPLOYMENT.md    # Live production infrastructure details
└── README.md                  # This file - project overview

🔧 Development

Check Service Status

docker-compose ps

View Logs

# All services
docker-compose logs

# Specific service
docker-compose logs backend
docker-compose logs frontend

Rebuild After Changes

# Specific service
docker-compose build backend
docker-compose up -d backend

# All services
docker-compose build
docker-compose up -d

Database Access

docker exec -it agent-evaluator-db psql -U postgres -d agent_evaluator

🧪 Testing & Quality Assurance

Automated Testing Pipeline

  • ✅ Backend Unit Tests: 11 comprehensive tests covering CSV utilities and core functionality
  • ✅ Frontend Unit Tests: Vitest-based testing for React components and utilities
  • ✅ CI/CD Integration: Automated testing on every push to main and pull requests
  • ✅ Quality Gates: Tests must pass before deployment to production

Running Tests Locally

Backend Tests:

cd backend
python -m pytest tests/ --no-header -v

Frontend Tests:

cd frontend
npm test

All Tests:

# Backend
cd backend && python -m pytest tests/ --no-header -v

# Frontend  
cd frontend && npm test

Test Coverage

  • Backend: CSV utilities, mock infrastructure, data validation
  • Frontend: Basic functionality, component rendering, utility functions
  • Integration: API endpoints validated through CI/CD pipeline
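
For orientation on what CSV utility tests of this kind typically check: RFC 4180 requires fields that contain commas, double quotes, or line breaks to be enclosed in double quotes, with embedded quotes doubled. Python's standard csv module applies these rules by default. The sketch below is not the project's csv_utils.py (which this README does not show); the `rows_to_csv` helper is hypothetical.

```python
import csv
import io


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows to CSV text with CRLF line endings and minimal quoting."""
    buffer = io.StringIO(newline="")  # newline="" disables newline translation
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

A useful property to assert in a unit test is the round trip: parsing the output with `csv.reader` should give back the original rows, including fields with quotes, commas, and line breaks.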

CI/CD Pipeline Behavior

  • Documentation-only changes: Pipeline skips unnecessary builds (*.md, docs/, design/)
  • Code changes: Full test suite runs before deployment
  • Pull Requests: Tests run without deployment
  • Main branch pushes: Tests run followed by automated deployment
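
The documentation-only rule above can be expressed as a simple path check. The sketch below is a hypothetical Python illustration of that decision, not the workflow's actual configuration (which is not shown in this README). The function name `is_docs_only` is an assumption, and it treats Markdown files at any depth as documentation, which may be broader than the pipeline's own *.md pattern.

```python
from pathlib import PurePosixPath

# Directory names taken from the list above; the helper itself is hypothetical.
DOC_DIRS = ("docs", "design")


def is_docs_only(changed_paths: list[str]) -> bool:
    """True when every changed file is Markdown or lives under docs/ or design/."""

    def is_doc(path: str) -> bool:
        p = PurePosixPath(path)
        in_doc_dir = len(p.parts) > 1 and p.parts[0] in DOC_DIRS
        return p.suffix == ".md" or in_doc_dir

    # An empty change set is not treated as docs-only, so nothing is skipped by accident.
    return bool(changed_paths) and all(is_doc(p) for p in changed_paths)
```

A single non-documentation path, such as backend/app/main.py, makes the whole change set count as a code change, so the full test suite would run.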

🌐 API Endpoints

Authentication

  • POST /auth/register - User registration
  • POST /auth/login - User login
  • GET /auth/me - Get current user

Datasets

  • GET /datasets/ - List datasets
  • POST /datasets/ - Create dataset
  • POST /datasets/{id}/upload - Upload dataset file
  • GET /datasets/{id} - Get dataset details

Test Runs

  • GET /test-runs/ - List test runs
  • POST /test-runs/ - Create test run
  • POST /test-runs/{id}/execute - Execute test run
  • GET /test-runs/{id} - Get test run details

Results

  • GET /results/ - List test results
  • GET /results/test-run/{id} - Get results for test run
  • GET /results/{id} - Get specific result

Health

  • GET /health - Service health check

🔐 Configuration

Environment Variables

# Database
POSTGRES_SERVER=postgres
POSTGRES_USER=postgres
POSTGRES_PASSWORD=password
POSTGRES_DB=agent_evaluator

# Authentication
SECRET_KEY=your-super-secret-key-change-this-in-production

# Google Cloud (for production)
GOOGLE_CLOUD_PROJECT=your-gcp-project-id

# Redis (local development only - production uses in-memory sessions)
REDIS_URL=redis://redis:6379
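
To show how the database variables above fit together, here is a sketch that assembles them into a PostgreSQL connection URL. It is an illustration only: the function name `build_database_url` and the optional `POSTGRES_PORT` variable are hypothetical, and the default port 5432 is taken from the Services listing. How the backend actually builds its connection string is not shown in this README.

```python
from collections.abc import Mapping
from urllib.parse import quote_plus


def build_database_url(env: Mapping[str, str]) -> str:
    """Assemble a PostgreSQL URL from the POSTGRES_* variables."""
    # Percent-encode credentials so characters like '@' or '/' cannot break the URL
    user = quote_plus(env["POSTGRES_USER"])
    password = quote_plus(env["POSTGRES_PASSWORD"])
    host = env["POSTGRES_SERVER"]
    port = env.get("POSTGRES_PORT", "5432")  # hypothetical variable; default from Services
    database = env["POSTGRES_DB"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

Passing `os.environ` gives the URL for the current environment. The encoding step matters mainly for generated passwords, which often contain reserved characters.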

🚀 Deployment

Local Development

Using Docker Compose for local development and testing.

Sandbox Deployment - Google Cloud Platform ✅ LIVE & OPERATIONAL

✅ Active CI/CD Pipeline: Complete GitHub Actions workflow with Workload Identity Federation
✅ Infrastructure Deployed: Terraform-managed infrastructure on GCP
✅ Secure Authentication: No service account keys - uses WIF for GitHub Actions
✅ Database Operational: PostgreSQL with auto-generated secure passwords
✅ Redis Removed: Cost optimization - removed Redis cache (~$26/month savings)
✅ OAuth Integration: Google OAuth working with proper redirect URLs
✅ API Endpoints: All frontend API calls use centralized service pattern

Current GCP Sandbox Architecture

  • 🌐 Firebase Hosting: https://your-frontend-url.web.app (proxy to Cloud Run frontend)
  • 🖥️ Cloud Run Frontend: nginx + React SPA (ingress=all)
  • 🚀 Cloud Run Backend: FastAPI + Python (ingress=internal, not publicly accessible)
  • 🗄️ Cloud SQL PostgreSQL: dialogflow-tester-postgres-dev with backup configuration
  • 🔐 VPC Networking: Private network with VPC connector on both frontend and backend Cloud Run services
  • 🔑 Workload Identity Federation: github-actions-dialogflow@your-gcp-project-id service account
  • 🌍 Multi-Environment: Dev environment operational

Recent Infrastructure Changes (February 2026)

  • ✅ Backend Security: Backend Cloud Run set to ingress=internal, so it is no longer publicly exposed on ports 80/443
  • ✅ Frontend on Cloud Run: Moved frontend from Firebase static hosting to Cloud Run with nginx reverse proxy
  • ✅ Firebase Hosting Proxy: Firebase Hosting now proxies to Cloud Run frontend (clean *.web.app URL preserved)
  • ✅ VPC Connector on Frontend: Frontend uses VPC connector (egress=all-traffic) so proxy traffic to backend is "internal"
  • ✅ Internal DNS Resolution: nginx uses 169.254.169.254 (GCE metadata DNS) since public DNS is unreachable through VPC

Previous Infrastructure Changes (September 2025)

  • ✅ Redis Removal: Eliminated Redis dependency for cost savings (~$26/month)
  • ✅ Session Management: Backend now uses in-memory sessions (suitable for single-instance)
  • ✅ OAuth Fixes: Resolved authentication redirects and token management
  • ✅ API Consistency: Fixed "Failed to construct 'URL'" errors across frontend
  • ✅ Terraform Updates: Infrastructure as code properly maintained and deployed

Deployment Status

✅ Fully Operational:

  • Project: your-gcp-project-id
  • Backend Service: Healthy and responding
  • Frontend Application: Deployed and accessible
  • Database: Operational with secure connections
  • OAuth: Working with Google authentication

GCP Architecture & Deployment Flow

graph TB
    subgraph "CI/CD Pipeline"
        DEV["Developer Push<br/>to main branch"] --> GHA["GitHub Actions"]
        GHA --> WIF["Workload Identity Federation"]
        WIF --> SA["Service Account"]
    end

    subgraph "Build"
        SA --> BB["Backend Docker Build"]
        SA --> FB["Frontend Docker Build"]
        BB --> AR["Artifact Registry"]
        FB --> AR
    end

    subgraph "Deploy"
        SA --> TF["Terraform Apply"]
        TF --> BE_DEPLOY["Cloud Run Backend<br/>(ingress=internal)"]
        TF --> FE_DEPLOY["Cloud Run Frontend<br/>(ingress=all)"]
        TF --> DB_DEPLOY["Cloud SQL PostgreSQL"]
        TF --> VPC_DEPLOY["VPC + Connector"]
        SA --> FBH["Firebase Hosting Deploy<br/>(proxy config only)"]
    end

    subgraph "Live Services"
        FBH_LIVE["🌐 your-app.web.app"]
        FE_LIVE["🖥️ Cloud Run Frontend (nginx + React)"]
        BE_LIVE["🚀 Cloud Run Backend (FastAPI)"]
        DB_LIVE["🗄️ Cloud SQL PostgreSQL"]
    end

    FBH --> FBH_LIVE
    FBH_LIVE -->|proxy| FE_LIVE
    FE_LIVE -->|nginx /api/* via VPC| BE_LIVE
    BE_LIVE -->|VPC| DB_LIVE

    style FBH_LIVE fill:#ff9800,color:#000
    style FE_LIVE fill:#2196f3,color:#fff
    style BE_LIVE fill:#9c27b0,color:#fff
    style DB_LIVE fill:#4caf50,color:#fff

Deployment Options

Option 1: Automated GitHub Actions (Recommended) ✅ READY

# Repository secrets configured:
# - WIF_PROVIDER
# - WIF_SERVICE_ACCOUNT  
# - GCP_PROJECT_ID_DEV
git add . && git commit -m "Deploy infrastructure" && git push
# Monitor deployment: https://github.com/your-org/dialogflow-test-suite/actions

Option 2: Manual Terraform Deployment ✅ AVAILABLE

cd terraform
terraform plan -var-file="terraform.tfvars.dev"
terraform apply -var-file="terraform.tfvars.dev"

Setup Guides

  • WORKLOAD_IDENTITY_SETUP_COMPLETE.md - GitHub Actions authentication setup
  • GCP_ADMIN_SETUP_GUIDE.md - GCP administrator configuration
  • .agents/deployment-guide.md - Comprehensive deployment instructions
  • GOOGLE_OAUTH_SETUP.md - OAuth application configuration

Cost Optimization

The development environment is configured for fewer than 5 users with minimal resource allocation:

  • Cloud SQL: db-f1-micro (shared CPU, 0.6GB RAM)
  • Cloud Run: Pay-per-request with automatic scaling
  • Session Management: In-memory sessions (no external cache required)
  • Total Cost Savings: ~$26/month (Redis removal)

🐛 Troubleshooting

Common Issues

Containers Won't Start

# Check Docker Desktop is running
docker-compose down
docker-compose up -d

# Check logs
docker-compose logs

API Connection Issues

If you encounter errors loading data or "Can't connect to API" messages:

  1. Check Frontend Container Logs:

    docker-compose logs frontend
  2. Check Backend Container Logs:

    docker-compose logs backend
  3. Verify API Endpoints:

  4. Test Internal Container Communication:

    # Test if backend is accessible from frontend container
    docker-compose exec frontend curl http://backend:8000/api/v1/datasets/
  5. Common API Configuration Issues:

    • ❌ Wrong: Hardcoded http://localhost:8000 in frontend API calls
    • ✅ Correct: Relative URLs (empty baseURL in axios)
    • ❌ Wrong: Missing trailing slash /api/v1/datasets → causes 307 redirects
    • ✅ Correct: Proper trailing slash /api/v1/datasets/ → direct 200 response

Database Connection Issues

# Reset database
docker-compose down -v
docker-compose up -d

Port Conflicts

Ensure ports 3000, 8000, 5432, and 6379 are available.
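
A quick way to check those four ports before starting the containers is a small script like the sketch below. The port-to-service labels are assumptions inferred from the rest of this README (frontend on 3000, backend on 8000, PostgreSQL on 5432, Redis on 6379), and `port_is_free` is an illustrative helper, not part of the project.

```python
import socket

# Assumed mapping, inferred from the ports mentioned elsewhere in this README.
PORTS = {3000: "frontend", 8000: "backend", 5432: "PostgreSQL", 6379: "Redis"}


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    for port, service in PORTS.items():
        state = "free" if port_is_free(port) else "IN USE"
        print(f"{port:>5}  {service:<10}  {state}")
```

A port reported as "IN USE" usually means another local service (or a previous `docker-compose` run) is still bound to it.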

📊 Testing

Test Data

The application includes a sample data structure for testing. The CSV bulk upload feature runs on a dedicated page and supports column mapping:

CSV Bulk Upload Process:

  1. Navigate to any dataset's "Manage Questions" page
  2. Click "Bulk Add Questions" to open the dedicated upload page
  3. Select "Upload CSV File" mode
  4. Choose your CSV file and preview the data
  5. Map your CSV columns to question fields using the interactive interface
  6. Import questions with proper authentication and error handling

CSV File Format: Upload CSV files with the following format:

question,expected_intent,expected_entities
"What is my balance?",account.balance,"{""account_type"": ""checking""}"
"Transfer $100",money.transfer,"{""amount"": ""100""}"
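
Note that `expected_entities` holds a JSON object whose inner quotes are doubled, as CSV quoting requires. The sketch below shows one way to read such a file with the Python standard library; `load_questions` is an illustrative helper, not the application's importer.

```python
import csv
import io
import json

SAMPLE = '''question,expected_intent,expected_entities
"What is my balance?",account.balance,"{""account_type"": ""checking""}"
"Transfer $100",money.transfer,"{""amount"": ""100""}"
'''


def load_questions(text: str) -> list:
    """Parse CSV text and decode the JSON stored in expected_entities."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        raw = (row.get("expected_entities") or "").strip()
        # The csv module has already collapsed "" back to ", so this is plain JSON.
        row["expected_entities"] = json.loads(raw) if raw else {}
        rows.append(row)
    return rows


questions = load_questions(SAMPLE)
print(questions[0]["expected_entities"])  # {'account_type': 'checking'}
```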

HTML Content Processing: The application automatically detects HTML content in CSV files and provides intelligent processing options:

  • Smart Detection: Analyzes rows to identify HTML tags in your data
  • User Choice: Provides options to automatically strip HTML tags while preserving text content
  • Safe Processing: Uses BeautifulSoup4 for reliable HTML parsing and tag removal
  • Focused Interface: Shows HTML removal options only for selected Question and Answer columns
  • Large Dataset Support: Optimized for handling CSV files with thousands of rows efficiently
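
The detect-then-strip idea can be sketched as follows. This example deliberately uses the standard library's `html.parser` instead of BeautifulSoup4 so that it runs without extra dependencies; the regex heuristic and the `contains_html` / `strip_html` names are assumptions for illustration and do not reflect the application's actual code.

```python
import re
from html.parser import HTMLParser

# Rough heuristic: an opening or closing tag such as <p>, </div> or <br/>.
TAG_RE = re.compile(r"</?[a-zA-Z][^<>]*>")


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards the tags around them."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def contains_html(value: str) -> bool:
    """Return True if the cell looks like it contains at least one HTML tag."""
    return bool(TAG_RE.search(value))


def strip_html(value: str) -> str:
    """Remove tags, keep text content, and collapse leftover whitespace."""
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()
    return " ".join("".join(parser.parts).split())


cell = "<p>Reset your <b>password</b> here.</p>"
if contains_html(cell):
    print(strip_html(cell))  # Reset your password here.
```

Running detection first keeps plain-text cells (including ones with a bare `<` such as `a < b`) untouched.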

Documentation

Setup & Configuration

User Guides

Technical Documentation

🀝 Contributing

  1. Follow the existing code structure and patterns
  2. Maintain TypeScript types and Python type hints
  3. Use the Material-UI dark theme for UI consistency
  4. Test changes locally with Docker before deployment
  5. Update documentation when adding new features

Need Help? Check the comprehensive documentation in the .agents/ folder for detailed setup and troubleshooting guides.

4. Start Application

docker-compose up -d

5. Access Application

Development Setup

Backend Development

cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up environment
cp .env.example .env
# Edit .env with your settings

# Initialize database
python app/init_db.py

# Start development server
uvicorn app.main:app --reload --port 8000

Frontend Development

cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

Architecture Overview

Backend Components

app/
├── api/           # FastAPI route handlers
├── core/          # Configuration, database, security, migrations
│   ├── migrations.py           # MigrationManager - unified orchestrator
│   └── migration_files/        # Complex migration operations
│       ├── add_quick_add_parameters_table.py
│       └── make_evaluation_model_required.py
├── models/        # SQLAlchemy models and Pydantic schemas
├── services/      # Business logic (Dialogflow, LLM, testing)
└── main.py        # FastAPI application entry point

Database Migration System

The application uses a unified migration system orchestrated by MigrationManager in backend/app/core/migration_manager.py:

Architecture:

  • Single Entry Point: All migrations run automatically on application startup via MigrationManager.run_migrations()
  • Three Migration Types:
    1. Column Additions: Inline tuples in MigrationManager for simple ALTER TABLE ADD COLUMN operations
    2. Function Handlers: Complex migrations in migration_files/ (CREATE TABLE, constraints, indexes)
    3. Data Migrations: Inline SQL lists for UPDATE queries with automatic row count logging

Key Features:

  • ✅ Idempotent: Safe to run multiple times, automatically skips already-applied changes
  • ✅ Error Handling: Gracefully handles "already exists", permission errors, missing tables
  • ✅ Timeout Support: Optional timeout for long-running migrations using threading
  • ✅ Fresh Deployment: Individual migration files preserved for new environment initialization
  • ✅ Automatic Execution: Runs on every application startup, no manual intervention needed

Example Migration Patterns:

# Column Addition (inline in MigrationManager.migrations list)
{
    'name': 'add_new_columns',
    'description': 'Add new feature columns',
    'columns': [
        ('table_name', 'column_name', 'TEXT')
    ]
}

# Complex Operation (function handler from migration_files/)
{
    'name': 'create_new_table',
    'description': 'Create table with indexes',
    'type': 'function',
    'handler': create_new_table_handler,
    'timeout': 60
}

# Data Backfill (inline SQL in MigrationManager.migrations list)
{
    'name': 'backfill_field',
    'description': 'Set default values',
    'type': 'data',
    'sql': [
        "UPDATE table_name SET field = 0 WHERE field IS NULL"
    ]
}

Location: backend/app/core/migration_manager.py (orchestrator) + backend/app/core/migration_files/ (complex operations)
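
To make the three migration types and the idempotency guarantee concrete, here is a minimal sketch of how an orchestrator could process dictionaries shaped like the patterns above. It is not the project's MigrationManager: it uses an in-memory SQLite database for portability, omits timeout handling, and every function and table name in it (`run_migrations`, `column_exists`, `create_audit_table`, `questions`, `audit`) is hypothetical.

```python
import sqlite3


def column_exists(conn, table, column):
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))


def run_migrations(conn, migrations):
    """Apply each migration dict; return the names that actually changed something."""
    applied = []
    for m in migrations:
        kind = m.get("type", "columns")
        changed = False
        if kind == "columns":
            # Identifiers come from trusted migration definitions, not user input.
            for table, column, sql_type in m["columns"]:
                if not column_exists(conn, table, column):
                    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")
                    changed = True
        elif kind == "function":
            changed = bool(m["handler"](conn))
        elif kind == "data":
            for statement in m["sql"]:
                changed = conn.execute(statement).rowcount > 0 or changed
        conn.commit()
        if changed:
            applied.append(m["name"])
    return applied


def create_audit_table(conn):
    """Hypothetical function handler: returns True only if it created the table."""
    existed = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name='audit'"
    ).fetchone()
    conn.execute("CREATE TABLE IF NOT EXISTS audit (id INTEGER PRIMARY KEY)")
    return existed is None


MIGRATIONS = [
    {"name": "add_new_columns", "columns": [("questions", "priority", "TEXT")]},
    {"name": "create_audit", "type": "function", "handler": create_audit_table},
    {"name": "backfill_priority", "type": "data",
     "sql": ["UPDATE questions SET priority = 'low' WHERE priority IS NULL"]},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("INSERT INTO questions (text) VALUES ('hello')")

first = run_migrations(conn, MIGRATIONS)   # all three migrations apply
second = run_migrations(conn, MIGRATIONS)  # nothing left to do
print(first, second)
```

The second call returning an empty list is the property that makes it safe to run migrations on every startup.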

Frontend Components

src/
├── components/    # Reusable UI components
├── pages/         # Page components
├── store/         # Redux store and slices
├── services/      # API client and utilities
├── types/         # TypeScript type definitions
└── hooks/         # Custom React hooks

API Endpoints

Authentication

  • POST /api/v1/auth/login - User login
  • GET /api/v1/auth/me - Get current user
  • POST /api/v1/auth/register - Register new user

Datasets

  • GET /api/v1/datasets/ - List datasets
  • POST /api/v1/datasets/ - Create dataset
  • GET /api/v1/datasets/{id} - Get dataset details
  • POST /api/v1/datasets/{id}/import - Import questions from file

Test Runs

  • GET /api/v1/tests - List test runs
  • POST /api/v1/tests - Create and start test run
  • GET /api/v1/tests/{id} - Get test run details
  • GET /api/v1/tests/{id}/results - Get test results

Dialogflow

  • GET /api/v1/dialogflow/agents - List available agents
  • GET /api/v1/dialogflow/agents/{agent}/flows - List flows
  • GET /api/v1/dialogflow/flows/{flow}/pages - List pages
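
For scripted access to these endpoints, a request can be assembled with the standard library as sketched below. Two things here are assumptions rather than documented behavior: the base URL (`http://localhost:8000`, the local backend port used elsewhere in this README) and the `Authorization: Bearer <token>` header scheme for the JWT. `api_request` is an illustrative helper that only builds the request; it does not send it.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: local backend address


def api_request(path: str, token: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for an /api/v1 endpoint."""
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {token}",  # assumption: bearer-token scheme
    }
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # urllib issues POST when data is set and GET otherwise.
    return urllib.request.Request(f"{BASE_URL}{path}", data=data, headers=headers)


req = api_request("/api/v1/tests", token="YOUR_JWT")
print(req.get_method(), req.full_url)
# To send it against a running backend: urllib.request.urlopen(req)
```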

Configuration

Environment Variables

Backend (.env)

# Security
SECRET_KEY=your-super-secret-key

# Database
POSTGRES_SERVER=localhost
POSTGRES_USER=postgres
POSTGRES_PASSWORD=password
POSTGRES_DB=agent_evaluator

# Redis (local development only)
REDIS_URL=redis://localhost:6379

# Google Cloud
GOOGLE_CLOUD_PROJECT=your-project-id

# File Upload
UPLOAD_DIR=uploads
MAX_FILE_SIZE=52428800

# CORS
BACKEND_CORS_ORIGINS=http://localhost:3000,http://localhost:5173
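
As a rough illustration of how these values might be consumed, the sketch below reads them from a mapping such as `os.environ`. The `Settings` class, the `load_settings` function, and the DSN layout are assumptions for this example; the backend's real configuration module may differ. It also shows that `MAX_FILE_SIZE=52428800` is 50 MiB expressed in bytes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    postgres_dsn: str
    cors_origins: list
    max_file_size: int


def load_settings(env=os.environ) -> Settings:
    """Read the variables listed above; defaults mirror the sample .env."""
    user = env.get("POSTGRES_USER", "postgres")
    password = env.get("POSTGRES_PASSWORD", "password")
    server = env.get("POSTGRES_SERVER", "localhost")
    database = env.get("POSTGRES_DB", "agent_evaluator")
    origins = env.get("BACKEND_CORS_ORIGINS", "")
    return Settings(
        # Assumed DSN layout; adjust to whatever the database driver expects.
        postgres_dsn=f"postgresql://{user}:{password}@{server}/{database}",
        # Comma-separated list, tolerant of stray spaces and empty entries.
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
        max_file_size=int(env.get("MAX_FILE_SIZE", "52428800")),
    )


settings = load_settings(
    {"BACKEND_CORS_ORIGINS": "http://localhost:3000,http://localhost:5173"}
)
print(settings.cors_origins)
print(settings.max_file_size == 50 * 1024 * 1024)  # True
```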

Google Cloud Setup

  1. Enable APIs:

    • Dialogflow CX API
    • AI Platform API
    • Cloud Storage API (optional)
  2. Create Service Account:

    gcloud iam service-accounts create dialogflow-tester \
      --display-name="Dialogflow Agent Tester"
  3. Grant Permissions:

    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member="serviceAccount:dialogflow-tester@PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/dialogflow.reader"
    
    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member="serviceAccount:dialogflow-tester@PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/aiplatform.user"
  4. Download Key:

    gcloud iam service-accounts keys create service-account.json \
      --iam-account=dialogflow-tester@PROJECT_ID.iam.gserviceaccount.com

Dataset Import Format

CSV Format

The new CSV bulk upload feature provides an intuitive column mapping interface. Your CSV can have any column names - the application will let you map them to the required fields during import:

question,answer,detect_empathy,no_match,priority,tags
"How do I reset my password?","You can reset your password by...",false,false,high,"password,security"
"What is the weather like?","I can't help with weather information",false,true,low,"weather,no-match"

Column Mapping Support:

  • Required: Question and Answer columns
  • Optional: Empathy detection, No-match flag, Priority level, Tags
  • Flexible: Any CSV column names can be mapped during the import process

JSON Format

[
  {
    "question": "How do I reset my password?",
    "answer": "You can reset your password by...",
    "detect_empathy": false,
    "no_match": false,
    "priority": "high",
    "tags": ["password", "security"]
  }
]
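
The relationship between the two formats can be sketched as a conversion: given a mapping from field names to your CSV's column names, each row becomes one JSON record. The `map_rows` helper, the accepted truthy strings, and the example column names (`Prompt`, `Reply`, `Labels`) are assumptions for illustration, not the application's importer.

```python
import csv
import io
import json

TRUE_VALUES = {"true", "1", "yes"}  # assumption: strings treated as boolean true


def map_rows(csv_text, mapping):
    """Convert CSV rows to JSON-style records using a {field: csv_column} mapping."""
    if "question" not in mapping or "answer" not in mapping:
        raise ValueError("question and answer columns must be mapped")
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {
            "question": row[mapping["question"]],
            "answer": row[mapping["answer"]],
        }
        for flag in ("detect_empathy", "no_match"):
            if flag in mapping:
                record[flag] = row[mapping[flag]].strip().lower() in TRUE_VALUES
        if "priority" in mapping:
            record["priority"] = row[mapping["priority"]]
        if "tags" in mapping:
            # Tags arrive as one comma-separated cell and become a list.
            record["tags"] = [t.strip() for t in row[mapping["tags"]].split(",") if t.strip()]
        records.append(record)
    return records


csv_text = '''Prompt,Reply,Labels
"How do I reset my password?","You can reset your password by...","password,security"
'''
records = map_rows(csv_text, {"question": "Prompt", "answer": "Reply", "tags": "Labels"})
print(json.dumps(records, indent=2))
```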

Testing

Backend Tests

cd backend
pytest

Frontend Tests

cd frontend
npm test

Integration Tests

# Start services
docker-compose up -d

# Run integration tests
# (Add your integration test commands here)

Monitoring and Logging

Health Checks

  • Backend: GET /health
  • Database: Automated health checks in Docker Compose
  • Session Management: In-memory (production) / Redis monitoring (local development)
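
When scripting a deployment or an integration test run, it can help to wait for the backend health check before proceeding. The sketch below polls a URL until it returns HTTP 200; `wait_for_health` and its retry parameters are illustrative, and the URL in the usage note assumes the backend is reachable on local port 8000.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll url until it answers with HTTP 200 or the attempts run out."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not ready yet: refused, timed out, or an error status
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, `wait_for_health("http://localhost:8000/health")` returns `True` once the backend responds, or `False` after roughly ten seconds of failed attempts.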

Logging

  • Application logs: stdout/stderr
  • Error tracking: Built-in FastAPI error handling
  • Request logging: Configurable via FastAPI middleware

Security Considerations

Production Deployment

  1. Change Default Credentials: Update admin password immediately
  2. Environment Variables: Use secure secret management
  3. HTTPS: Configure SSL/TLS certificates
  4. Database Security: Restrict database access
  5. API Rate Limiting: Implement rate limiting middleware
  6. CORS: Configure appropriate CORS origins

Authentication

  • JWT tokens with configurable expiration
  • Secure password hashing with bcrypt
  • Role-based access control
  • Session management: In-memory sessions (production), Redis sessions (local development)

Troubleshooting

Common Issues

  1. Google Cloud Authentication:

    # Verify service account key
    gcloud auth activate-service-account --key-file=service-account.json
  2. Database Connection:

    # Check PostgreSQL is running
    docker-compose ps postgres
  3. Redis Connection (Local Development):

    # Test Redis connectivity (local environment only)
    docker-compose exec redis redis-cli ping
  4. Frontend Build Issues:

    # Clear node modules and reinstall
    cd frontend
    # PowerShell (on macOS/Linux use: rm -rf node_modules package-lock.json)
    Remove-Item -Recurse -Force node_modules, package-lock.json
    npm install

Logs

# View all logs
docker-compose logs -f

# View specific service logs
docker-compose logs -f backend
docker-compose logs -f frontend

🚧 Current Development Status

Active Work (September 2025)

We are currently working on improving user experience and preference persistence:

Recently Completed

  • ✅ Flows API Fixes: Resolved 500 errors in Dialogflow flows endpoint
  • ✅ Batch Size Preferences: Complete persistence implementation for test run batch sizes
  • ✅ Infinite Loop Fixes: Eliminated useEffect dependency cycles causing re-rendering issues
  • ✅ Enhanced Debugging: Comprehensive logging system for preference management

Contributing

Development Workflow

  1. Create feature branch
  2. Make changes with tests
  3. Submit pull request

Code Standards

  • Backend: Black formatter, Pylint, type hints
  • Frontend: ESLint, Prettier, TypeScript strict mode
  • Commits: Conventional commit messages
