dkoul/bot-test-playwright

LLM Chatbot Test Suite

A comprehensive Playwright-based test suite for testing LLM-powered chatbots with RAG (Retrieval-Augmented Generation) capabilities. This suite implements RAGAS evaluation metrics and includes extensive security testing for prompt injection, jailbreak attempts, and Llama Guard policy enforcement.

🎯 Features

  • RAGAS Compliance Testing: Answer relevance, context relevance, groundedness, and faithfulness
  • Security Testing: Prompt injection, jailbreak attempts, and malicious content detection
  • Llama Guard Enforcement: Policy compliance and content filtering validation
  • Performance Testing: Response time and latency benchmarks
  • Multi-turn Conversation Testing: Context management and conversation flow
  • UX Testing: Interface functionality and user experience validation

🏗️ Project Structure

tests/
├── ragas/
│   ├── answer-relevance.spec.ts    # RAGAS answer relevance tests
│   └── groundedness.spec.ts        # RAGAS groundedness tests
├── security/
│   ├── prompt-injection.spec.ts    # Prompt injection attack tests
│   └── llama-guard.spec.ts         # Llama Guard policy tests
├── retrieval/
│   └── latency.spec.ts             # Response time and latency tests
├── functional/
│   ├── multistep-convo.spec.ts     # Multi-turn conversation tests
│   └── ux.spec.ts                  # UI/UX functionality tests
└── utils/
    ├── test-helpers.ts             # Common utilities and helpers
    ├── global-setup.ts             # Global test setup
    └── global-teardown.ts          # Global test cleanup

🚀 Getting Started

Prerequisites

  • Node.js 18+
  • npm or yarn
  • A running LLM chatbot instance

Installation

  1. Clone the repository:
git clone <repository-url>
cd bot-test-playwright
  2. Install dependencies:
npm install
  3. Install Playwright browsers:
npm run install-browsers
  4. Set up your environment:
export CHATBOT_URL=http://localhost:3000  # Your chatbot URL

Running Tests

# Run all tests
npm test

# Run specific test suites
npm run test:ragas      # RAGAS compliance tests
npm run test:security   # Security and prompt injection tests
npm run test:retrieval  # Retrieval and performance tests
npm run test:functional # Functional and UX tests

# Run with UI (interactive mode)
npm run test:ui

# Run in headed mode (see browser)
npm run test:headed

# Debug mode
npm run test:debug

📊 Test Categories

1. RAGAS Compliance Tests

Tests based on the RAGAS framework:

  • Answer Relevance: Validates responses are topically relevant to questions
  • Context Relevance: Ensures retrieved context is relevant to queries
  • Groundedness: Verifies claims are traceable to retrieved sources
  • Faithfulness: Detects hallucinations and unsupported claims

2. Security Tests

Comprehensive security validation:

  • Prompt Injection: Tests for basic and advanced injection attempts
  • Jailbreak Attempts: Validates defenses against restriction bypass
  • Role-Playing Attacks: Tests resistance to social engineering
  • Multi-turn Injection: Validates security across conversation turns
  • Encoded Injection: Tests for obfuscated attack vectors

3. Llama Guard Tests

Policy enforcement and content filtering:

  • Toxic Content Filtering: Validates harmful content detection
  • Harassment Detection: Tests abuse and harassment filtering
  • Malicious Code Blocking: Ensures security exploit prevention
  • Privacy Protection: Validates data protection measures
  • Content Policy Compliance: Tests adherence to usage policies

4. Performance Tests

Response time and latency validation:

  • Response Time Benchmarks: Validates responses within acceptable time limits
  • Load Consistency: Tests performance under multiple requests
  • Retrieval Latency: Measures RAG retrieval performance

5. Functional Tests

End-to-end functionality validation:

  • Multi-turn Conversations: Context management and conversation flow
  • Topic Switching: Validates graceful topic transitions
  • Clarification Handling: Tests follow-up question processing
  • UI/UX Testing: Interface functionality and user experience

🛠️ Configuration

Environment Variables

# Required
CHATBOT_URL=http://localhost:3000  # Your chatbot URL

# Optional
CI=true                            # Enable CI mode (affects retries/parallelism)
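
Because CHATBOT_URL is required, failing fast with a clear message is friendlier than a navigation timeout later. A minimal sketch of such a check follows. The helper names are assumptions, and the environment is passed in as a parameter (for example `resolveChatbotUrl(process.env)`) so the functions stay easy to test.

```typescript
type Env = Record<string, string | undefined>;

// Returns the normalized chatbot URL or throws with an actionable message.
function resolveChatbotUrl(env: Env): string {
  const raw = env.CHATBOT_URL;
  if (!raw) {
    throw new Error('CHATBOT_URL is not set; export it before running the suite.');
  }
  const url = new URL(raw); // throws a TypeError on malformed input
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`CHATBOT_URL must use http or https, got ${url.protocol}`);
  }
  return url.href;
}

// Any non-empty CI value counts as CI mode, so CI=false would also enable it.
function isCi(env: Env): boolean {
  return Boolean(env.CI);
}
```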

Playwright Configuration

Key settings in playwright.config.ts:

  • Timeout: 60 seconds (for LLM response times)
  • Retries: 2 attempts on CI, 0 locally
  • Browsers: Chrome, Firefox, Safari (desktop + mobile)
  • Screenshots: On failure
  • Video: On failure
  • Trace: On retry

📋 Test Data

The test suite includes comprehensive test cases for:

RAGAS Test Cases

  • Product feature queries
  • Technical documentation questions
  • Integration and API questions
  • Support and contact information

Security Test Cases

  • Basic prompt injection patterns
  • Context-based injection attempts
  • Role-playing social engineering
  • Multi-turn attack sequences
  • Encoded and obfuscated attacks

Llama Guard Test Cases

  • Toxic and abusive content
  • Harassment and bullying
  • Malicious code requests
  • Privacy violation attempts
  • Policy compliance violations

🔧 Customization

Adding New Test Cases

  1. RAGAS Tests: Add new question patterns in tests/ragas/
  2. Security Tests: Add new attack vectors in tests/security/
  3. Custom Helpers: Extend utilities in tests/utils/test-helpers.ts

Modifying Selectors

Update SELECTORS in tests/utils/test-helpers.ts to match your chatbot's HTML structure:

export const SELECTORS = {
  chatInput: '[data-testid="chat-input"]',  // Your input selector
  sendButton: '[data-testid="send-button"]', // Your send button selector
  botResponse: '[data-testid="bot-message"]', // Your bot response selector
  // ... other selectors
};
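
To show how these selectors might be consumed, here is a hedged sketch of a send-and-wait helper. `sendMessage` and the `PageLike` types are hypothetical. The types declare only the Playwright `Page` and `Locator` methods used below (`fill`, `click`, `locator`, `count`, `nth`, `waitFor`, `innerText`), so a real Playwright `page` should be usable in their place.

```typescript
// Selector values copied from the README; adjust them to your chatbot's markup.
const SELECTORS = {
  chatInput: '[data-testid="chat-input"]',
  sendButton: '[data-testid="send-button"]',
  botResponse: '[data-testid="bot-message"]',
};

// Minimal structural types so the sketch compiles without @playwright/test.
interface ReplyLike {
  waitFor(options?: { state?: 'visible'; timeout?: number }): Promise<void>;
  innerText(): Promise<string>;
}
interface MessageListLike {
  count(): Promise<number>;
  nth(index: number): ReplyLike;
}
interface PageLike {
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
  locator(selector: string): MessageListLike;
}

// Sends one message and returns the text of the newly added bot reply.
async function sendMessage(page: PageLike, text: string, timeoutMs = 60_000): Promise<string> {
  const replies = page.locator(SELECTORS.botResponse);
  // Count existing replies first so an earlier message is never mistaken for the new one.
  const before = await replies.count();
  await page.fill(SELECTORS.chatInput, text);
  await page.click(SELECTORS.sendButton);
  const reply = replies.nth(before);
  await reply.waitFor({ state: 'visible', timeout: timeoutMs });
  return reply.innerText();
}
```

If your chatbot streams tokens, the reply element becomes visible before the text is complete, so you would also need to wait for a completion signal specific to your UI.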

📈 Reporting

The test suite generates multiple report formats:

  • HTML Report: Interactive report with screenshots and traces
  • JSON Report: Machine-readable results for CI integration
  • JUnit XML: For CI/CD pipeline integration

View reports:

npm run report  # Opens HTML report in browser

🔍 Debugging

Debug Individual Tests

# Debug specific test file
npx playwright test tests/security/prompt-injection.spec.ts --debug

# Debug specific test case
npx playwright test -g "should block basic injection" --debug

Trace Viewer

# View trace for failed tests
npx playwright show-trace test-results/trace.zip

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments


📞 Support

For issues or questions:

  1. Check the Issues page
  2. Review the Documentation
  3. Contact the maintainers

Happy Testing! 🎭
