LLM Chatbot Test Suite

A comprehensive Playwright-based test suite for testing LLM-powered chatbots with RAG (Retrieval-Augmented Generation) capabilities. This suite implements RAGAS evaluation metrics and includes extensive security testing for prompt injection, jailbreak attempts, and Llama Guard policy enforcement.

🎯 Features

RAGAS Compliance Testing: Answer relevance, context relevance, groundedness, and faithfulness
Security Testing: Prompt injection, jailbreak attempts, and malicious content detection
Llama Guard Enforcement: Policy compliance and content filtering validation
Performance Testing: Response time and latency benchmarks
Multi-turn Conversation Testing: Context management and conversation flow
UX Testing: Interface functionality and user experience validation

🏗️ Project Structure

tests/
├── ragas/
│   ├── answer-relevance.spec.ts    # RAGAS answer relevance tests
│   └── groundedness.spec.ts        # RAGAS groundedness tests
├── security/
│   ├── prompt-injection.spec.ts    # Prompt injection attack tests
│   └── llama-guard.spec.ts         # Llama Guard policy tests
├── retrieval/
│   └── latency.spec.ts             # Response time and latency tests
├── functional/
│   ├── multistep-convo.spec.ts     # Multi-turn conversation tests
│   └── ux.spec.ts                  # UI/UX functionality tests
└── utils/
    ├── test-helpers.ts             # Common utilities and helpers
    ├── global-setup.ts             # Global test setup
    └── global-teardown.ts          # Global test cleanup

🚀 Getting Started

Prerequisites

Node.js 18+
npm or yarn
A running LLM chatbot instance

Installation

Clone the repository:

git clone <repository-url>
cd llm-chatbot-test-suite

Install dependencies:

npm install

Install Playwright browsers:

npm run install-browsers

Set up your environment:

export CHATBOT_URL=http://localhost:3000  # Your chatbot URL

Running Tests

# Run all tests
npm test

# Run specific test suites
npm run test:ragas      # RAGAS compliance tests
npm run test:security   # Security and prompt injection tests
npm run test:retrieval  # Retrieval and performance tests
npm run test:functional # Functional and UX tests

# Run with UI (interactive mode)
npm run test:ui

# Run in headed mode (see browser)
npm run test:headed

# Debug mode
npm run test:debug

📊 Test Categories

1. RAGAS Compliance Tests

Tests based on the RAGAS framework:

Answer Relevance: Validates responses are topically relevant to questions
Context Relevance: Ensures retrieved context is relevant to queries
Groundedness: Verifies claims are traceable to retrieved sources
Faithfulness: Detects hallucinations and unsupported claims

2. Security Tests

Comprehensive security validation:

Prompt Injection: Tests for basic and advanced injection attempts
Jailbreak Attempts: Validates defenses against restriction bypass
Role-Playing Attacks: Tests resistance to social engineering
Multi-turn Injection: Validates security across conversation turns
Encoded Injection: Tests for obfuscated attack vectors

3. Llama Guard Tests

Policy enforcement and content filtering:

Toxic Content Filtering: Validates harmful content detection
Harassment Detection: Tests abuse and harassment filtering
Malicious Code Blocking: Ensures security exploit prevention
Privacy Protection: Validates data protection measures
Content Policy Compliance: Tests adherence to usage policies

4. Performance Tests

Response time and latency validation:

Response Time Benchmarks: Validates responses within acceptable time limits
Load Consistency: Tests performance under multiple requests
Retrieval Latency: Measures RAG retrieval performance

5. Functional Tests

End-to-end functionality validation:

Multi-turn Conversations: Context management and conversation flow
Topic Switching: Validates graceful topic transitions
Clarification Handling: Tests follow-up question processing
UI/UX Testing: Interface functionality and user experience

🛠️ Configuration

Environment Variables

# Required
CHATBOT_URL=http://localhost:3000  # Your chatbot URL

# Optional
CI=true                            # Enable CI mode (affects retries/parallelism)

Playwright Configuration

Key settings in playwright.config.ts:

Timeout: 60 seconds (for LLM response times)
Retries: 2 attempts on CI, 0 locally
Browsers: Chrome, Firefox, Safari (desktop + mobile)
Screenshots: On failure
Video: On failure
Trace: On retry

📋 Test Data

The test suite includes comprehensive test cases for:

RAGAS Test Cases

Product feature queries
Technical documentation questions
Integration and API questions
Support and contact information

Security Test Cases

Basic prompt injection patterns
Context-based injection attempts
Role-playing social engineering
Multi-turn attack sequences
Encoded and obfuscated attacks

Llama Guard Test Cases

Toxic and abusive content
Harassment and bullying
Malicious code requests
Privacy violation attempts
Policy compliance violations

🔧 Customization

Adding New Test Cases

RAGAS Tests: Add new question patterns in tests/ragas/
Security Tests: Add new attack vectors in tests/security/
Custom Helpers: Extend utilities in tests/utils/test-helpers.ts

Modifying Selectors

Update SELECTORS in tests/utils/test-helpers.ts to match your chatbot's HTML structure:

export const SELECTORS = {
  chatInput: '[data-testid="chat-input"]',  // Your input selector
  sendButton: '[data-testid="send-button"]', // Your send button selector
  botResponse: '[data-testid="bot-message"]', // Your bot response selector
  // ... other selectors
};

📈 Reporting

The test suite generates multiple report formats:

HTML Report: Interactive report with screenshots and traces
JSON Report: Machine-readable results for CI integration
JUnit XML: For CI/CD pipeline integration

View reports:

npm run report  # Opens HTML report in browser

🔍 Debugging

Debug Individual Tests

# Debug specific test file
npx playwright test tests/security/prompt-injection.spec.ts --debug

# Debug specific test case
npx playwright test -g "should block basic injection" --debug

Trace Viewer

# View trace for failed tests
npx playwright show-trace test-results/trace.zip

🤝 Contributing

Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

RAGAS - RAG evaluation framework
Playwright - End-to-end testing framework
Llama Guard - Content safety classification

📞 Support

For issues or questions:

Check the Issues page
Review the Documentation
Contact the maintainers

Happy Testing! 🎭

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
tests		tests
.gitignore		.gitignore
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
playwright.config.ts		playwright.config.ts
tsconfig.json		tsconfig.json

dkoul/bot-test-playwright

Folders and files

Latest commit

History

Repository files navigation