A comprehensive Playwright-based test suite for testing LLM-powered chatbots with RAG (Retrieval-Augmented Generation) capabilities. This suite implements RAGAS evaluation metrics and includes extensive security testing for prompt injection, jailbreak attempts, and Llama Guard policy enforcement.
- RAGAS Compliance Testing: Answer relevance, context relevance, groundedness, and faithfulness
- Security Testing: Prompt injection, jailbreak attempts, and malicious content detection
- Llama Guard Enforcement: Policy compliance and content filtering validation
- Performance Testing: Response time and latency benchmarks
- Multi-turn Conversation Testing: Context management and conversation flow
- UX Testing: Interface functionality and user experience validation
tests/
βββ ragas/
β βββ answer-relevance.spec.ts # RAGAS answer relevance tests
β βββ groundedness.spec.ts # RAGAS groundedness tests
βββ security/
β βββ prompt-injection.spec.ts # Prompt injection attack tests
β βββ llama-guard.spec.ts # Llama Guard policy tests
βββ retrieval/
β βββ latency.spec.ts # Response time and latency tests
βββ functional/
β βββ multistep-convo.spec.ts # Multi-turn conversation tests
β βββ ux.spec.ts # UI/UX functionality tests
βββ utils/
βββ test-helpers.ts # Common utilities and helpers
βββ global-setup.ts # Global test setup
βββ global-teardown.ts # Global test cleanup
- Node.js 18+
- npm or yarn
- A running LLM chatbot instance
- Clone the repository:
git clone <repository-url>
cd llm-chatbot-test-suite- Install dependencies:
npm install- Install Playwright browsers:
npm run install-browsers- Set up your environment:
export CHATBOT_URL=http://localhost:3000 # Your chatbot URL# Run all tests
npm test
# Run specific test suites
npm run test:ragas # RAGAS compliance tests
npm run test:security # Security and prompt injection tests
npm run test:retrieval # Retrieval and performance tests
npm run test:functional # Functional and UX tests
# Run with UI (interactive mode)
npm run test:ui
# Run in headed mode (see browser)
npm run test:headed
# Debug mode
npm run test:debugTests based on the RAGAS framework:
- Answer Relevance: Validates responses are topically relevant to questions
- Context Relevance: Ensures retrieved context is relevant to queries
- Groundedness: Verifies claims are traceable to retrieved sources
- Faithfulness: Detects hallucinations and unsupported claims
Comprehensive security validation:
- Prompt Injection: Tests for basic and advanced injection attempts
- Jailbreak Attempts: Validates defenses against restriction bypass
- Role-Playing Attacks: Tests resistance to social engineering
- Multi-turn Injection: Validates security across conversation turns
- Encoded Injection: Tests for obfuscated attack vectors
Policy enforcement and content filtering:
- Toxic Content Filtering: Validates harmful content detection
- Harassment Detection: Tests abuse and harassment filtering
- Malicious Code Blocking: Ensures security exploit prevention
- Privacy Protection: Validates data protection measures
- Content Policy Compliance: Tests adherence to usage policies
Response time and latency validation:
- Response Time Benchmarks: Validates responses within acceptable time limits
- Load Consistency: Tests performance under multiple requests
- Retrieval Latency: Measures RAG retrieval performance
End-to-end functionality validation:
- Multi-turn Conversations: Context management and conversation flow
- Topic Switching: Validates graceful topic transitions
- Clarification Handling: Tests follow-up question processing
- UI/UX Testing: Interface functionality and user experience
# Required
CHATBOT_URL=http://localhost:3000 # Your chatbot URL
# Optional
CI=true # Enable CI mode (affects retries/parallelism)Key settings in playwright.config.ts:
- Timeout: 60 seconds (for LLM response times)
- Retries: 2 attempts on CI, 0 locally
- Browsers: Chrome, Firefox, Safari (desktop + mobile)
- Screenshots: On failure
- Video: On failure
- Trace: On retry
The test suite includes comprehensive test cases for:
- Product feature queries
- Technical documentation questions
- Integration and API questions
- Support and contact information
- Basic prompt injection patterns
- Context-based injection attempts
- Role-playing social engineering
- Multi-turn attack sequences
- Encoded and obfuscated attacks
- Toxic and abusive content
- Harassment and bullying
- Malicious code requests
- Privacy violation attempts
- Policy compliance violations
- RAGAS Tests: Add new question patterns in
tests/ragas/ - Security Tests: Add new attack vectors in
tests/security/ - Custom Helpers: Extend utilities in
tests/utils/test-helpers.ts
Update SELECTORS in tests/utils/test-helpers.ts to match your chatbot's HTML structure:
export const SELECTORS = {
chatInput: '[data-testid="chat-input"]', // Your input selector
sendButton: '[data-testid="send-button"]', // Your send button selector
botResponse: '[data-testid="bot-message"]', // Your bot response selector
// ... other selectors
};The test suite generates multiple report formats:
- HTML Report: Interactive report with screenshots and traces
- JSON Report: Machine-readable results for CI integration
- JUnit XML: For CI/CD pipeline integration
View reports:
npm run report # Opens HTML report in browser# Debug specific test file
npx playwright test tests/security/prompt-injection.spec.ts --debug
# Debug specific test case
npx playwright test -g "should block basic injection" --debug# View trace for failed tests
npx playwright show-trace test-results/trace.zip- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- RAGAS - RAG evaluation framework
- Playwright - End-to-end testing framework
- Llama Guard - Content safety classification
For issues or questions:
- Check the Issues page
- Review the Documentation
- Contact the maintainers
Happy Testing! π