adds prompt testing using my fork of testcontainers by Josephrp · Pull Request #28 · DeepCritical/DeepCritical

Josephrp · 2025-10-05T04:12:38Z

Pull Request

Description

Type of Change

Component

Related Issues

Fixes #
Closes #
Related to [PERF]: Refactor Prompts into src/prompts folder #6 and [PERF]: Test coverage should be 100% #5

Changes Made

Testing

I have tested these changes locally
I have added/updated tests for my changes
All existing tests pass
I have tested with different configurations
I have tested with different flows (PRIME, Bioinformatics, DeepSearch, etc.)

Test Configuration

# Example test command
uv run deepresearch question="..." app_mode=single_react

Configuration Changes

No configuration changes
Added new configuration options
Modified existing configuration
Removed configuration options

Summary

I have created a Hydra-configurable VLLM testing system that covers all DeepCritical prompts and is optimized for single-instance use and short run times. Here is what has been implemented:

🏗️ Core Infrastructure

Hydra Configuration System (configs/vllm_tests/)
- Main configuration (default.yaml) with comprehensive VLLM test settings
- Model configuration (model/local_model.yaml) for container and generation settings
- Performance configuration (performance/balanced.yaml) for execution optimization
- Testing configuration (testing/comprehensive.yaml) for test validation and assertions
- Output configuration (output/structured.yaml) for artifact management
Enhanced VLLMPromptTester (tests/testcontainers_vllm.py)
- Full Hydra configuration support with fallback defaults
- Single instance optimization with configurable resource limits
- Enhanced error handling and retry logic
- Configurable dummy data generation strategies
- Advanced validation and monitoring capabilities
Hydra-Integrated Base Test Class (tests/test_prompts_vllm_base.py)
- Configuration-aware test execution
- Module filtering and prompt limiting based on config
- Enhanced assertion methods using configuration values
- Performance monitoring and execution time tracking
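The "fallback defaults" behaviour listed above can be pictured as layering a loaded configuration over a built-in one. The following is a minimal sketch using plain dictionaries, not the code in tests/testcontainers_vllm.py: `FALLBACK_DEFAULTS` and `with_fallbacks` are hypothetical names, and the default values are copied from the configuration shown in this PR.

```python
from copy import deepcopy

# Hypothetical built-in defaults; values mirror the configuration shown in this PR.
FALLBACK_DEFAULTS = {
    "model": {
        "name": "microsoft/DialoGPT-medium",
        "generation": {"max_tokens": 256, "temperature": 0.7},
    },
    "performance": {"max_container_startup_time": 120},
}


def with_fallbacks(config, defaults=FALLBACK_DEFAULTS):
    """Return a copy of `defaults` with the values from `config` layered on top."""
    merged = deepcopy(defaults)
    for key, value in (config or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that a partial nested override keeps its sibling defaults.
            merged[key] = with_fallbacks(value, merged[key])
        else:
            merged[key] = deepcopy(value)
    return merged


# A partial override keeps every default it does not mention.
cfg = with_fallbacks({"model": {"generation": {"max_tokens": 64}}})
print(cfg["model"]["generation"])
```

Passing `None` (for example when no Hydra config could be loaded) returns the defaults unchanged.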

🧪 Test Files Created

Individual test files for each prompt module (20+ files):

✅ All prompt modules now have dedicated VLLM test files
✅ Each test file uses the base class with Hydra configuration
✅ Tests are properly marked as optional and VLLM-specific

⚙️ CI/CD Configuration

Pytest Configuration (pytest.ini)
- Added VLLM and optional markers
- Configured to skip VLLM tests by default
GitHub Actions (.github/workflows/ci.yml)
- Enhanced with Hydra dependencies installation
- VLLM tests run only on manual trigger or special commit messages
- Proper timeout and resource management
Tox Configuration (tox.ini)
- Separate environments for basic tests and VLLM tests
- Hydra configuration support
- Proper dependency management
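The gating rule for CI ("manual trigger or special commit messages") amounts to a small predicate. The sketch below is an illustration only and is not taken from .github/workflows/ci.yml: `should_run_vllm_tests` is a hypothetical helper, and `TRIGGER_TOKEN` is an assumed marker, since the PR does not say which commit-message text enables the tests. `workflow_dispatch` is the GitHub Actions event name for a manual run.

```python
# Assumed marker; the PR does not specify the actual commit-message text.
TRIGGER_TOKEN = "[vllm-tests]"


def should_run_vllm_tests(event_name: str, commit_message: str = "") -> bool:
    """Decide whether the optional VLLM tests should run for a CI event."""
    # A manually started workflow always runs the VLLM tests.
    if event_name == "workflow_dispatch":
        return True
    # Otherwise the commit message has to opt in explicitly.
    return TRIGGER_TOKEN in commit_message


print(should_run_vllm_tests("push", "fix typo"))
print(should_run_vllm_tests("push", "add prompt tests [vllm-tests]"))
```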

🛠️ Utilities

Enhanced Test Runner (scripts/run_vllm_tests.py)
- Full Hydra configuration support with command-line overrides
- Module filtering and listing capabilities
- Single instance optimization enforcement
- Comprehensive error handling and reporting
Updated Documentation (VLLM_TESTS_README.md)
- Complete guide for Hydra-based VLLM testing
- Configuration examples and troubleshooting
- Single instance optimization details
- CI/CD integration instructions

🎯 Key Optimizations

Single Instance Usage: All tests use one VLLM container for maximum efficiency
Sequential Execution: Tests run sequentially to avoid container conflicts
Configurable Resource Limits: CPU, memory, and timeout limits through Hydra
Smart Batching: Module-level batching for efficient execution
Enhanced Error Handling: Graceful degradation and retry mechanisms
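As a rough picture of "graceful degradation and retry mechanisms", the sketch below retries a failing call a fixed number of times and then returns a fallback value instead of raising. It is an assumption about the general shape of such a helper, not the PR's implementation; `call_with_retry` and its parameters are hypothetical.

```python
import time


def call_with_retry(func, attempts=3, delay=0.0, fallback=None):
    """Call func(); retry on any exception and return `fallback` if all attempts fail."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                # Degrade gracefully: report a fallback instead of crashing the run.
                return fallback
            # Linear backoff between attempts (no wait when delay is 0).
            time.sleep(delay * attempt)


calls = []


def flaky():
    """Fail twice, then succeed, to exercise the retry path."""
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("container not ready")
    return "ok"


print(call_with_retry(flaky), len(calls))
```

Returning a fallback keeps a sequential run going when one module fails; a stricter variant could re-raise the last exception instead.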

🚀 Usage Examples

# Run all VLLM tests with Hydra configuration
python scripts/run_vllm_tests.py

# Run specific modules
python scripts/run_vllm_tests.py agents bioinformatics_agents

# Custom configuration
python scripts/run_vllm_tests.py --config-name vllm_tests model.name=microsoft/DialoGPT-large

# List available modules
python scripts/run_vllm_tests.py --list-modules

# CI mode (no Hydra, single instance)
python scripts/run_vllm_tests.py --no-hydra
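To make the `model.name=...` override syntax concrete, the sketch below turns dotted `key=value` arguments into a nested dictionary. Hydra has its own, richer override grammar; this is only a simplified illustration of the idea, and `parse_overrides` is a hypothetical function that does not appear in scripts/run_vllm_tests.py.

```python
def parse_overrides(args):
    """Turn ['model.name=x'] into {'model': {'name': 'x'}}; values stay strings."""
    result = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        node = result
        *parents, leaf = key.split(".")
        # Walk (and create) the nested dictionaries named by the dotted path.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


print(parse_overrides(["model.name=microsoft/DialoGPT-large"]))
```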

📊 Configuration Structure

The system now supports comprehensive configuration through Hydra:

# Main VLLM test configuration
vllm_tests:
  enabled: true
  run_in_ci: false
  execution_strategy: sequential
  max_concurrent_tests: 1  # Single instance optimization

# Model configuration
model:
  name: "microsoft/DialoGPT-medium"
  generation:
    max_tokens: 256
    temperature: 0.7

# Performance optimization
performance:
  max_container_startup_time: 120
  max_execution_time_per_module: 300

# Testing parameters
testing:
  scope:
    test_all_modules: true
    max_prompts_per_module: 50
  validation:
    validate_prompt_structure: true
    validate_response_structure: true
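The `testing.scope` keys above suggest how module filtering and prompt limiting might be applied. The following is a sketch under assumptions, not the logic in tests/test_prompts_vllm_base.py: `select_prompts`, `prompts_by_module`, and `requested_modules` are hypothetical names, and only `test_all_modules` and `max_prompts_per_module` come from the configuration shown.

```python
def select_prompts(prompts_by_module, testing_cfg, requested_modules=None):
    """Pick the modules to test and cap the number of prompts taken from each."""
    scope = testing_cfg["scope"]
    limit = scope["max_prompts_per_module"]
    if requested_modules:
        # An explicit module list (e.g. from the command line) takes priority.
        names = [name for name in prompts_by_module if name in requested_modules]
    elif scope["test_all_modules"]:
        names = list(prompts_by_module)
    else:
        names = []
    return {name: prompts_by_module[name][:limit] for name in names}


testing_cfg = {"scope": {"test_all_modules": True, "max_prompts_per_module": 2}}
prompts = {"agents": ["p1", "p2", "p3"], "bioinformatics_agents": ["q1"]}
print(select_prompts(prompts, testing_cfg, requested_modules=["agents"]))
```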

📈 Benefits

Time Optimization: Single VLLM instance reduces startup overhead and improves test speed
Resource Efficiency: Lower memory and CPU usage compared to multiple containers
Configuration Flexibility: Full Hydra configuration support for all test parameters
CI Safety: Tests are disabled in CI by default and only run when explicitly triggered
Maintainability: Centralized configuration makes it easy to adjust test behavior
Extensibility: Easy to add new prompt modules or modify test behavior through config

The VLLM testing system is now configurable through Hydra, optimized for single-instance use, and documented, with CI/CD integration in place.

Documentation

Performance Impact

No performance impact
Performance improvement
Performance regression (explain below)

Performance Details

Execution time:
Memory usage:
Other metrics:

Breaking Changes

No breaking changes
Breaking change (describe below)

Migration Guide

Checklist

My code follows the project's style guidelines
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
[x] New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published

Additional Notes

Screenshots/Output

Before

After

Reviewer Notes

adds prompt testing using my fork of testcontainers

dd880b2

Josephrp added this to the Test Coverage 100% milestone Oct 5, 2025

Josephrp self-assigned this Oct 5, 2025

Josephrp added this to Deep Critical Project Boards Oct 5, 2025

Josephrp added the enhancement and help wanted labels Oct 5, 2025

Josephrp and others added 2 commits October 5, 2025 13:10

adds tests , testscontainers , vllm object , scripts

6464176

Merge branch 'dev' into perf/addsprompt-testing

f014f7f

Josephrp enabled auto-merge October 5, 2025 11:14

Josephrp requested a review from MarioAderman October 5, 2025 11:15

Josephrp assigned Josephrp and unassigned Josephrp Oct 5, 2025

Josephrp moved this to Done in Deep Critical Project Boards Oct 5, 2025

Josephrp linked an issue Oct 5, 2025 that may be closed by this pull request

[PERF]: Test coverage should be 100% #5

Open

3 tasks

Josephrp disabled auto-merge October 5, 2025 16:25

Josephrp merged commit 6b1ddec into dev Oct 5, 2025
4 checks passed

Josephrp merged 3 commits into dev from perf/addsprompt-testing