
adds prompt testing using my fork of testcontainers #28

Merged
Josephrp merged 3 commits into dev from perf/addsprompt-testing on Oct 5, 2025


Josephrp commented Oct 5, 2025

Pull Request

Description

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • ⚡ Performance improvement
  • 🧹 Code refactoring
  • 🧪 Test addition or update
  • 🔧 Configuration change
  • 🧬 Bioinformatics enhancement
  • 🔄 Workflow improvement

Component

  • Core Workflow Engine
  • PRIME Flow (Protein Engineering)
  • Bioinformatics Flow (Data Fusion)
  • DeepSearch Flow (Web Research)
  • Challenge Flow (Experimental)
  • Tool Registry
  • Agent System
  • Configuration (Hydra)
  • Pydantic Graph
  • Documentation
  • Tests
  • Other:

Related Issues

Changes Made

Testing

  • I have tested these changes locally
  • I have added/updated tests for my changes
  • All existing tests pass
  • I have tested with different configurations
  • I have tested with different flows (PRIME, Bioinformatics, DeepSearch, etc.)

Test Configuration

# Example test command
uv run deepresearch question="..." app_mode=single_react

Configuration Changes

  • No configuration changes
  • Added new configuration options
  • Modified existing configuration
  • Removed configuration options

Summary

This PR adds a Hydra-configurable VLLM testing system covering all DeepCritical prompts. To keep test time down, every test runs against a single VLLM instance. The implementation consists of:

🏗️ Core Infrastructure

  1. Hydra Configuration System (configs/vllm_tests/)

    • Main configuration (default.yaml) with comprehensive VLLM test settings
    • Model configuration (model/local_model.yaml) for container and generation settings
    • Performance configuration (performance/balanced.yaml) for execution optimization
    • Testing configuration (testing/comprehensive.yaml) for test validation and assertions
    • Output configuration (output/structured.yaml) for artifact management
  2. Enhanced VLLMPromptTester (tests/testcontainers_vllm.py)

    • Full Hydra configuration support with fallback defaults
    • Single instance optimization with configurable resource limits
    • Enhanced error handling and retry logic
    • Configurable dummy data generation strategies
    • Advanced validation and monitoring capabilities
  3. Hydra-Integrated Base Test Class (tests/test_prompts_vllm_base.py)

    • Configuration-aware test execution
    • Module filtering and prompt limiting based on config
    • Enhanced assertion methods using configuration values
    • Performance monitoring and execution time tracking
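One way to picture the "full Hydra configuration support with fallback defaults" listed above is a recursive merge in which built-in defaults fill every key the loaded config leaves unset. This is a minimal sketch under that assumption, not the code in tests/testcontainers_vllm.py; `FALLBACK_DEFAULTS` and `merge_with_defaults` are hypothetical names, and the default values are copied from the configuration excerpt in this PR.

```python
from copy import deepcopy
from typing import Any, Dict, Mapping, Optional

# Hypothetical built-in defaults; values mirror the config excerpt in this PR.
FALLBACK_DEFAULTS: Dict[str, Any] = {
    "model": {
        "name": "microsoft/DialoGPT-medium",
        "generation": {"max_tokens": 256, "temperature": 0.7},
    },
    "performance": {
        "max_container_startup_time": 120,
        "max_execution_time_per_module": 300,
    },
}


def merge_with_defaults(
    defaults: Mapping[str, Any], overrides: Optional[Mapping[str, Any]]
) -> Dict[str, Any]:
    """Recursively overlay `overrides` on `defaults` without mutating either."""
    merged = deepcopy(dict(defaults))
    if not overrides:
        # No Hydra config available: use the defaults unchanged.
        return merged
    for key, value in overrides.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides are config sections: merge them key by key.
            merged[key] = merge_with_defaults(merged[key], value)
        else:
            # Scalars and keys missing from the defaults: the override wins.
            merged[key] = deepcopy(value)
    return merged
```

Overriding only `model.generation.max_tokens` keeps the default `temperature` and model name, which is the behaviour a partial Hydra config would need.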

🧪 Test Files Created

Individual test files for each prompt module (20+ files):

  • ✅ All prompt modules now have dedicated VLLM test files
  • ✅ Each test file uses the base class with Hydra configuration
  • ✅ Tests are properly marked as optional and VLLM-specific

⚙️ CI/CD Configuration

  1. Pytest Configuration (pytest.ini)

    • Added VLLM and optional markers
    • Configured to skip VLLM tests by default
  2. GitHub Actions (.github/workflows/ci.yml)

    • Installs the Hydra dependencies
    • VLLM tests run only on manual trigger or special commit messages
    • Proper timeout and resource management
  3. Tox Configuration (tox.ini)

    • Separate environments for basic tests and VLLM tests
    • Hydra configuration support
    • Proper dependency management
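The CI gating described above (VLLM tests skipped by default, run only on a manual trigger or a specific commit message) reduces to a small predicate. This sketch assumes a hypothetical `[run-vllm]` commit-message marker and a hypothetical `RUN_VLLM_TESTS` environment variable, since the PR names neither; `workflow_dispatch` is the event name GitHub Actions reports for manually triggered runs. In a real suite the result could feed a `conftest.py` hook such as pytest's `pytest_collection_modifyitems`, which would add skip markers to VLLM-marked tests.

```python
import os
from typing import Mapping, Optional

# Hypothetical opt-in marker; the PR does not say which commit text triggers the suite.
VLLM_COMMIT_MARKER = "[run-vllm]"


def should_run_vllm_tests(
    event_name: str, commit_message: str, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Return True only when the opt-in VLLM suite has been explicitly requested."""
    env = os.environ if env is None else env
    # Explicit local opt-in through a (hypothetical) environment variable.
    if env.get("RUN_VLLM_TESTS", "").lower() in {"1", "true", "yes"}:
        return True
    # Manual trigger from the GitHub Actions UI or API.
    if event_name == "workflow_dispatch":
        return True
    # Otherwise skip by default unless the commit message opts in.
    return VLLM_COMMIT_MARKER in commit_message
```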

🛠️ Utilities

  1. Enhanced Test Runner (scripts/run_vllm_tests.py)

    • Full Hydra configuration support with command-line overrides
    • Module filtering and listing capabilities
    • Single instance optimization enforcement
    • Comprehensive error handling and reporting
  2. Updated Documentation (VLLM_TESTS_README.md)

    • Complete guide for Hydra-based VLLM testing
    • Configuration examples and troubleshooting
    • Single instance optimization details
    • CI/CD integration instructions

🎯 Key Optimizations

  • Single Instance Usage: All tests use one VLLM container for maximum efficiency
  • Sequential Execution: Tests run sequentially to avoid container conflicts
  • Configurable Resource Limits: CPU, memory, and timeout limits through Hydra
  • Smart Batching: Module-level batching for efficient execution
  • Enhanced Error Handling: Graceful degradation and retry mechanisms
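The single-instance, sequential strategy above can be illustrated with a stand-in container that is started once per run, while modules are processed one after another as batches and each generation call is retried on failure. `FakeVLLMContainer`, `with_retries`, and `run_modules_sequentially` are hypothetical names for illustration only; the real tester wraps a testcontainers-managed VLLM container, whose API is not reproduced here.

```python
import time
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")


class FakeVLLMContainer:
    """Hypothetical stand-in for a VLLM container; counts how often one is started."""

    starts = 0

    def __init__(self) -> None:
        FakeVLLMContainer.starts += 1  # each instance represents one costly start-up

    def generate(self, prompt: str) -> str:
        return f"response to: {prompt}"


def with_retries(fn: Callable[[], T], attempts: int = 3, delay_s: float = 0.0) -> T:
    """Call `fn`, retrying on any exception, up to `attempts` calls in total."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # deliberately broad: any failure is retried
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay_s * 2 ** attempt)  # exponential backoff between tries
    assert last_error is not None
    raise last_error


def run_modules_sequentially(prompts_by_module: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Run each module's prompts in turn against one shared container."""
    container = FakeVLLMContainer()  # started once for the whole run
    results: Dict[str, List[str]] = {}
    for module, prompts in prompts_by_module.items():  # one module batch at a time
        results[module] = [
            with_retries(lambda p=prompt: container.generate(p)) for prompt in prompts
        ]
    return results
```

Starting the container once and reusing it is what turns N container start-ups into one; running modules sequentially avoids contention on that shared instance.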

🚀 Usage Examples

# Run all VLLM tests with Hydra configuration
python scripts/run_vllm_tests.py

# Run specific modules
python scripts/run_vllm_tests.py agents bioinformatics_agents

# Custom configuration
python scripts/run_vllm_tests.py --config-name vllm_tests model.name=microsoft/DialoGPT-large

# List available modules
python scripts/run_vllm_tests.py --list-modules

# CI mode (no Hydra, single instance)
python scripts/run_vllm_tests.py --no-hydra
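The examples above mix module names and Hydra-style `key=value` overrides on one command line. One way to handle that is to split positional tokens on `=`. This argparse sketch is an assumption about how such a runner might parse its arguments, not the actual scripts/run_vllm_tests.py; `parse_runner_args` and `apply_override` are hypothetical helpers. `parse_intermixed_args` (Python 3.7+) lets positionals appear before or after flags.

```python
import argparse
from typing import Any, Dict, List, Optional, Tuple


def parse_runner_args(
    argv: Optional[List[str]] = None,
) -> Tuple[argparse.Namespace, Dict[str, str]]:
    """Parse runner flags; positional tokens containing '=' become overrides."""
    parser = argparse.ArgumentParser(description="Run VLLM prompt tests (sketch)")
    parser.add_argument("modules", nargs="*", default=[], help="module names or key=value overrides")
    parser.add_argument("--config-name", default="vllm_tests")
    parser.add_argument("--list-modules", action="store_true")
    parser.add_argument("--no-hydra", action="store_true")
    # parse_intermixed_args lets positionals appear before or after the flags.
    args = parser.parse_intermixed_args(argv)
    overrides = dict(tok.split("=", 1) for tok in args.modules if "=" in tok)
    args.modules = [tok for tok in args.modules if "=" not in tok]
    return args, overrides


def apply_override(cfg: Dict[str, Any], dotted_key: str, value: str) -> None:
    """Set a nested value in place, e.g. 'model.name' -> cfg['model']['name']."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})  # create missing sections on the way down
    node[leaf] = value
```

Unlike real Hydra, this sketch keeps override values as strings rather than converting them to numbers or booleans.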

📊 Configuration Structure

All test parameters are set through Hydra config groups. An excerpt of the structure:

# Main VLLM test configuration
vllm_tests:
  enabled: true
  run_in_ci: false
  execution_strategy: sequential
  max_concurrent_tests: 1  # Single instance optimization

# Model configuration
model:
  name: "microsoft/DialoGPT-medium"
  generation:
    max_tokens: 256
    temperature: 0.7

# Performance optimization
performance:
  max_container_startup_time: 120
  max_execution_time_per_module: 300

# Testing parameters
testing:
  scope:
    test_all_modules: true
    max_prompts_per_module: 50
  validation:
    validate_prompt_structure: true
    validate_response_structure: true
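The `testing.scope` keys above could be applied as a module filter followed by a per-module cap. In this sketch (hypothetical `select_prompts` helper), an explicit module list from the command line takes precedence, `test_all_modules: true` otherwise selects every module, and `max_prompts_per_module` truncates each module's prompt list. That precedence order is an assumption, not documented behaviour.

```python
from typing import Any, Dict, List, Optional


def select_prompts(
    prompts_by_module: Dict[str, List[str]],
    scope: Dict[str, Any],
    requested_modules: Optional[List[str]] = None,
) -> Dict[str, List[str]]:
    """Apply `testing.scope`-style settings: module filtering plus a per-module cap."""
    if requested_modules:
        # An explicit module list wins; unknown module names are ignored.
        modules = [m for m in requested_modules if m in prompts_by_module]
    elif scope.get("test_all_modules", True):
        modules = list(prompts_by_module)
    else:
        modules = []
    cap = scope.get("max_prompts_per_module")  # None means "no limit" when slicing
    return {m: prompts_by_module[m][:cap] for m in modules}
```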

📈 Benefits

  1. Time Optimization: Single VLLM instance reduces startup overhead and improves test speed
  2. Resource Efficiency: Lower memory and CPU usage compared to multiple containers
  3. Configuration Flexibility: Full Hydra configuration support for all test parameters
  4. CI Safety: Tests are disabled in CI by default and only run when explicitly triggered
  5. Maintainability: Centralized configuration makes it easy to adjust test behavior
  6. Extensibility: Easy to add new prompt modules or modify test behavior through config

With these changes, the VLLM test suite is configurable through Hydra, runs against a single VLLM instance, and comes with documentation and CI/CD integration.

Documentation

  • No documentation changes needed
  • Updated README
  • Updated API documentation
  • Updated configuration documentation
  • Added code comments
  • Updated examples

Performance Impact

  • No performance impact
  • Performance improvement
  • Performance regression (explain below)

Performance Details

  • Execution time:
  • Memory usage:
  • Other metrics:

Breaking Changes

  • No breaking changes
  • Breaking change (describe below)

Migration Guide

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • [x] New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published

Additional Notes

Screenshots/Output

Before

After

Reviewer Notes

Josephrp added this to the Test Coverage 100% milestone Oct 5, 2025
Josephrp self-assigned this Oct 5, 2025
Josephrp added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels Oct 5, 2025
Josephrp enabled auto-merge October 5, 2025 11:14
Josephrp requested a review from MarioAderman October 5, 2025 11:15
Josephrp assigned Josephrp and unassigned Josephrp Oct 5, 2025
Josephrp linked an issue (3 tasks) Oct 5, 2025 that may be closed by this pull request
Josephrp disabled auto-merge October 5, 2025 16:25
Josephrp merged commit 6b1ddec into dev Oct 5, 2025
4 checks passed

Labels

enhancement (New feature or request), help wanted (Extra attention is needed)

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[PERF]: Test coverage should be 100%

1 participant