Thank you for your interest in contributing to StackWise! This document provides guidelines and information for contributors.
StackWise aims to democratize large model training by enabling 70B parameter models to be trained on a single H200 GPU through revolutionary layer-wise progressive training with bidirectional attention and mask-diffusion objectives.
- Python 3.8+
- PyTorch 2.1+
- CUDA 12+ (for GPU training)
- Git
# Clone the repository
git clone https://github.com/mlsquare/stack-wise.git
cd stack-wise
# Create and activate virtual environment
uv venv
source .venv/bin/activate
# Install development dependencies
uv pip install -e .[advanced,dev]
# Run tests to verify setup
python tests/run_tests.py-
Core Architecture
- Block-Stack-Rack paradigm improvements
- Layer-wise training optimizations
- Memory efficiency enhancements
-
Training Modes
- Stack-wise training implementation
- Progressive curriculum learning
- Left-to-right vs right-to-left strategies
-
Unified Training Objectives
- MLM/CLM unification improvements
- Mask-diffusion objective enhancements
- Attention mechanism optimizations
-
Memory Optimization
- Activation caching improvements
- Quantization support (FP4, FP8, FP16)
- QLoRA integration enhancements
-
Curriculum Learning
- New curriculum strategies
- Semantic preservation techniques
- Progressive model building
-
Attention Mechanisms
- GQA, MLA, and kernel-based attention
- Bidirectional vs causal attention
- Modern attention variants
-
Diffusion Training
- Time-as-depth training
- Masked-diffusion objectives
- Progressive masking schedules
# Fork the repository on GitHub
# Clone your fork
git clone https://github.com/YOUR_USERNAME/stack-wise.git
cd stack-wise
# Add upstream remote
git remote add upstream https://github.com/mlsquare/stack-wise.git# Create and switch to feature branch
git checkout -b feature/your-feature-name
# Or for bug fixes
git checkout -b fix/issue-description- Follow the coding standards (see below)
- Add tests for new functionality
- Update documentation as needed
- Ensure all tests pass
# Run all tests
python tests/run_tests.py
# Run specific test types
python tests/run_tests.py --unit
python tests/run_tests.py --integration
python tests/run_tests.py --examples
# Run with verbose output
python tests/run_tests.py --verbose# Stage your changes
git add .
# Commit with descriptive message
git commit -m "feat: add new curriculum learning strategy"
# Push to your fork
git push origin feature/your-feature-name- Go to your fork on GitHub
- Click "New Pull Request"
- Fill out the PR template
- Request review from maintainers
# Format code with black
black src/ examples/ tests/
# Lint with flake8
flake8 src/ examples/ tests/
# Type checking with mypy
mypy src/src/
βββ config/ # Configuration management
βββ model/ # Model components
βββ training/ # Training pipelines
βββ data/ # Data handling
βββ utils/ # Utilities
tests/
βββ unit/ # Unit tests
βββ integration/ # Integration tests
βββ examples/ # Example tests
docs/ # Documentation
examples/ # Usage examples
- Use docstrings for all functions and classes
- Follow Google docstring format
- Include type hints
- Update README.md for significant changes
- Add examples for new features
type: brief description
Detailed description of changes
- Bullet point 1
- Bullet point 2
Closes: #issue-number
Types:
feat: New featuresfix: Bug fixesdocs: Documentation changesstyle: Code style changesrefactor: Code refactoringtest: Test additions/changeschore: Maintenance tasks
# Unit tests for individual components
def test_block_creation():
"""Test block creation with different configurations."""
pass
# Integration tests for component interactions
def test_stack_training_integration():
"""Test stack training with progressive curriculum."""
pass
# Example tests for end-to-end workflows
def test_gpt2_fusion_example():
"""Test GPT-2 fusion training example."""
pass- All new features must have tests
- Maintain >80% code coverage
- Include edge cases and error conditions
- Test both CPU and GPU paths when applicable
# Run all tests
python tests/run_tests.py
# Run specific test file
python -m pytest tests/unit/test_model.py
# Run with coverage
python -m pytest --cov=src tests/def train_stack_wise(
model: StackWiseModel,
dataloader: DataLoader,
curriculum: str = "left_to_right"
) -> TrainingResults:
"""Train a stack using the specified curriculum.
Args:
model: The StackWise model to train
dataloader: Data loader for training
curriculum: Training curriculum ("left_to_right" or "right_to_left")
Returns:
TrainingResults containing loss, metrics, and checkpoints
Raises:
ValueError: If curriculum is not supported
RuntimeError: If training fails
"""
pass- Update feature lists for new capabilities
- Add usage examples for new features
- Update performance characteristics
- Include configuration examples
docs/architecture.md: Architecture conceptsdocs/progressive_training.md: Training strategiesdocs/configuration_guide.md: Configuration referencedocs/api_reference.md: API documentation
- Check existing issues
- Verify bug with latest version
- Test with minimal reproduction case
**Bug Description**
Brief description of the bug
**Steps to Reproduce**
1. Step 1
2. Step 2
3. Step 3
**Expected Behavior**
What should happen
**Actual Behavior**
What actually happens
**Environment**
- OS: [e.g., Ubuntu 20.04]
- Python: [e.g., 3.9.7]
- PyTorch: [e.g., 2.1.0]
- CUDA: [e.g., 12.1]
**Additional Context**
Any other relevant information**Feature Description**
Brief description of the feature
**Use Case**
Why is this feature needed?
**Proposed Solution**
How should this feature work?
**Alternatives Considered**
Other approaches you've considered
**Additional Context**
Any other relevant information- Address all review comments
- Update tests if needed
- Ensure CI passes
- Respond to feedback promptly
- Check code quality and style
- Verify tests are adequate
- Ensure documentation is updated
- Test the changes locally
- Code follows style guidelines
- Tests are included and pass
- Documentation is updated
- No breaking changes (or properly documented)
- Performance impact is considered
- Memory usage is optimized
MAJOR.MINOR.PATCH(Semantic Versioning)MAJOR: Breaking changesMINOR: New features, backward compatiblePATCH: Bug fixes, backward compatible
- All tests pass
- Documentation is updated
- CHANGELOG.md is updated
- Version number is bumped
- Release notes are written
- Be respectful and inclusive
- Focus on constructive feedback
- Help others learn and grow
- Follow the golden rule
- Use clear, descriptive commit messages
- Provide context in PR descriptions
- Ask questions when unsure
- Share knowledge and best practices
- Documentation: docs/README.md
- Examples: examples/
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Maintainers: @mlsquare
- Email: [Add contact email if available]
- Discord/Slack: [Add community link if available]
- Complete stack-wise training implementation
- Add more curriculum learning strategies
- Improve memory optimization
- Add comprehensive benchmarks
- Support for 100B+ models
- Multi-GPU training support
- Advanced attention mechanisms
- Production deployment tools
- Distributed training across nodes
- Model compression techniques
- Hardware-specific optimizations
- Commercial deployment support
Contributors will be recognized in:
CONTRIBUTORS.mdfile- Release notes
- Project documentation
- Community acknowledgments
Thank you for contributing to StackWise! Together, we can democratize large model training and make AI more accessible to everyone. π