# ALPHA: Automated Labeling Process Using a Human-in-the-Loop Framework with Artificial Intelligence
ALPHA is a novel software engineering framework that implements human-in-the-loop methodology through collaborative AI components for biological image analysis. The framework specifically integrates object detection models and validation filters to create robust automated labeling systems that significantly reduce the annotation burden on domain experts while maintaining high accuracy.
## Prerequisites

- Python 3.8+
- CUDA-capable GPU (recommended)
## Installation

1. Clone the repository:

```bash
git clone https://github.com/your-username/alpha-framework.git
cd alpha-framework
```

2. Install dependencies:

```bash
chmod +x install_requirements.sh
./install_requirements.sh
pip install -r requirements.txt
```

3. Verify the installation:

```bash
python -c "import torch; from ultralytics import YOLO; print('✅ Installation successful!')"
```
## Dataset Preparation

Organize your dataset as follows:

```
dataset/
├── images/            # Your image files (.jpg, .jpeg, .png, .bmp)
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── labels/            # YOLO-format annotation files (.txt)
    ├── image1.txt
    ├── image2.txt
    └── ...
```
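Before training, it can help to confirm that every image has a matching label file. A minimal check, not part of the framework, that assumes the `dataset/` layout above:

```python
# Report images that are missing a corresponding YOLO label file.
from pathlib import Path

images = Path("dataset/images")
labels = Path("dataset/labels")

exts = {".jpg", ".jpeg", ".png", ".bmp"}
missing = [p.name for p in sorted(images.iterdir())
           if p.suffix.lower() in exts and not (labels / f"{p.stem}.txt").exists()]

print(f"{len(missing)} image(s) without labels")
for name in missing:
    print(" -", name)
```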
Each label file should contain bounding box annotations in YOLO format:

```
class_id center_x center_y width height
```

For example:

```
0 0.5 0.5 0.3 0.4
```

All coordinates are normalized to the range 0-1.
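To illustrate the format, here is a small parser and sanity check; this is an illustrative sketch, not framework code:

```python
# Parse one YOLO label file and validate each annotation.
def read_yolo_labels(path):
    boxes = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            class_id, cx, cy, w, h = line.split()
            cx, cy, w, h = map(float, (cx, cy, w, h))
            # All coordinates must be normalized to [0, 1].
            assert all(0.0 <= v <= 1.0 for v in (cx, cy, w, h)), line
            boxes.append((int(class_id), cx, cy, w, h))
    return boxes

print(read_yolo_labels("dataset/labels/image1.txt"))
```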
## Usage

### Run the Complete Pipeline

Run the entire ALPHA framework from start to finish:

```bash
python main.py
```

This executes all four steps sequentially:
1. Initial YOLO Training: Trains YOLO models with different data ratios
2. First Inference + Manual Labeling: Runs inference and provides a labeling interface
3. Classification Training: Trains a DenseNet classifier on the labeled data
4. Iterative Process: Performs active learning cycles
### Run Individual Steps

Execute specific steps individually with the `--step` flag.

#### Step 1: Initial YOLO Training

```bash
python main.py --step 1
```

- Trains YOLO models with different data percentages (10%, 20%, ..., 100%)
- Outputs trained models to `./results/01_initial_yolo/`
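Conceptually, per-ratio training can be pictured as below. This is an illustrative sketch using the Ultralytics API, not the framework's actual code; the `data.yaml` layout and the random sampling policy are assumptions:

```python
# Illustrative: train one YOLO model per data percentage.
import random
from pathlib import Path
from ultralytics import YOLO

def train_with_percentage(percent, epochs=100):
    images = sorted(Path("dataset/images").glob("*.jpg"))
    k = max(1, len(images) * percent // 100)
    subset = random.sample(images, k)
    # An Ultralytics dataset YAML may point its 'train' entry at a text file
    # of image paths; writing the sampled subset there limits training data.
    Path("train_subset.txt").write_text("\n".join(map(str, subset)))
    model = YOLO("yolov8n.pt")                    # pretrained checkpoint
    model.train(data="data.yaml", epochs=epochs)  # data.yaml references train_subset.txt

for p in (10, 20, 50, 100):
    train_with_percentage(p)
```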
#### Step 2: First Inference + Manual Labeling

```bash
python main.py --step 2
```

- Runs inference on images using the best YOLO model
- Provides 4 labeling options:
  - GUI Labeling (Recommended): Interactive graphical interface
  - CLI Labeling: Terminal-based labeling
  - Batch Labeling: File-based labeling
  - Auto Labeling: Confidence-based automatic labeling
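The inference part of this step can be pictured with the Ultralytics prediction API. This is a sketch under assumptions: the checkpoint name `yolov8_100.pt` and the source path are illustrative, not the framework's actual choices:

```python
# Illustrative: run a Step 1 model on the dataset and print its detections.
from ultralytics import YOLO

model = YOLO("results/01_initial_yolo/yolov8_100.pt")  # assumed best checkpoint
results = model.predict(source="dataset/images", conf=0.25)

for r in results:
    for box in r.boxes:
        # class id, confidence, and pixel-space corner coordinates
        print(r.path, int(box.cls), float(box.conf), box.xyxy[0].tolist())
```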
#### Step 3: Classification Training

```bash
python main.py --step 3
```

- Trains a DenseNet121 classifier on the manually labeled data
- Uses different data ratios for robust training
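Setting up a DenseNet121 head for the two labeling classes (`class0` vs. `class1`) takes only a few lines in torchvision; this is an illustrative sketch, not the framework's code:

```python
# Illustrative: DenseNet121 binary classifier for class0 (keep) vs class1 (filter).
import torch.nn as nn
from torchvision import models

model = models.densenet121(weights="IMAGENET1K_V1")            # ImageNet pretraining
model.classifier = nn.Linear(model.classifier.in_features, 2)  # two output classes
```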
#### Step 4: Iterative Process

```bash
python main.py --step 4
```

- Runs iterative cycles combining YOLO detection and classification
- Performs active learning to improve model performance
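One detect-classify-retrain cycle can be summarized as below. All three callables are stand-ins for the framework's real components, not actual APIs:

```python
# Conceptual sketch of one active-learning cycle; detect, classify, and retrain
# are hypothetical placeholders supplied by the caller.
def run_cycle(detect, classify, retrain, unlabeled_images, threshold=0.5):
    accepted, rejected = [], []
    for image in unlabeled_images:
        for crop, det_conf in detect(image):
            # The classifier acts as a validation filter on detector proposals.
            (accepted if classify(crop) >= threshold else rejected).append(crop)
    retrain(accepted)  # accepted crops become training data for the next cycle
    return accepted, rejected
```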
## Configuration

Create and use custom configuration files:

```bash
# Generate a configuration template
python main.py --create-config my_config.json

# Run with a custom configuration
python main.py --config my_config.json
```

Example `my_config.json`:

```json
{
  "dataset_root": "./dataset",
  "images_dir": "./dataset/images",
  "labels_dir": "./dataset/labels",
  "yolo_epochs": 100,
  "classification_epochs": 30,
  "data_percentages": [10, 20, 50, 100],
  "conf_threshold": 0.25,
  "gpu_num": 0
}
```
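If you need to inspect or adapt a configuration programmatically, standard `json` handling is enough; this is a generic sketch, independent of the framework:

```python
# Load a configuration file, tweak a value, and save it back.
import json

with open("my_config.json") as f:
    cfg = json.load(f)

cfg["yolo_epochs"] = 150  # example adjustment
with open("my_config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```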
You can also override default settings with command-line arguments:

```bash
# Specify custom directories
python main.py --images_dir /path/to/images --labels_dir /path/to/labels

# Use a specific GPU
python main.py --gpu_num 1

# Set a custom output directory
python main.py --output_dir ./my_results

# Combine multiple options
python main.py --step 1 --gpu_num 0 --config my_config.json
```

## Labeling Options

### GUI Labeling (Recommended)

- Interactive Interface: Point-and-click labeling with visual feedback
- Real-time Preview: See detection results immediately
- Easy Navigation: Browse through detected objects efficiently
- Requirements: GUI libraries (tkinter)
Ubuntu/Debian setup:

```bash
sudo apt-get install python3-tk
```

CentOS/RHEL setup:

```bash
sudo yum install python3-tkinter
```

### Auto Labeling

- Confidence-based: Automatically classifies based on detection confidence
- Threshold Control: Adjustable confidence threshold (0.3-0.9)
- Fast Processing: Suitable for large datasets
- Usage: Enter threshold when prompted (default: 0.6)
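The confidence-based rule is simple to state: detections at or above the threshold go to one class and the rest to the other. A hypothetical sketch of that rule (the real implementation and directory names may differ):

```python
# Illustrative auto-labeling rule: sort detection crops by confidence threshold.
import shutil
from pathlib import Path

def auto_label(crops_with_conf, threshold=0.6):
    # crops_with_conf: iterable of (crop_image_path, detection_confidence)
    for crop_path, conf in crops_with_conf:
        dest = "class0" if conf >= threshold else "class1"  # keep vs. filter
        target = Path("results/03_manual_labeling", dest)
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy(crop_path, target)
```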
### CLI and Batch Labeling

- Fallback Options: Available when the GUI is not accessible
- Simplified Interface: Currently redirects to auto labeling
## Output Structure

```
results/
├── 01_initial_yolo/       # Trained YOLO models
│   ├── yolov8_10.pt
│   ├── yolov8_20.pt
│   └── ...
├── 02_first_inference/    # Inference visualizations
├── 03_manual_labeling/    # Labeled training data
│   ├── class0/            # Objects to keep
│   └── class1/            # Objects to filter
├── 04_classification/     # Trained classifiers
│   ├── densenet121_10.pth
│   └── ...
└── 05_iterative_process/  # Final results
    ├── cycle_1/
    ├── cycle_2/
    └── summary.json
```
## Performance Metrics

The framework outputs detailed performance metrics, including:
- F1-scores for each model and data ratio
- Precision and Recall values
- Cross-validation results
- Active learning cycle improvements
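For reference, the reported F1-score is the harmonic mean of precision and recall:

```python
# F1 = 2 * P * R / (P + R); e.g., P = 0.90 and R = 0.92 give F1 ≈ 0.91.
def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.90, 0.92), 3))
```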
## Configuration Parameters

| Parameter | Description | Default | Range |
|---|---|---|---|
| `yolo_epochs` | YOLO training epochs | 100 | 50-300 |
| `classification_epochs` | Classifier training epochs | 30 | 10-100 |
| `conf_threshold` | Detection confidence threshold | 0.25 | 0.1-0.9 |
| `class_conf_threshold` | Classification confidence threshold | 0.5 | 0.1-0.9 |
| `max_cycles` | Maximum active learning cycles | 10 | 1-20 |
| `batch_size` | Training batch size | 16 | 8-64 |
Default data ratios:

- YOLO training: `[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]` (percentages)
- Classification: `[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]` (fractions)
## Troubleshooting

CUDA out-of-memory errors:

```bash
# Reduce batch size: edit batch_size in the config
python main.py --config my_config.json
```

GUI labeling not available:

```bash
# Install GUI libraries
sudo apt-get install python3-tk    # Ubuntu/Debian
sudo yum install python3-tkinter   # CentOS/RHEL
```

Missing outputs from earlier steps:

```bash
# Run previous steps first
python main.py --step 1   # Train YOLO models first
python main.py --step 2   # Then run labeling
```

Classification training issues:

- Ensure both `class0/` and `class1/` directories contain images
- Try auto labeling with different confidence thresholds
- Use GUI labeling for better control

GPU issues:

```bash
# Check GPU usage
nvidia-smi

# Use a specific GPU
python main.py --gpu_num 1
```

- Reduce `batch_size` if encountering OOM errors
- Use a smaller `img_size` for YOLO training
- Close other GPU-intensive applications
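In addition to `--gpu_num`, GPU visibility can be restricted at the process level before PyTorch initializes CUDA; this is a general PyTorch/CUDA tip, not framework-specific:

```python
# Restrict this process to GPU 1; must run before torch touches CUDA.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
print("Visible GPUs:", torch.cuda.device_count())
```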
## Key Features

- Human-in-the-Loop Design: Seamlessly integrates human expertise with AI automation
- Dual AI Components: Combines YOLO object detection with DenseNet classification for robust performance
- Noise Reduction: Advanced validation filters reduce annotation errors by 83%
- Data Efficiency: Achieves near-optimal performance using only 10% of original labeled data
- Cross-Domain Generalization: Robust performance across different biological datasets
- Modular Architecture: Easy to extend and customize for various biological applications
## Results

- F1-scores: 0.89-0.95 on blood smear datasets with minimal data
- Cross-domain F1-scores: 0.88-0.97 across different domains
- Error Reduction: 83% reduction in intentionally introduced annotation errors
- Data Requirement: Only 10% of original labeled data needed
## Contact

If you have any questions or would like to share your cell images, please contact us by email: kc.jeong-isw@chungbuk.ac.kr or gc.jo-isw@chungbuk.ac.kr.

