Confidence-Aware Routing for Large Language Model Reliability Enhancement
A Python package implementing a multi-signal approach to pre-generation hallucination mitigation for Large Language Models. HalluNox combines semantic alignment measurement, internal convergence analysis, and learned confidence estimation to produce unified confidence scores for proactive routing decisions.
- 🎯 Pre-generation Hallucination Detection: Assess model reliability before generation begins
- 🔄 Confidence-Aware Routing: Automatically route queries based on estimated confidence
- 🧠 Multi-Signal Approach: Combines semantic alignment, internal convergence, and learned confidence
- ⚡ Optimized for Llama Models: Default support for Llama-3.2-3B-Instruct architecture
- 📊 Comprehensive Evaluation: Built-in metrics and routing strategy analysis
- 🚀 Easy Integration: Simple API for both training and inference
Based on the research paper "Confidence-Aware Routing for Large Language Model Reliability Enhancement: A Multi-Signal Approach to Pre-Generation Hallucination Mitigation" by Nandakishor M (Convai Innovations).
The approach implements deterministic routing to appropriate response pathways:
- High Confidence (≥0.8): Local generation
- Medium Confidence (0.6-0.8): Retrieval-augmented generation
- Low Confidence (0.4-0.6): Route to larger models
- Very Low Confidence (<0.4): Human review required
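The threshold logic above can be sketched as a pure function. This is illustrative only: the function name and the action strings are assumptions, not part of the HalluNox API.

```python
def route(confidence: float) -> str:
    """Map a unified confidence score to a routing action.

    Thresholds mirror the routing table above; names are
    illustrative, not the HalluNox API.
    """
    if confidence >= 0.8:
        return "LOCAL_GENERATION"       # high confidence
    if confidence >= 0.6:
        return "RETRIEVAL_AUGMENTED"    # medium confidence
    if confidence >= 0.4:
        return "LARGER_MODEL"           # low confidence
    return "HUMAN_REVIEW"               # very low confidence

print(route(0.85))  # LOCAL_GENERATION
```

Because the mapping is deterministic, the same confidence score always produces the same routing decision, which keeps the pathway auditable.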
- Python 3.8+
- PyTorch 1.13+
- CUDA-compatible GPU (recommended)
- At least 8GB GPU memory for training
Install from PyPI:

```bash
pip install hallunox
```

Or install from source:

```bash
git clone https://github.com/convai-innovations/hallunox.git
cd hallunox
pip install -e .
```

HalluNox automatically installs the following dependencies:
- `torch>=1.13.0` - PyTorch framework
- `transformers>=4.21.0` - Hugging Face Transformers
- `FlagEmbedding>=1.2.0` - BGE-M3 embedding model
- `datasets>=2.0.0` - Dataset loading utilities
- `scikit-learn>=1.0.0` - Evaluation metrics
- `numpy>=1.21.0` - Numerical computations
- `tqdm>=4.64.0` - Progress bars
```python
from hallunox import HallucinationDetector

# Initialize detector (downloads pre-trained model automatically)
detector = HallucinationDetector()

# Analyze text for hallucination risk
results = detector.predict([
    "The capital of France is Paris.",  # High confidence
    "Your password is 12345678.",       # Low confidence
    "The Moon is made of cheese.",      # Very low confidence
])

# View results
for pred in results["predictions"]:
    print(f"Text: {pred['text']}")
    print(f"Confidence: {pred['confidence_score']:.3f}")
    print(f"Risk Level: {pred['risk_level']}")
    print(f"Routing Action: {pred['routing_action']}")
    print()
```

The CLI supports interactive, file-based, and demo modes:

```bash
hallunox-infer --interactive
hallunox-infer --input_file texts.txt --output_file results.json
hallunox-infer --demo --show_routing
```

To train a model:

```python
from hallunox import Trainer, TrainingConfig

# Configure training
config = TrainingConfig(
    batch_size=8,
    learning_rate=5e-4,
    max_epochs=6,
    output_dir="./models/my_hallucination_model",
)

# Train model
trainer = Trainer(config)
trainer.train()
```

Or using the command line:
```bash
hallunox-train --batch_size 8 --learning_rate 5e-4 --max_epochs 6
```

HalluNox uses a hybrid architecture combining:
- LLM Component: Llama-3.2-3B-Instruct (default)
  - Extracts internal hidden representations
  - Supports any Llama-architecture model
- Embedding Model: BGE-M3 (fixed)
  - Provides reference semantic embeddings
  - 1024-dimensional dense vectors
- Projection Network:
  - Maps LLM hidden states (3072D) to embedding space (1024D)
  - 3-layer MLP with ReLU activations and dropout
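As a rough sketch of the projection head's shape, the following NumPy snippet maps a 3072-D hidden state into the 1024-D embedding space. The intermediate width (2048) and weight initialization are assumptions, and dropout is omitted since this is inference-time only; it is not the package's implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class ProjectionMLP:
    """3-layer MLP mapping LLM hidden states (3072-D) into the
    BGE-M3 embedding space (1024-D). Intermediate width is assumed."""

    def __init__(self, d_in=3072, d_hidden=2048, d_out=1024, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.02, (d_in, d_hidden))
        self.W2 = rng.normal(0.0, 0.02, (d_hidden, d_hidden))
        self.W3 = rng.normal(0.0, 0.02, (d_hidden, d_out))

    def __call__(self, h):
        z = relu(h @ self.W1)   # layer 1 + ReLU
        z = relu(z @ self.W2)   # layer 2 + ReLU
        return z @ self.W3      # linear output into embedding space

proj = ProjectionMLP()
hidden = np.random.default_rng(1).normal(size=(1, 3072))  # pooled LLM state
print(proj(hidden).shape)  # (1, 1024)
```

Once projected, the vector can be compared directly (e.g. by cosine similarity) against the BGE-M3 reference embedding of the same text.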
Advanced detector configuration:

```python
from hallunox import HallucinationDetector

detector = HallucinationDetector(
    model_path="path/to/trained/model.pt",         # Optional: uses pre-trained if None
    llm_model_id="unsloth/Llama-3.2-3B-Instruct",  # Any Llama model
    embed_model_id="BAAI/bge-m3",                  # Fixed embedding model
    device="cuda",                                 # cuda or cpu
    max_length=512,                                # LLM sequence length
    bge_max_length=512,                            # BGE-M3 sequence length
    use_fp16=True,                                 # Mixed precision
)
```

Training configuration:

```python
from hallunox import TrainingConfig

config = TrainingConfig(
    # Model settings
    model_id="unsloth/Llama-3.2-3B-Instruct",
    embed_model_id="BAAI/bge-m3",

    # Training hyperparameters
    batch_size=8,
    learning_rate=5e-4,
    max_epochs=6,
    warmup_steps=300,

    # Dataset configuration
    use_truthfulqa=True,
    use_halueval=True,
    use_fever=True,
    max_samples_per_dataset=3000,

    # Confidence thresholds
    high_confidence_threshold=0.9,
    medium_confidence_threshold=0.7,
    low_confidence_threshold=0.3,
)
```

A pre-trained model is available for immediate use:
```python
from hallunox.utils import download_model

# Automatically downloads from https://storage.googleapis.com/courseai/best_model_hl.pt
model_path = download_model()
```

The model was trained on a combination of:
- TruthfulQA
- HaluEval
- FEVER
- XSum Factuality
- SQuAD v2
- Natural Questions
- Synthetic examples
- `predict(texts)`: Analyze texts for hallucination confidence
- `batch_predict(texts, batch_size=16)`: Process large batches efficiently
- `evaluate_routing_strategy(texts)`: Analyze routing decisions
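A plausible reading of `batch_predict` is that it processes the input list in fixed-size chunks; the following is a generic chunking sketch, not the package's actual implementation.

```python
def chunked(items, size=16):
    """Yield consecutive fixed-size slices of a list, as a
    batch runner might before calling the model."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 35 texts with batch_size=16 -> batches of 16, 16, and 3
print([len(batch) for batch in chunked(list(range(35)), size=16)])  # [16, 16, 3]
```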
`predict` returns a dictionary of per-text predictions plus a summary:

```json
{
  "predictions": [
    {
      "text": "input text",
      "confidence_score": 0.85,
      "similarity_score": 0.92,
      "interpretation": "HIGH_CONFIDENCE",
      "risk_level": "LOW_RISK",
      "routing_action": "LOCAL_GENERATION",
      "description": "This response appears to be factual and reliable."
    }
  ],
  "summary": {
    "total_texts": 1,
    "avg_confidence": 0.85,
    "high_confidence_count": 1,
    "medium_confidence_count": 0,
    "low_confidence_count": 0,
    "very_low_confidence_count": 0
  }
}
```

- `TrainingConfig`: Configuration dataclass for training parameters
- `Trainer`: Main training class with dataset loading and model training
- `MultiDatasetLoader`: Loads and combines multiple hallucination detection datasets
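For reference, the `summary` block in the output format can be recomputed from the per-text predictions. This is a minimal sketch: the bucket thresholds of 0.8/0.6/0.4 are assumed from the routing table, and `summarize` is a hypothetical helper, not a HalluNox function.

```python
def summarize(predictions):
    """Aggregate per-text confidence scores into the summary shape
    shown in the output format. Buckets mirror the routing
    thresholds (an assumption)."""
    scores = [p["confidence_score"] for p in predictions]
    n = len(scores)
    return {
        "total_texts": n,
        "avg_confidence": sum(scores) / n if n else 0.0,
        "high_confidence_count": sum(s >= 0.8 for s in scores),
        "medium_confidence_count": sum(0.6 <= s < 0.8 for s in scores),
        "low_confidence_count": sum(0.4 <= s < 0.6 for s in scores),
        "very_low_confidence_count": sum(s < 0.4 for s in scores),
    }

print(summarize([{"confidence_score": 0.85}])["high_confidence_count"])  # 1
```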
- `download_model()`: Download pre-trained model
- `setup_logging()`: Configure logging
- `check_gpu_availability()`: Check CUDA compatibility
- `validate_model_requirements()`: Verify dependencies
Our confidence-aware routing system demonstrates:
- 74% hallucination detection rate (vs 42% baseline)
- 9% false positive rate (vs 15% baseline)
- 40% reduction in computational cost vs post-hoc methods
- 1.6x average cost multiplier, compared to 4.2x when always routing to expensive operations
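The cost multiplier is an expected value over routing decisions. With hypothetical per-route relative costs and traffic fractions (illustrative numbers only, not the paper's measured distribution), the expected cost per query works out like this:

```python
# Hypothetical relative costs per routing pathway (local generation = 1x).
# These numbers are illustrative, not measurements from the paper.
costs = {"local": 1.0, "rag": 1.5, "larger_model": 4.2, "human_review": 10.0}

# Hypothetical fraction of queries landing in each pathway
mix = {"local": 0.70, "rag": 0.20, "larger_model": 0.08, "human_review": 0.02}

expected_cost = sum(mix[route] * costs[route] for route in costs)
print(round(expected_cost, 3))  # 1.536 -- far below always paying 4.2x
```

The point is that confident queries, which dominate typical traffic, stay on the cheap local path, so the blended cost stays close to 1x even though the expensive pathways remain available.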
Minimum (inference):
- CPU: Modern multi-core processor
- RAM: 12GB system memory minimum
- GPU: 8GB VRAM minimum (NVIDIA RTX 3070, RTX 4060 Ti, or better)
- Storage: 10GB free disk space for models
- OS: Python 3.8+ compatible operating system

Recommended (inference):
- CPU: Intel i7/AMD Ryzen 7 or better
- RAM: 16GB+ system memory
- GPU: NVIDIA GPU with 12GB+ VRAM (RTX 4070, RTX 3080, or better)
- Storage: NVMe SSD with 15GB+ free space
- CUDA: 11.8+ compatible GPU driver

Training:
- CPU: High-performance multi-core processor (Intel i9/AMD Ryzen 9)
- RAM: 32GB+ system memory (64GB recommended)
- GPU: NVIDIA GPU with 24GB+ VRAM (RTX 4090, A100, H100, or better)
- Storage: 150GB+ free disk space (NVMe SSD strongly recommended)
  - Model checkpoints: ~5GB per epoch
  - Training datasets: ~20GB
  - Intermediate outputs: ~50GB
  - Logs and metrics: ~10GB
- Network: High-speed internet for dataset downloads

CPU-only inference:
- RAM: 16GB minimum (32GB recommended)
- Storage: 15GB free disk space
- Note: CPU inference is 10-50x slower than GPU inference
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
If you use HalluNox in your research, please cite:
```bibtex
@article{nandakishor2024hallunox,
  title={Confidence-Aware Routing for Large Language Model Reliability Enhancement: A Multi-Signal Approach to Pre-Generation Hallucination Mitigation},
  author={Nandakishor M},
  journal={AI Safety Research},
  year={2024},
  organization={Convai Innovations}
}
```

We welcome contributions! Please see our contributing guidelines and submit pull requests to our repository.
For technical support and questions:
- Email: support@convaiinnovations.com
- Issues: GitHub Issues
Nandakishor M
AI Safety Research
Convai Innovations Pvt. Ltd.
Email: support@convaiinnovations.com