# SentinelML

**Unified Reliability Engine for AI/ML Systems**
SentinelML is a comprehensive framework for monitoring, evaluating, and ensuring the reliability of machine learning systems across traditional ML, deep learning, generative AI, RAG pipelines, and agentic systems.
- Traditional ML: Drift detection, anomaly detection, out-of-distribution detection
- Deep Learning: Uncertainty quantification, adversarial detection, feature drift monitoring
- Generative AI: Input/output guardrails, hallucination detection, bias detection
- RAG Systems: Retrieval relevance, faithfulness checking, end-to-end evaluation (RAGAS, ARES)
- Agent Systems: Trajectory validation, tool monitoring, reasoning consistency
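To give a flavor of what the generative-AI and RAG checks above measure, here is a deliberately naive faithfulness score: the fraction of answer tokens that appear in the retrieved context. The `naive_faithfulness` helper is a hypothetical illustration, not part of SentinelML, whose detectors are model-based:

```python
import re

def naive_faithfulness(answer, context_sentences):
    """Fraction of answer tokens that appear somewhere in the retrieved context.

    A crude lexical proxy for faithfulness; real checkers verify claims
    with a model, but low overlap is still a useful first warning sign.
    """
    tokenize = lambda s: set(re.findall(r"[a-z0-9]+", s.lower()))
    answer_tokens = tokenize(answer)
    context_tokens = set().union(*(tokenize(s) for s in context_sentences))
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

context = ["Paris is the capital of France.", "France is in Europe."]
print(naive_faithfulness("Paris is the capital of France.", context))   # 1.0
print(naive_faithfulness("Paris is the capital of Germany.", context))
```

Note that the hallucinated answer still scores about 0.83: lexical overlap misses single-word fabrications, which is exactly why model-based methods such as self-consistency checking are used instead.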
## Key Capabilities

| Capability | Description |
|---|---|
| Drift Detection | KS test, PSI, MMD, adversarial drift detectors |
| Trust Scoring | Mahalanobis distance, Isolation Forest, VAE-based anomaly detection |
| Uncertainty Quantification | MC Dropout, Deep Ensembles, Evidential Networks, Temperature Scaling |
| Guardrails | Prompt injection detection, PII filtering, toxicity detection, schema validation |
| Visualization | Trust dashboards, drift plots, interactive Plotly dashboards |
| Serving | FastAPI and gRPC servers for production monitoring |
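As background for the drift-detection row above, here is the Population Stability Index computed in plain NumPy. This is the bare statistic only, not SentinelML's detector; the `psi` helper and the thresholds in its docstring are illustrative:

```python
import numpy as np

def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two 1-D samples.

    Bin edges come from the reference sample; a common rule of thumb
    reads PSI < 0.1 as stable and PSI > 0.2 as significant drift.
    """
    edges = np.histogram_bin_edges(reference, bins=bins)
    # Clip current values into the reference range so out-of-range
    # points land in the outermost bins instead of being dropped.
    clipped = np.clip(current, edges[0], edges[-1])
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference) + eps
    cur_frac = np.histogram(clipped, bins=edges)[0] / len(current) + eps
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
stable = rng.normal(0.0, 1.0, 5000)    # same distribution
shifted = rng.normal(1.5, 1.0, 5000)   # mean shift -> drift

print(f"PSI (stable):  {psi(reference, stable):.3f}")
print(f"PSI (shifted): {psi(reference, shifted):.3f}")
```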
## Installation

```bash
# Basic installation (Traditional ML only)
pip install sentinelml

# With PyTorch support
pip install sentinelml[torch]

# With TensorFlow support
pip install sentinelml[tensorflow]

# For Generative AI / LLM applications
pip install sentinelml[genai]

# For RAG applications
pip install sentinelml[rag]

# For production serving
pip install sentinelml[serving]

# Complete installation
pip install sentinelml[all]

# Development installation
pip install sentinelml[dev]
```

## Quick Start

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

from sentinelml import Sentinel, KSDriftDetector, MahalanobisTrust

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test = X[:100], X[100:]

# Train your model
model = RandomForestClassifier().fit(X_train, y[:100])

# Initialize Sentinel with drift and trust monitoring
sentinel = Sentinel(
    drift_detector=KSDriftDetector(threshold=0.05),
    trust_model=MahalanobisTrust(),
    verbose=True,
)

# Fit on reference (training) data
sentinel.fit(X_train)

# Assess new samples
results = []
for x in X_test:
    result = sentinel.assess(x)
    results.append(result)
    print(f"Trust: {result.trust_score:.3f}, Drift: {result.has_drift}")

# Visualize
from sentinelml.viz import plot_trust

trust_scores = [r.trust_score for r in results]
plot_trust(trust_scores, title="Trust Scores on Test Data")
```

### Simulating Drift

```python
import numpy as np

# Simulate drifted data
drift_data = X_test + np.random.normal(0, 2, X_test.shape)

# Assess drifted samples
for x in drift_data[:5]:
    result = sentinel.assess(x)
    print(f"Trust: {result.trust_score:.3f}, "
          f"Drift p-value: {result.drift_pvalue:.4f}, "
          f"Is Trustworthy: {result.is_trustworthy}")
```

## GenAI Guardrails

```python
from sentinelml import PromptInjectionDetector, HallucinationDetector

# Input validation
injection_detector = PromptInjectionDetector(threshold=0.7)
result = injection_detector.detect("Ignore previous instructions and...")
print(f"Injection detected: {result.is_violation}, Score: {result.score}")

# Output validation (RAG context)
hallucination_detector = HallucinationDetector(method="self_consistency")
context = ["Paris is the capital of France.", "France is in Europe."]
generated = "Paris is the capital of Germany."
result = hallucination_detector.verify(context, generated)
print(f"Hallucination detected: {result.is_hallucination}")
```

## RAG Evaluation

```python
from sentinelml import RAGASEvaluator, FaithfulnessChecker

# End-to-end RAG evaluation
evaluator = RAGASEvaluator(metrics=["faithfulness", "answer_relevancy", "context_recall"])
results = evaluator.evaluate(
    questions=["What is the capital of France?"],
    answers=["Paris is the capital of France."],
    contexts=[["Paris is the capital of France."]],
    ground_truths=["Paris"],
)

# Component-level checking
faithfulness = FaithfulnessChecker()
score = faithfulness.check(
    answer="Paris is the capital.",
    context="Paris is France's capital city.",
)
```

## Agent Monitoring

```python
from sentinelml import StepValidator, LoopDetector, BudgetManager

# Monitor agent execution
validator = StepValidator()
loop_detector = LoopDetector(window_size=5)
budget = BudgetManager(max_steps=50, max_tokens=10000)

# Validate each step; agent_steps is your agent's run as a list of
# (thought, action, observation) tuples
for step_num, (thought, action, observation) in enumerate(agent_steps):
    validation = validator.validate_step(thought, action, observation)
    if loop_detector.detect_loop(agent_steps[:step_num + 1]):
        print("Loop detected! Breaking...")
        break
    if not budget.consume_step(tokens_used=len(thought.split())):
        print("Budget exceeded!")
        break
```

## Command-Line Interface

```bash
# Scan a dataset for drift and anomalies
sentinelml scan data.csv --drift-detector mmd --trust-model mahalanobis --output report.json

# Evaluate model reliability
sentinelml evaluate model.pkl test.csv --labels target --output evaluation.json

# Start the monitoring server
sentinelml serve --port 8000 --config sentinel.yaml

# Generate a configuration template
sentinelml config --type genai --output sentinel.yaml
```

## Architecture

```
sentinelml/
├── core/                  # Core engine and orchestration
│   ├── sentinel.py        # Main Sentinel orchestrator
│   ├── pipeline.py        # Processing pipelines
│   ├── ensemble.py        # Adaptive trust ensembles
│   └── report.py          # Reporting infrastructure
├── traditional/           # Traditional ML monitoring
│   ├── drift/             # Drift detection methods
│   ├── trust/             # Anomaly/trust scoring
│   └── familiarity/       # OOD detection
├── deep_learning/         # Deep learning specific
│   ├── uncertainty/       # UQ methods (MC Dropout, Ensembles, etc.)
│   ├── feature_drift/     # Activation/embedding monitoring
│   └── adversarial/       # Adversarial attack detection
├── genai/                 # Generative AI guardrails
│   ├── guardrails/        # Input/output validation
│   ├── alignment/         # Bias and toxicity detection
│   └── uncertainty/       # LLM uncertainty estimation
├── rag/                   # RAG pipeline evaluation
│   ├── retrieval/         # Retrieval metrics
│   ├── generation/        # Generation quality
│   ├── advanced/          # Claim verification, contradiction detection
│   └── end_to_end/        # RAGAS, ARES evaluators
├── agents/                # Agent system monitoring
│   ├── trajectory/        # Step validation, loop detection
│   ├── reasoning/         # Logic checking, consistency
│   └── state/             # Budget and checkpoint management
├── adapters/              # Framework integrations
│   ├── sklearn_adapter.py
│   ├── torch_adapter.py
│   ├── tensorflow_adapter.py
│   ├── openai_adapter.py
│   ├── langchain_adapter.py
│   └── ...
├── infrastructure/        # Production infrastructure
│   ├── serving/           # FastAPI/gRPC servers
│   ├── storage/           # Vector store integration
│   └── streaming/         # Kafka consumers
└── viz.py                 # Visualization utilities
```
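To make the trust-scoring idea in `traditional/trust/` concrete, here is a minimal Mahalanobis-distance scorer in plain NumPy. The `MahalanobisScorer` class below is a hypothetical sketch, not the library's `MahalanobisTrust`:

```python
import numpy as np

class MahalanobisScorer:
    """Toy trust scorer: distance of a sample from the training distribution."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        # Regularize the covariance so the inverse exists for small samples.
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        self.cov_inv_ = np.linalg.inv(cov)
        return self

    def distance(self, x):
        d = x - self.mean_
        return float(np.sqrt(d @ self.cov_inv_ @ d))

    def trust_score(self, x):
        # Map distance to (0, 1]: near 1 = looks like training data,
        # near 0 = far from anything seen during fitting.
        return 1.0 / (1.0 + self.distance(x))

rng = np.random.default_rng(42)
X_train = rng.normal(0.0, 1.0, size=(500, 4))
scorer = MahalanobisScorer().fit(X_train)

in_dist = np.zeros(4)       # at the training mean
outlier = np.full(4, 8.0)   # far outside the training data
print(scorer.trust_score(in_dist), scorer.trust_score(outlier))
```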
## Configuration

Create a configuration file for different deployment scenarios:

```yaml
# sentinel.yaml - Traditional ML
sentinel:
  drift_detector:
    type: mmd
    threshold: 0.05
  trust_model:
    type: mahalanobis
    calibration: isotonic
  monitoring:
    batch_size: 1000
    check_interval: 3600
```

```yaml
# sentinel.yaml - GenAI
sentinel:
  guardrails:
    input:
      - type: prompt_injection
        threshold: 0.7
      - type: pii_detection
        entities: [email, phone, ssn]
    output:
      - type: hallucination_detection
        method: self_consistency
  llm:
    model: gpt-4
    temperature: 0.7
```

## Benchmarking

SentinelML includes comprehensive benchmarking tools to compare against baseline methods:

```python
from sentinelml.benchmarks import BenchmarkComparison
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Compare Sentinel against baselines
benchmark = BenchmarkComparison(sentinel=sentinel, model=model)
results = benchmark.evaluate(X_test, y_test)

# Returns a comparison of:
# - sentinel: Trust scores from SentinelML
# - entropy: Prediction entropy (uncertainty)
# - isolation_forest: Isolation Forest anomaly scores
# - lof: Local Outlier Factor scores
```

## Roadmap

- ✅ Modular architecture rewrite
- ✅ GenAI guardrails (input/output)
- ✅ RAG evaluation framework
- ✅ Agent monitoring tools
- ✅ FastAPI/gRPC serving

Planned:
- Streaming drift detection (Kafka integration)
- Distributed monitoring (Ray/Spark)
- Advanced attribution methods
- Automated threshold tuning
- Multi-modal support (vision, audio)
- Real-time adversarial defense
- LLM-powered root cause analysis
- Enterprise dashboard
## Contributing

Contributions are welcome! Please see our Contributing Guide.

```bash
# Development setup
git clone https://github.com/sentinelml/sentinelml.git
cd sentinelml
pip install -e ".[dev]"

# Run tests
pytest tests/ --cov=sentinelml

# Code quality
black sentinelml/ tests/
isort sentinelml/ tests/
flake8 sentinelml/ tests/
```

## Acknowledgments

SentinelML integrates research from:
- Out-of-Distribution Detection: Hendrycks & Gimpel, Liu et al.
- Drift Detection: Rabanser et al. (MMD), dos Reis et al. (PSI)
- Uncertainty Quantification: Gal & Ghahramani (MC Dropout), Lakshminarayanan et al. (Deep Ensembles)
- LLM Safety: Perez & Ribeiro (red teaming), Minding the Gap (hallucination detection)
- RAG Evaluation: Es et al. (RAGAS), Saad-Falcon et al. (ARES)
## Citation

If you use SentinelML in your research:

```bibtex
@software{sentinelml2024,
  title={SentinelML: Unified Reliability Engine for AI/ML Systems},
  author={SentinelML Team},
  year={2024},
  version={2.0.0},
  url={https://github.com/sentinelml/sentinelml}
}
```

## License

MIT License - see the LICENSE file.
## Links

- PyPI: https://pypi.org/project/sentinelml/
- Documentation: https://sentinelml.readthedocs.io/
- GitHub: https://github.com/sentinelml/sentinelml
- Issues: https://github.com/sentinelml/sentinelml/issues
## Support

For questions and support:

- Email: team@sentinelml.ai
- Discussions: GitHub Discussions
- Issues: GitHub Issues
*SentinelML: Trustworthy AI through continuous monitoring*