A comprehensive implementation of privacy-preserving machine learning using Fully Homomorphic Encryption (FHE). This project demonstrates how to train and perform inference on neural networks while keeping data encrypted throughout the entire process.
FHE-ML provides a complete framework for:
- Training neural networks on encrypted data using hybrid approaches
- Performing encrypted inference on trained models
- Maintaining data privacy during both training and inference phases
- Working with real datasets (MNIST) under homomorphic encryption constraints
The implementation uses the TenSEAL library for FHE operations and PyTorch for neural network components.
- Complete data encryption during training and inference
- No plaintext data exposure during computation
- Secure model parameter updates through encrypted gradients
- Custom MLP implementation designed for homomorphic encryption
- FHE-friendly polynomial activation functions
- Linear activations for minimal multiplicative depth
- Configurable architecture with multiple hidden layers
- Combines encrypted and plaintext training for optimal performance
- Proxy model synchronization for gradient computation
- Flexible training strategies (encrypted-only, plaintext-only, or mixed)
- Encrypted inference with confidence scoring
- Model accuracy evaluation on encrypted test sets
- Confusion matrix generation for detailed analysis
- `model.py` - Neural network implementations
  - `FHEMLPClassifier`: Main FHE-compatible neural network
  - `FHELinearLayer`: Custom linear layers for encrypted computation
  - `FHEPolynomialActivation`: Polynomial approximation of ReLU (a simplified sketch follows this list)
  - `TorchMLPClassifier`: Standard PyTorch model for proxy training
- `training.py` - Training algorithms
  - `FHETrainer`: Base trainer for FHE models
  - `HybridFHETrainer`: Advanced trainer supporting mixed encrypted/plaintext batches
  - Gradient approximation techniques for encrypted data
- `inference.py` - Encrypted inference engine
  - `FHEInference`: Core inference functionality
  - `SecureInferenceServer`: Production-ready inference server
  - Batch processing and confidence estimation
- `utils.py` - Utility functions
  - FHE context creation and management
  - Data encryption/decryption operations
  - MNIST data loading and preprocessing
  - `FHEDataset`: Custom dataset class for encrypted data
- `main.py` - Training orchestration
  - Complete training pipeline with progress tracking
  - Flexible command-line interface
  - Model checkpointing and evaluation
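The exact class definitions live in `model.py`. As a rough illustration of the FHE-friendly activation idea, a degree-2 polynomial stand-in for ReLU can be written as an ordinary PyTorch module; the class name below is illustrative, not the project's own:

```python
import torch
import torch.nn as nn

class PolyActivation(nn.Module):
    """Illustrative degree-2 polynomial stand-in for ReLU.

    Only additions and multiplications are used, so the same function can be
    evaluated on CKKS ciphertexts at the cost of one multiplicative level.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # f(x) ~ 0.5*x + 0.25*x^2 (see the polynomial approximation under
        # the CKKS considerations later in this README)
        return 0.5 * x + 0.25 * x * x
```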
- Python 3.10 (required for TenSEAL compatibility)
- PDM package manager
```bash
# Clone the repository
git clone <repository-url>
cd fheml

# Install dependencies using PDM
pdm install

# Activate the virtual environment
pdm shell
```

Key dependencies:

- TenSEAL (≥0.3.16): Homomorphic encryption library
- PyTorch (≥2.8.0): Deep learning framework
- TorchVision (≥0.23.0): Computer vision utilities
- NumPy (≥2.2.6): Numerical computing
- Scikit-learn (≥1.7.1): Machine learning utilities
- Matplotlib (≥3.10.5): Plotting and visualization
- tqdm (≥4.67.1): Progress bars
Example training runs:

```bash
# Plaintext-only training
pdm run train --epochs 3 --hidden-dims 64 --learning-rate 0.01

# Hybrid training with a short encrypted phase, saving the model
pdm run train --use-encrypted --encrypted-epochs 2 --encrypted-samples 100 --save-model

# Hybrid training with an encrypted-inference test
pdm run train --use-encrypted --encrypted-samples 200 --test-encrypted-inference --save-model
```

| Option | Description | Default |
|---|---|---|
| --epochs | Number of training epochs | 5 |
| --batch-size | Batch size for plaintext training | 64 |
| --encrypted-batch-size | Batch size for encrypted training | 4 |
| --learning-rate | Learning rate | 0.01 |
| --hidden-dims | Hidden layer dimensions | [64] |
| --use-encrypted | Enable encrypted training | False |
| --encrypted-epochs | Epochs of encrypted training | 1 |
| --encrypted-samples | Number of encrypted training samples | 100 |
| --encrypted-test-samples | Number of encrypted test samples | 20 |
| --poly-modulus-degree | FHE polynomial modulus degree | 8192 |
| --scale-bits | FHE scale bits | 40 |
| --save-model | Save trained model | False |
| --test-encrypted-inference | Test encrypted inference | False |
Training from Python:

```python
from model import FHEMLPClassifier
from training import HybridFHETrainer
from utils import create_context, load_mnist_data

# Create FHE context
context = create_context(poly_modulus_degree=8192, scale_bits=40)

# Create model
model = FHEMLPClassifier(
    input_dim=784,
    hidden_dims=[64],
    num_classes=10,
    use_polynomial_activation=False
)

# Setup training
trainer = HybridFHETrainer(model, learning_rate=0.01)
train_loader = load_mnist_data(batch_size=64, train=True)

# Train on plaintext data
for images, labels in train_loader:
    loss = trainer.train_on_plain_batch(images, labels)
```

Encrypted inference:

```python
import torch

from inference import FHEInference
from utils import encrypt_tensor

# Setup inference engine
inference_engine = FHEInference(model, context)

# Encrypt input data
sample_image = torch.randn(784)
encrypted_input = encrypt_tensor(context, sample_image)

# Perform encrypted inference
prediction = inference_engine.predict_encrypted(encrypted_input)
prediction_with_confidence = inference_engine.predict_with_confidence(encrypted_input)
```

The project uses the CKKS scheme from TenSEAL, which supports approximate arithmetic on encrypted real numbers. Key technical considerations:
- Multiplicative Depth: limited by the encryption parameters
  - Polynomial modulus degree: 8192 (default)
  - Coefficient modulus: [60, 40, 40, 60] bits
  - Scale: 2^40
- Activation Functions:
  - Linear activation (identity) for minimal depth
  - Polynomial approximation f(x) ≈ 0.5x + 0.25x² for ReLU-like behavior
- Noise Management:
  - Automatic rescaling after multiplications
  - Bootstrap operations when needed
  - Careful parameter selection for noise budget
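As a simplified sketch of how these parameters map onto a TenSEAL CKKS context, and of what evaluating the polynomial activation on a ciphertext looks like, the snippet below calls TenSEAL directly rather than the project's `create_context` helper:

```python
import tenseal as ts

# CKKS context with the default parameters listed above
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40   # scale = 2^40
context.generate_galois_keys()   # rotation keys, needed for encrypted dot products

# Encrypt a small vector and apply the polynomial activation homomorphically
enc_x = ts.ckks_vector(context, [0.1, -0.5, 1.2])
enc_act = enc_x * 0.5 + (enc_x * enc_x) * 0.25  # one ciphertext-ciphertext multiply

print(enc_act.decrypt())  # close to 0.5*x + 0.25*x^2, element-wise
```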
The hybrid training approach combines:
- Plaintext Training: Fast convergence on unencrypted data
- Encrypted Training: Privacy-preserving fine-tuning
- Proxy Model: Standard PyTorch model for gradient computation
- Weight Synchronization: Keeps FHE and proxy models aligned
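A rough sketch of how these pieces fit together is shown below. `train_on_plain_batch` appears in the usage example above; `train_on_encrypted_batch` and `sync_weights` are placeholder names standing in for the encrypted step and the weight synchronization implemented in `training.py`:

```python
def hybrid_epoch(trainer, plain_loader, encrypted_batches):
    """Illustrative hybrid training epoch (simplified sketch, not the real API)."""
    # 1) Fast convergence: ordinary backprop on unencrypted data via the proxy model
    for images, labels in plain_loader:
        trainer.train_on_plain_batch(images, labels)

    # 2) Privacy-preserving fine-tuning on a small number of encrypted samples
    #    (placeholder name; gradients are approximated for encrypted data)
    for enc_images, labels in encrypted_batches:
        trainer.train_on_encrypted_batch(enc_images, labels)

    # 3) Keep the FHE model and the PyTorch proxy model aligned
    #    (placeholder name for the weight-synchronization step)
    trainer.sync_weights()
```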
- Encrypted operations are ~1000x slower than plaintext
- Small batch sizes recommended for encrypted training (2-8 samples)
- Limited model complexity due to multiplicative depth constraints
- Data encryption can be done offline as a preprocessing step
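For the last point, one option is to encrypt the evaluation set once and store the serialized ciphertexts for reuse. The sketch below uses TenSEAL's `serialize()` and a simple length-prefixed file format; both the function and the file layout are illustrative, not part of the project:

```python
import tenseal as ts

def encrypt_images_offline(context, images, path="encrypted_images.bin"):
    """Encrypt flattened images once and store them for later encrypted inference."""
    with open(path, "wb") as f:
        for image in images:
            enc = ts.ckks_vector(context, image.flatten().tolist())
            blob = enc.serialize()                    # ciphertext as bytes
            f.write(len(blob).to_bytes(8, "little"))  # length prefix
            f.write(blob)
```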
Run the test suite to verify functionality:
```bash
# Basic functionality test
pdm run test-basic

# Full test suite
pdm run test
```

The basic test covers:
- FHE context creation
- Model instantiation
- Encryption/decryption operations
- Forward pass on encrypted data
- MNIST data loading
- End-to-end inference
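For reference, a minimal encryption/decryption roundtrip check in the style of these tests might look like the following; the real assertions live in `tests/test_basic.py`, and this sketch assumes `encrypt_tensor` returns a TenSEAL CKKS vector:

```python
import pytest
import torch

from utils import create_context, encrypt_tensor

def test_encrypt_decrypt_roundtrip():
    # Context creation with the project's default parameters
    context = create_context(poly_modulus_degree=8192, scale_bits=40)
    x = torch.randn(16)

    enc = encrypt_tensor(context, x)
    dec = torch.tensor(enc.decrypt())

    # CKKS arithmetic is approximate, so compare with a tolerance
    assert torch.allclose(dec, x, atol=1e-3)
```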
```
fheml/
├── main.py                   # Main training script
├── model.py                  # Neural network implementations
├── training.py               # Training algorithms
├── inference.py              # Inference engines
├── utils.py                  # Utility functions
├── tests/                    # Test suite
│   ├── test_basic.py         # Basic functionality tests
│   ├── test_inference.py     # Inference tests
│   ├── test_model.py         # Model tests
│   ├── test_training.py      # Training tests
│   └── test_utils.py         # Utility tests
├── data/                     # Dataset storage
│   └── MNIST/                # MNIST dataset
├── pyproject.toml            # Project configuration
├── pdm.lock                  # Dependency lock file
└── README.md                 # This file
```
- Performance: Encrypted operations are computationally expensive
- Model Size: Limited by FHE multiplicative depth constraints
- Batch Size: Small encrypted batch sizes for practical training times
- Activation Functions: Restricted to low-degree polynomials
- Support for convolutional layers
- Advanced bootstrapping techniques
- Distributed encrypted training
- Integration with federated learning
- Support for other FHE schemes (BFV, BGV)
Contributions are welcome! Please ensure:
- All tests pass: `python -m pytest tests/`
- Code follows the existing style
- New features include appropriate tests
- Security considerations are documented
This project is licensed under the MIT License. See the LICENSE file for details.
- TenSEAL: A Library for Encrypted Tensor Operations Using Homomorphic Encryption
- CKKS Scheme: Homomorphic Encryption for Approximate Arithmetic
- Privacy-Preserving Machine Learning: Techniques and Applications
This implementation is for research and educational purposes. For production use:
- Conduct thorough security audits
- Use appropriate key management
- Consider side-channel attack mitigation
- Validate against specific threat models