TSU NSF AI Workshop 2026 — Session 4, Day 2 Afternoon (1:30 PM – 4:30 PM) Instructor: Dr. Yuankai Huo
Imagine three hospitals want to build an AI model to detect skin cancer. The problem: patient data is private and cannot be shared between hospitals.
Traditional (Centralized) AI: All hospitals send their patient images to one central server → train one model. This works well, but it violates patient privacy.
Federated Learning: Each hospital trains the model locally on their own data. Only the model weights (not the images) are sent to a central server. The server averages the weights and sends the improved model back. Patient data never leaves the hospital.
Hospital A ──┐
├──► Server (averages weights) ──► Better Global Model
Hospital B ──┘
This lab simulates this process using DermaMNIST — a dataset of 7,007 skin lesion images across 7 categories — split between 2 simulated hospitals.
| Property | Value |
|---|---|
| Source | MedMNIST (dermatology) |
| Image size | 28 × 28 pixels, color (RGB) |
| Training images | 7,007 |
| Validation images | 1,003 |
| Test images | 2,005 |
| Classes | 7 skin lesion types |
The 7 classes:
- 0 — Actinic keratoses
- 1 — Basal cell carcinoma
- 2 — Benign keratosis-like lesions
- 3 — Dermatofibroma
- 4 — Melanoma
- 5 — Melanocytic nevi (most common — 67% of training data)
- 6 — Vascular lesions
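The class imbalance noted above can be checked directly from the .npz file. The sketch below writes a tiny synthetic stand-in so it runs without the real dataset; the key names (train_images, train_labels) are an assumption based on the MedMNIST convention and may differ in your file.

```python
import numpy as np

# Tiny synthetic stand-in for data/dermamnist.npz so this snippet runs
# without the real file. Key names are an assumption based on the
# MedMNIST convention (train_images, train_labels, ...).
rng = np.random.default_rng(0)
np.savez("demo_dermamnist.npz",
         train_images=rng.integers(0, 256, (100, 28, 28, 3), dtype=np.uint8),
         train_labels=rng.integers(0, 7, (100, 1), dtype=np.int64))

data = np.load("demo_dermamnist.npz")
labels = data["train_labels"].flatten()
print(sorted(data.files))                # arrays stored in the file
print(np.bincount(labels, minlength=7))  # per-class counts reveal imbalance
```

With the real dataset, the bincount for class 5 (melanocytic nevi) should dominate, consistent with the ~67% share quoted above.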
pip install torch numpy matplotlib

For Step 6 (privacy extensions), also install:

pip install tenseal

Run the scripts in order:

python 1_centralized_baseline.py
python 2_data_partitioning.py
python 3_federated_training.py
python 4_evaluation_visualization.py
python 5_practice.py
python 6_privacy_extension.py

Each script prints step-by-step output explaining what is happening. Read the printed output as you go — it is part of the lesson.
Open the matching notebook in the CoLab/ folder. Every script has an
equivalent notebook with the same code and richer markdown explanations.
File: 1_centralized_baseline.py
Goal: Train a single model on all the data the traditional way.
This is our benchmark. In a real hospital scenario this would require collecting all patient images on one server — a major privacy risk. Here we do it just to measure how well a model can perform with full data access.
What the script does:
- Loads all 7,007 DermaMNIST training images
- Builds a simple CNN (2 conv layers + 2 fully connected layers)
- Trains for 20 epochs, printing train and validation accuracy each epoch
- Evaluates on the held-out test set
- Saves results to results/centralized_results.json
What to look for while it runs:
- Accuracy starts low (random ~14%) and rises each epoch
- Training accuracy is usually higher than validation — this is normal (the model has seen training data)
- The final test accuracy is our gold standard for comparison
Key concept — CNN: A Convolutional Neural Network scans images with small filters to detect edges, textures, and shapes. Each layer learns increasingly complex patterns. The final layer outputs one score per class — the highest score is the prediction.
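A minimal sketch of the kind of CNN described above (2 conv layers + 2 fully connected). The exact layer sizes in 1_centralized_baseline.py may differ; these are illustrative:

```python
import torch
import torch.nn as nn

# Illustrative small CNN for 28x28 RGB inputs; layer sizes are assumptions,
# not necessarily those used in the lab script.
class SimpleCNN(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),  # one score per class
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleCNN()
scores = model(torch.randn(4, 3, 28, 28))  # batch of 4 fake RGB images
print(scores.shape)                        # torch.Size([4, 7])
preds = scores.argmax(dim=1)               # highest score is the prediction
```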
Output files:
results/centralized_results.json
File: 2_data_partitioning.py
Goal: Split the training data across 2 simulated community health centers (non-IID).
In federated learning, data lives on separate machines and is never pooled. This step simulates that by dividing the dataset into two realistic "hospital" slices.
What is Non-IID?
| Term | Meaning |
|---|---|
| IID | Every client has the same balanced mix of all classes |
| Non-IID | Each client has a different disease distribution — realistic! |
How the split works:
- Client 1 (Health Center A) gets 80% of the rare lesion types (classes 0–3)
- Client 2 (Health Center B) gets 80% of the common lesion types (classes 4–6)
- This simulates a specialist center (Client 1) vs. a general clinic (Client 2)
What to look for while it runs:
- The printed table and bar chart show how lopsided the distributions are
- Notice that Client 1 has very few samples of class 5 (melanocytic nevi) while Client 2 has thousands — they are seeing completely different patients
Key concept — Non-IID challenge: When clients have different class distributions, federated learning is harder. Each client's model pulls in a different direction, making the global average less accurate. This is the central open problem in FL research.
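The 80/20 split described above can be sketched in a few lines of numpy. Labels here are synthetic; NON_IID_DEGREE plays the same role as the parameter in 5_practice.py:

```python
import numpy as np

# Sketch of the non-IID split: classes 0-3 go mostly to Client 1 (specialist
# center), classes 4-6 mostly to Client 2 (general clinic). Synthetic labels.
rng = np.random.default_rng(42)
labels = rng.integers(0, 7, 1000)
NON_IID_DEGREE = 0.8  # fraction of a class sent to its "home" client

client1_idx, client2_idx = [], []
for c in range(7):
    idx = np.flatnonzero(labels == c)
    rng.shuffle(idx)
    cut = int(len(idx) * NON_IID_DEGREE)
    if c <= 3:   # rare classes 0-3: majority to Client 1
        client1_idx.extend(idx[:cut]); client2_idx.extend(idx[cut:])
    else:        # common classes 4-6: majority to Client 2
        client2_idx.extend(idx[:cut]); client1_idx.extend(idx[cut:])

print(np.bincount(labels[client1_idx], minlength=7))  # skewed toward 0-3
print(np.bincount(labels[client2_idx], minlength=7))  # skewed toward 4-6
```

Setting NON_IID_DEGREE to 0.5 makes both per-client histograms roughly balanced, which is the IID case.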
Output files:
data/client1.npz
data/client2.npz
data/test_data.npz
File: 3_federated_training.py
Goal: Train a global model across 2 clients using the FedAvg algorithm — without any client sharing its raw data.
How each communication round works:
┌─────────────────────────────────────────────────────┐
│ 1. SERVER → sends current global model to clients │
│ 2. CLIENT 1 → trains on its own data locally │
│ CLIENT 2 → trains on its own data locally │
│ 3. CLIENTS → send updated weights back to server │
│ (weights only — no patient images!) │
│ 4. SERVER → averages all weights (FedAvg) │
│ 5. Repeat for next round │
└─────────────────────────────────────────────────────┘
Default settings (change at top of file):
- NUM_ROUNDS = 5 — how many server↔client communication cycles
- LOCAL_EPOCHS = 2 — how many epochs each client trains per round
What to look for while it runs:
- Global model accuracy after each round — it should increase round by round
- The server never prints any patient data — only weight values and accuracy
- Compare the final accuracy to Step 1's centralized result
Key concept — FedAvg: Federated Averaging (McMahan et al., 2017) is the foundational FL algorithm. The server computes a weighted average of client weights, where each client's contribution is proportional to its number of training samples.
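The FedAvg rule can be sketched with plain numpy arrays standing in for a model state_dict. The function name and dict layout here are illustrative, not the script's actual code:

```python
import numpy as np

# FedAvg aggregation: a sample-weighted average of client weights.
# Each "model" is a dict of numpy arrays standing in for a state_dict.
def fedavg(client_weights, client_sizes):
    total = sum(client_sizes)
    return {
        k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in client_weights[0]
    }

w1 = {"layer": np.array([1.0, 2.0])}  # hypothetical client updates
w2 = {"layer": np.array([3.0, 6.0])}
avg = fedavg([w1, w2], client_sizes=[100, 300])  # client 2 counts 3x more
print(avg["layer"])  # [2.5 5. ]
```

Because the average is weighted by sample count, a client with 300 images pulls the global model three times harder than one with 100 images.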
Output files:
results/federated_results.json
File: 4_evaluation_visualization.py
Goal: Compare centralized vs. federated results side by side with charts.
Requires: Run Steps 1 and 3 first so the result JSON files exist.
What the script produces:
| Chart | What it shows |
|---|---|
| accuracy_comparison.png | Centralized (per epoch) vs. federated (per round) accuracy curves |
| final_accuracy_bar.png | Side-by-side bar chart of final test accuracy |
| convergence_vs_cost.png | How accuracy improves as communication cost accumulates |
What to look for:
- The accuracy gap between centralized and federated — how large is it?
- Does federated accuracy improve smoothly round by round, or does it jump?
- The communication cost chart shows a classic FL tradeoff: more rounds = better accuracy, but higher network cost
Communication cost explained:
- Each round: server → clients (download) + clients → server (upload)
- 2 clients × 2 directions × ~2 MB model = ~8 MB per round
- Patient images (~167 MB) are never transmitted — that is the privacy gain
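The arithmetic above as a back-of-envelope sketch; the ~2 MB model size is the approximation used in this lab, not a measured value:

```python
# Per-round communication cost: each client both downloads and uploads
# one copy of the model. Model size (~2 MB) is this lab's approximation.
NUM_CLIENTS = 2
MODEL_MB = 2.0
per_round_mb = NUM_CLIENTS * 2 * MODEL_MB  # 2 directions per client
total_mb = 5 * per_round_mb                # default NUM_ROUNDS = 5
print(per_round_mb, "MB per round;", total_mb,
      "MB total vs ~167 MB of raw patient images never sent")
```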
Output files:
results/plots/accuracy_comparison.png
results/plots/final_accuracy_bar.png
results/plots/convergence_vs_cost.png
File: 5_practice.py
Goal: Experiment with FL parameters and observe how they change outcomes.
This script is standalone — it reloads raw data and runs FL from scratch using whatever parameters you set. No need to re-run earlier steps.
Find these four lines at the top of the file and change them:
NUM_ROUNDS = 5 # Try: 1, 3, 5, 10
LOCAL_EPOCHS = 2 # Try: 1, 2, 5, 10
LEARNING_RATE = 0.001 # Try: 0.01, 0.001, 0.0001
NON_IID_DEGREE = 0.8   # Try: 0.5 (balanced) → 0.9 (very skewed)

Suggested experiments:
| Experiment | Change | Expected observation |
|---|---|---|
| More rounds | NUM_ROUNDS = 10 | Higher accuracy, but diminishing returns |
| Client drift | LOCAL_EPOCHS = 10 | Accuracy may drop — clients diverge too far |
| IID data | NON_IID_DEGREE = 0.5 | Accuracy gap vs. centralized nearly disappears |
| Very non-IID | NON_IID_DEGREE = 0.9 | Larger accuracy gap — harder for FL |
| Fast learning | LEARNING_RATE = 0.01 | Training is faster but may be unstable |
Discussion questions printed at the end of the script:
- What happens to accuracy as you increase NUM_ROUNDS?
- What happens when LOCAL_EPOCHS is very large? (client drift)
- Compare NON_IID_DEGREE = 0.5 vs 0.9. Why does skew hurt?
- With only 20 MB total budget, what is your best configuration?
- What privacy risks remain even in federated learning?
File: 6_privacy_extension.py
Goal: Go beyond basic FL privacy with two advanced cryptographic techniques.
Prerequisite: Run Step 2 first (needs client data files). For real HE (not simulation), install TenSEAL first:

pip install tenseal

Standard FL protects raw data, but the server still sees each client's plaintext weights. A sophisticated attacker could try to reverse-engineer training images from those weights (a gradient inversion attack).
Homomorphic Encryption (HE) solves this: the server aggregates weights without ever decrypting them.
Client 1: weights → Encrypt → Enc(w1) ──┐
├─► Server computes:
Client 2: weights → Encrypt → Enc(w2) ──┘ Enc(avg) = Enc(w1)×f1 + Enc(w2)×f2
↓
Client decrypts → avg (same result!)
What to look for:
- The script prints how long encryption, aggregation, and decryption take
- Compare HE timing to standard FL — this is the real cost of stronger privacy
- The verification step confirms HE gives the same answer as plaintext averaging
- If TenSEAL is not installed, the script runs a conceptual simulation instead
Key concept — CKKS scheme: CKKS is a homomorphic encryption scheme designed for floating-point arithmetic. Its computations are approximate, introducing tiny numerical errors (~1e-5) that are negligible for neural network weight averaging.
Secure aggregation is a faster alternative to HE. Each client adds a random mask to its weights before sending; the masks are designed to cancel when the server sums them.
Client 1 sends: w1 + mask ← looks like noise
Client 2 sends: w2 - mask ← looks like noise
Server computes: (w1+mask) + (w2-mask) = w1 + w2 ← masks cancel!
The server learns only the sum — it cannot recover w1 or w2 individually.
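The mask cancellation above can be demonstrated in a few lines of numpy. All values here are made up for illustration; a real protocol derives the shared mask from a key exchange between the clients:

```python
import numpy as np

# Pairwise additive masking: Client 1 adds a shared random mask, Client 2
# subtracts the same mask, so the server's sum equals the true sum while
# each individual message looks like noise.
rng = np.random.default_rng(7)
w1 = np.array([0.10, -0.30, 0.25])     # hypothetical client 1 weights
w2 = np.array([0.05,  0.40, -0.15])    # hypothetical client 2 weights
mask = rng.normal(scale=10.0, size=3)  # large mask drowns the true values

sent1 = w1 + mask            # what the server sees from Client 1
sent2 = w2 - mask            # what the server sees from Client 2
server_sum = sent1 + sent2   # masks cancel exactly
print(sent1)                 # looks nothing like w1
print(server_sum)            # equals w1 + w2
```

Note that neither sent1 nor sent2 alone reveals anything about the underlying weights; only their sum is meaningful.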
What to look for:
- The printed output shows what the server actually sees for one weight value
- The "true" weight vs. the "sent" value look completely different
- Accuracy of secure aggregation vs. standard FL should be nearly identical
- This technique is already used in production at Google (Gboard keyboard)
The script prints a table comparing all approaches:
| Technique | Server sees | Accuracy loss | Speed |
|---|---|---|---|
| Standard FL | Plaintext weights | None | Fast |
| Secure Aggregation | Only the SUM | None | Fast |
| Homomorphic Encryption | Encrypted ciphertext | None | Slow (10–100×) |
Output files:
results/privacy_results.json
results/plots/privacy_comparison.png
FederatedLearning/
├── data/
│ ├── dermamnist.npz ← original dataset (you provide this)
│ ├── client1.npz ← created by Step 2
│ ├── client2.npz ← created by Step 2
│ └── test_data.npz ← created by Step 2
│
├── results/
│ ├── centralized_results.json ← created by Step 1
│ ├── federated_results.json ← created by Step 3
│ ├── privacy_results.json ← created by Step 6
│ └── plots/
│ ├── accuracy_comparison.png
│ ├── final_accuracy_bar.png
│ ├── convergence_vs_cost.png
│ └── privacy_comparison.png
│
├── 1_centralized_baseline.py ← train on all data (benchmark)
├── 2_data_partitioning.py ← split data across 2 clients (non-IID)
├── 3_federated_training.py ← FedAvg across 2 clients
├── 4_evaluation_visualization.py ← compare & plot results
├── 5_practice.py ← experiment with parameters
├── 6_privacy_extension.py ← homomorphic encryption + secure aggregation
│
└── CoLab/ ← Jupyter notebooks for Google Colab
├── 1_centralized_baseline.ipynb
├── 2_data_partitioning.ipynb
├── 3_federated_training.ipynb
├── 4_evaluation_visualization.ipynb
├── 5_practice.ipynb
└── 6_privacy_extension.ipynb
Step 1 Step 2 Step 3 Step 4
Centralized → Split Data → Federated FL → Compare
Baseline (Non-IID) (FedAvg) & Visualize
"What's the "Who has "Train without "Did FL work?"
best we what data?" sharing data"
can do?"
↓
Step 5
Practice
"Change params,
what happens?"
↓
Step 6
Privacy Extensions
"Make it even more
private with HE
and Secure Agg."
| Problem | Fix |
|---|---|
| ModuleNotFoundError: torch | Run pip install torch numpy matplotlib |
| FileNotFoundError: client1.npz | Run Step 2 before Step 3 or 6 |
| FileNotFoundError: centralized_results.json | Run Step 1 before Step 4 |
| ModuleNotFoundError: tenseal | Run pip install tenseal (Step 6 only) — the script works without it |
| Step 6 TenSEAL install fails | Script auto-falls back to simulation — no action needed |
| Training is slow | Enable GPU: check that CUDA is installed, or use the Colab notebooks (free GPU) |
- McMahan et al. (2017) — Communication-Efficient Learning of Deep Networks from Decentralized Data (original FedAvg paper)
- Bonawitz et al. (2017) — Practical Secure Aggregation for Privacy-Preserving Machine Learning
- MedMNIST — https://medmnist.com (source of the DermaMNIST dataset)
- TenSEAL — https://github.com/OpenMined/TenSEAL (homomorphic encryption library)
- Flower Framework — https://flower.dev (production-grade FL framework)