Welcome to the MLEDGE Project Notebook Repository. MLEDGE explores the application of Federated Learning, Edge Computing, and Privacy-Preserving AI across diverse real-world use cases in both traditional and digital economies. This repository contains Jupyter notebooks demonstrating experiments and model development for the project's three use cases.
The repository is organized into three main directories, each corresponding to a specific use case:
mledge/
│
├── Use Case 1/
│   ├── 0_centralized/
│   │   ├── 0_model1.ipynb
│   │   ├── 1_model2.ipynb
│   │   └── 2_model_combined.ipynb
│   │
│   └── 1_FL/
│       ├── coordinator_model1.ipynb
│       ├── coordinator_model2.ipynb
│       ├── worker_model1.ipynb
│       └── worker_model2.ipynb
│
├── Use Case 2/
│   ├── 0_mobility/
│   │   ├── 0_filter_geohashes.ipynb
│   │   ├── 1_filter_home_and_work.ipynb
│   │   └── 2_mobility_map.ipynb
│   │
│   ├── 1_epidemic/
│   │   ├── 3_hospital_presence.ipynb
│   │   ├── 4_filter_hospitalized.ipynb
│   │   ├── 5_hospitalization_map.ipynb
│   │   └── 6_self_confined.ipynb
│   │
│   └── 2_FL/
│       ├── 7_split_by_geohash.ipynb
│       ├── 8_split_by_telco_id.ipynb
│       └── 9_combine_data.ipynb
│
├── Use Case 3/
│   ├── FedQV
│   │   ├── coordinator.ipynb
│   │   └── worker.ipynb
│   └── FedRep
│       ├── coordinator.ipynb
│       └── worker.ipynb
│
└── README.md
This use case focuses on optimizing the energy consumption of industrial steam boilers through the use of Machine Learning and Federated Learning (FL), ensuring the privacy of sensitive data at all times.
Steam boilers, a key element in many traditional industrial processes, generate sensitive data across multiple facilities, including those belonging to competing companies. Through federated learning technologies, this use case enables the training of advanced predictive models without the need to share confidential data between organizations.
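To make the idea of training without sharing raw data concrete, the sketch below shows a sample-size-weighted average of model parameters in the style of FedAvg. It is an illustration only: the function name `federated_average`, the toy parameter vectors, and the sample counts are assumptions, and this README does not state which aggregation rule the Use Case 1 notebooks actually use.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of client parameter vectors (FedAvg-style)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all parameter vectors must have the same length")
    # Each coordinate of the global model is the weighted mean of the
    # corresponding client coordinates; raw boiler data never leaves a site.
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]


# Hypothetical parameters reported by three facilities and their sample counts.
facility_updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
facility_samples = [100, 100, 200]
global_model = federated_average(facility_updates, facility_samples)
```

Only the parameter vectors and sample counts are exchanged in this sketch, which is the property the use case relies on when facilities belong to competing companies.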
Reduce energy consumption and optimize the operational efficiency of steam boilers by:
- Developing Machine Learning models for anomaly detection, production forecasting, and operational parameter optimization.
- Applying Federated Learning to create robust models from distributed data sources while preserving data privacy.
- Control System (Level 1):
  - Real-time monitoring of key variables such as gas consumption, steam production, temperature, etc.
  - Visualization of efficiency indicators (e.g., Coefficient of Performance - COP) via an online platform.
- Machine Learning (Level 2):
  - Predictive models for early detection of anomalies and system optimization.
  - Steam production forecasting and optimization suggestions.
  - Cost-benefit analysis and sensor system optimization.
- Federated Learning (Level 3):
  - Shared models across different facilities, with no exchange of sensitive data.
  - Recommendations based on performance comparison with similar equipment.
  - Identification of improvement opportunities and energy efficiency investments.
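As a rough illustration of how Level 1 indicators can feed Level 2 anomaly detection, the sketch below computes an efficiency ratio and flags readings that deviate strongly from recent history. The ratio is assumed here to be steam energy output divided by gas energy input; the project's actual COP definition, the window size, the threshold, and the sample readings are all hypothetical.

```python
from statistics import mean, stdev
from typing import List


def efficiency_ratio(steam_energy_kwh: float, gas_energy_kwh: float) -> float:
    """Assumed indicator: useful steam energy divided by gas energy input."""
    if gas_energy_kwh <= 0:
        raise ValueError("gas energy input must be positive")
    return steam_energy_kwh / gas_energy_kwh


def flag_anomalies(values: List[float], window: int = 5,
                   threshold: float = 3.0) -> List[bool]:
    """Flag a reading that lies more than `threshold` standard deviations
    from the mean of the previous `window` readings."""
    if window < 2:
        raise ValueError("window must be at least 2")
    flags = [False] * len(values)
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            flags[i] = True
    return flags


# Hypothetical hourly readings in kWh; the seventh reading drops sharply.
steam = [850, 860, 845, 855, 850, 852, 600, 851]
gas = [1000] * 8
ratios = [efficiency_ratio(s, g) for s, g in zip(steam, gas)]
flags = flag_anomalies(ratios)
```

A rolling z-score is only one simple choice; the notebooks may rely on different models for anomaly detection.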
The use case has been validated across four industrial facilities from diverse sectors:
- US_1 - Cosmetics Industry: Performance optimization and anomaly detection.
- US_2 - Food Industry: Energy process optimization.
- US_3 - Industrial Laundry: Steam production forecasting and optimization.
- US_4 - Pharmaceutical Industry: Inefficiency detection and maintenance decision support.
Expected benefits:
- Reduction of energy consumption and associated costs.
- Early detection of failures and inefficiencies.
- Improved operational efficiency and sustainability.
- Full preservation of participating companies' data privacy.
- Scalability to other industrial equipment (e.g., compressors, chillers, etc.).
This use case represents a decisive step in the digital transformation of the traditional economy, combining AI, edge computing, and privacy-preserving technologies in real-world industrial environments.
This use case focuses on the use of mobility data from mobile network operators to help identify epidemic risk areas, contributing to the early detection and monitoring of public health emergencies while guaranteeing data privacy.
The project demonstrates how data from mobile operators, such as Orange, can be used to estimate real-time population density, analyze mobility patterns, and detect hospitalizations or self-confinement, all while preserving the anonymity of individuals.
- Identify epidemic risk areas based on mobile network mobility data.
- Estimate potential hospital occupancy to support healthcare resource management.
- Detect patterns of self-confinement in the population.
- Ensure complete privacy of individuals and data from participating entities through anonymization and federated learning.
A proof of concept has been successfully carried out using real-world mobility data from the cities of Madrid and Barcelona, demonstrating the solution's potential.
The system leverages anonymized mobile network data to generate:
- Mobility Maps: Visualize the daily movements of the population within urban areas.
- Heatmaps of Hospitalizations: Estimate hospital occupancy by analyzing prolonged user presence in hospital areas.
- Self-Confinement Detection: Identify individuals voluntarily remaining at home for extended periods.
- Federated Analytics: Simulate a federated environment where different data providers collaborate securely without sharing sensitive data.
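The hospitalization estimate rests on detecting prolonged presence in hospital areas. A minimal sketch of that idea follows, assuming events are `(pseudonymous user id, timestamp, geohash)` tuples and that "prolonged" means being observed in a hospital cell during at least `min_hours` distinct hourly slots. The event layout, the threshold, and the geohash values are assumptions, not the logic of `3_hospital_presence.ipynb`.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple

# (pseudonymous user id, timestamp, geohash of the serving cell)
Event = Tuple[str, datetime, str]


def prolonged_presence(events: Iterable[Event], hospital_cells: Set[str],
                       min_hours: int = 12) -> Set[str]:
    """Return users observed in a hospital cell during at least
    `min_hours` distinct hourly slots."""
    hours_seen: Dict[str, Set[datetime]] = defaultdict(set)
    for user, ts, cell in events:
        if cell in hospital_cells:
            # Truncate to the hour so repeated connections count once.
            hours_seen[user].add(ts.replace(minute=0, second=0, microsecond=0))
    return {user for user, hours in hours_seen.items() if len(hours) >= min_hours}


# Hypothetical data: "a" stays 14 hours, "b" visits for 2 hours, "c" is elsewhere.
hospital_cells = {"ezjmgu"}
events = [("a", datetime(2021, 1, 10, h, 15), "ezjmgu") for h in range(14)]
events += [("b", datetime(2021, 1, 10, h, 30), "ezjmgu") for h in (9, 10)]
events.append(("c", datetime(2021, 1, 10, 9, 0), "ezjq5x"))
likely_hospitalized = prolonged_presence(events, hospital_cells)
```

The same pattern, applied to a user's estimated home cell instead of hospital cells, gives a starting point for self-confinement detection.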
The system processes anonymized data such as:
- Mobile device connections to cell towers.
- Estimated home and workplace locations.
- Hospital locations and capacities.
Techniques applied include:
- Geohash encoding for efficient spatial data processing.
- Estimation of hospitalizations based on prolonged presence in hospital areas.
- Identification of self-confinement by analyzing prolonged presence at home for users not engaged in teleworking.
- Creation of heatmaps showing population density, hospital occupancy, and confinement trends.
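For readers unfamiliar with geohash encoding, the sketch below implements the standard algorithm: longitude and latitude intervals are halved alternately, and every five resulting bits become one base-32 character. The function name `geohash_encode` is an assumption; the notebooks may use an existing library instead.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            # Five bits select one character of the base-32 alphabet.
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


cell = geohash_encode(57.64911, 10.40744, precision=5)
```

Because nearby points share a common prefix, truncating a geohash gives a coarser cell, which is convenient for grouping cell-tower connections before building heatmaps.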
The project emphasizes:
- Full anonymization of all personal data.
- Use of federated data partitioning simulating multiple independent data providers.
- Secure aggregation techniques that enable global insights without centralized sensitive data.
- Application of differential privacy, k-anonymity, and secure multi-party computation methods.
This ensures data privacy while providing critical information to support public health management.
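As a toy illustration of two of the techniques listed above, the sketch below suppresses geohash cells with fewer than `k` users and adds Laplace noise to the remaining counts. The function names, the choice of `k` and `epsilon`, and the sample counts are assumptions. It is not a rigorous differential-privacy mechanism: thresholding on the true count, as done here, weakens the formal guarantee, and a production design would need more care.

```python
import random
from typing import Dict, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_counts(counts: Dict[str, int], k: int = 5, epsilon: float = 1.0,
                   rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Drop cells with fewer than k users, then add Laplace noise
    calibrated to sensitivity 1 (one user changes a count by at most 1)."""
    rng = rng or random.Random()
    released = {}
    for cell, n in counts.items():
        if n < k:
            continue  # k-anonymity-style suppression of small groups
        released[cell] = max(0.0, n + laplace_noise(1.0 / epsilon, rng))
    return released


# Hypothetical per-cell user counts; the cell with 3 users is suppressed.
counts = {"ezjmgu": 120, "ezjmgv": 3, "ezjq5x": 40}
released = private_counts(counts, k=5, epsilon=1.0, rng=random.Random(0))
```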
- Real-time insights into population mobility patterns.
- Improved ability to estimate hospital occupancy during health crises.
- Early detection of changes in confinement behavior.
- Scalable solution applicable to other regions and health emergencies.
- Promotes secure collaboration between data providers (e.g., telecom operators) without compromising privacy.
This use case showcases the potential of combining telecommunications data, artificial intelligence, and federated learning to support public health efforts in the digital economy, while maintaining the strictest privacy standards.
This use case focuses on improving the robustness and security of Federated Learning (FL) in cloud and edge environments, with particular emphasis on the implementation and validation of two advanced aggregation protocols: FedQV (Federated Quadratic Voting) and FedRM-RR (Federated Repeated Median Regression with Reputation Reweighting).
Additionally, the project includes the development of a FLaaS (Federated Learning as a Service) platform, along with tools for cloud cost optimization, enabling scalable, secure, and privacy-preserving FL across heterogeneous cloud providers.
- Develop and deploy a Federated Learning platform (FLaaS) for hybrid cloud and edge environments.
- Improve the security and robustness of FL through novel aggregation mechanisms resistant to poisoning attacks.
- Provide tools for cloud cost monitoring and optimization to ensure efficient deployment of federated projects.
- Support real-world use cases in the traditional and digital economies through secure and efficient infrastructure.

Main components:
- FLaaS Platform: Modular architecture for deploying federated learning environments across clouds and edge devices.
- Federated Quadratic Voting (FedQV): A secure aggregation mechanism based on quadratic voting principles that limits the influence of malicious participants and improves model robustness.
- FedRM-RR: A reputation-based, statistically robust aggregation protocol designed to detect and mitigate unreliable or malicious updates in FL environments.
- Cloud Cost Optimization Tools: Interfaces for monitoring and comparing cloud resource costs across AWS, GCP, and Azure, facilitating informed, cost-efficient deployment decisions.
FedQV - Federated Quadratic Voting
FedQV introduces a voting-based aggregation strategy where each participant receives a limited voting budget based on reputation and contribution. By penalizing abnormal updates and controlling vote influence, FedQV:
- Significantly reduces the success rate of poisoning attacks.
- Enhances the robustness of FL without compromising privacy.
- Provides seamless integration into existing FL pipelines with minimal code changes.
- Supports environments with high data heterogeneity and resource-constrained edge devices.
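The quadratic voting principle behind FedQV can be illustrated with a small sketch: casting v votes costs v squared credits, so a participant who spends c credits (capped by its budget) obtains sqrt(c) votes. This is not the FedQV algorithm itself; how FedQV derives budgets from reputation and how it penalizes abnormal updates is not specified in this README, and the function names, credits, and budgets below are hypothetical.

```python
import math
from typing import List, Sequence


def quadratic_votes(credits_requested: Sequence[float],
                    budgets: Sequence[float]) -> List[float]:
    """Spending c credits (capped by the budget) yields sqrt(c) votes."""
    return [math.sqrt(max(0.0, min(c, b)))
            for c, b in zip(credits_requested, budgets)]


def vote_weighted_average(updates: Sequence[Sequence[float]],
                          votes: Sequence[float]) -> List[float]:
    """Aggregate client updates using normalized votes as weights."""
    total = sum(votes)
    if total <= 0:
        raise ValueError("at least one participant must hold a positive vote")
    dim = len(updates[0])
    return [sum(u[k] * v for u, v in zip(updates, votes)) / total
            for k in range(dim)]


# Hypothetical round: the third client tries to spend 100 credits,
# but its budget of 9 caps it at 3 votes.
votes = quadratic_votes([4.0, 4.0, 100.0], [9.0, 9.0, 9.0])
aggregated = vote_weighted_average([[1.0], [1.0], [8.0]], votes)
```

The square root and the budget cap together mean that influence grows slowly with spending, which is the mechanism by which a single participant's pull on the aggregate stays bounded in this sketch.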
FedRM-RR - Federated Repeated Median Regression with Reputation Reweighting
FedRM-RR combines robust statistical techniques with dynamic reputation scoring to:
- Detect and mitigate unreliable or malicious model updates.
- Assign influence based on historical behavior and trustworthiness.
- Improve convergence speed and model performance, even under adversarial conditions.
- Enable FL in sensitive environments (e.g., health, privacy-critical domains) with enhanced security.
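The robust statistical building block named in the protocol, repeated median regression, can be sketched as follows. The estimator takes, for each point, the median slope to every other point, and then the median of those medians. The reweighting step shown afterwards, which fits sorted values of one model coordinate against their rank and maps residuals to weights, is an assumption made for illustration; the README does not describe how FedRM-RR actually computes or updates reputations.

```python
from statistics import median
from typing import List, Sequence, Tuple


def repeated_median_fit(x: Sequence[float],
                        y: Sequence[float]) -> Tuple[float, float]:
    """Repeated median line fit: robust slope and intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    per_point = []
    for i in range(n):
        slopes = [(y[j] - y[i]) / (x[j] - x[i])
                  for j in range(n) if j != i and x[j] != x[i]]
        if slopes:
            per_point.append(median(slopes))
    if not per_point:
        raise ValueError("all x values are identical")
    slope = median(per_point)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


def residual_weights(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Hypothetical reweighting: larger residuals receive smaller weights."""
    slope, intercept = repeated_median_fit(x, y)
    raw = [1.0 / (1.0 + abs(yi - (slope * xi + intercept)))
           for xi, yi in zip(x, y)]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical values of one model coordinate from five clients, sorted
# and paired with their rank; the last value is an outlier.
ranks = [0, 1, 2, 3, 4]
values = [1.0, 2.0, 3.0, 4.0, 50.0]
slope, intercept = repeated_median_fit(ranks, values)
weights = residual_weights(ranks, values)
```

In this toy round the fitted line ignores the outlier entirely, so the fifth client ends up with a weight far below the others.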
Both protocols have been implemented, validated, and integrated into the FLaaS platform, providing production-ready tools for secure, scalable federated learning.
While this repository focuses on the implementation of the FedQV and FedRM-RR protocols, the broader project also includes:
- A complete FLaaS platform, including execution, communication, coordination, and governance layers for FL deployment.
- Cloud cost optimization interfaces, offering cost simulation, resource comparison, and multi-cloud management for federated projects.
These components have been developed as part of the MLEDGE project but are not included in this repository due to their scope, proprietary nature, or deployment-specific characteristics.
- Strengthened security and robustness of FL in real-world scenarios.
- Scalable, privacy-preserving deployment of FL across cloud and edge infrastructures.
- Reduction of FL infrastructure costs through informed cloud resource management.
- Practical tools for organizations to adopt secure, efficient FL without sacrificing data privacy.
This use case represents a crucial step towards building secure, reliable, and economically viable federated learning ecosystems, ready for industrial and privacy-sensitive applications.
Each notebook demonstrates experiments, model development, and insights for the corresponding use case. They are intended to:
- Explore data and preprocessing steps
- Develop and evaluate machine learning models
- Test deployment strategies for edge environments
- Document key findings and lessons learned
To run these notebooks, you need at least three running nodes on Acuratio's platform.
Contributions, feedback, and suggestions are welcome! Feel free to open issues or pull requests.