Data is at the core of machine learning. In many real-world projects, however, a single party’s data is not sufficient and needs to be augmented with data from other parties. At the same time, many concerns (regulatory, ethical, commercial, etc.) stop parties from exchanging data.
An example can be found in the healthcare domain. An individual hospital typically has too little local data to build a robust model on its own. There are existing studies (this one as an example) showing the benefit of using data from more hospitals to build models. Yet even when hospitals are convinced of the value of sharing data, regulatory concerns stop them from doing so, since healthcare data is usually treated as sensitive personal data under both general and sectoral data protection regulations, e.g. GDPR or HIPAA.
Federated Learning is an emerging privacy-preserving machine learning technology. It enables multiple parties holding local data to collaboratively train machine learning models without exchanging their data with one another, hence preserving the confidentiality of different parties’ local data.
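To make the idea concrete, here is a minimal sketch of one common Federated Learning scheme, federated averaging. It is an illustration only: the text does not say which aggregation algorithm Synergos uses, and the names `local_step`, `federated_average`, `train`, and the toy one-parameter model are assumptions made for this example. Each party computes an update on its own data, and only model weights travel to the coordinator.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one party


def local_step(w: float, data: Dataset, lr: float = 0.05) -> float:
    """One gradient-descent step on the model y = w * x, using local data only."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def federated_average(updates: List[Tuple[float, int]]) -> float:
    """Average party weights, weighted by each party's number of samples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(parties: List[Dataset], rounds: int = 30) -> float:
    w = 0.0  # global model, sent to every party at the start of each round
    for _ in range(rounds):
        # Each party trains locally; the coordinator sees weights, never raw data.
        updates = [(local_step(w, data), len(data)) for data in parties]
        w = federated_average(updates)
    return w


# Toy data (made up for illustration): both parties' data follow y = 2 * x.
hospital_a = [(1.0, 2.0), (2.0, 4.0)]
hospital_b = [(3.0, 6.0)]
print(train([hospital_a, hospital_b]))  # approaches 2.0
```

Weighting by sample count means a party with more data pulls the global model harder, which is one reason contribution and reward become a design question for a federated platform.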
Synergos is a platform built by AI Singapore to make Federated Learning more accessible and sustainable. Synergos is a Greek word from which the English word “synergy” is derived. It means “to work together” or “to cooperate”, which is the very gist of the vision that Federated Learning promises. In this chapter, we discuss Synergos’ design and its key functional blocks.
Key blocks of Synergos
Synergos is essentially a distributed system, in which different parties work together to train a machine learning model, without exposing the data to one another. The diagram below shows a single-party view of Synergos’ key functional blocks.
Broadly, these blocks are grouped into three layers of functionality:
- Federated training, whose aim is to make Federated Learning accessible and user-friendly.
- Model management, whose goal is to make Federated MLOps simple.
- Platform management, whose goal is to make Federated Learning sustainable.
Each block is supported by multiple components (refer to here for a mapping between blocks and components).
The core of Synergos is its Federation block. This is where the different parties coordinate to train a global model without exposing their data. Federation defines the application-level protocol over WebSocket to form a Federated Grid. A Federated Grid is a star-architecture network formed by different parties, who exchange messages among themselves to coordinate federated training and inference.
Compute & Storage acts as an interface to different compute and storage backends. Synergos currently assumes that the data is managed by a file system or a shared volume and that the compute load is handled by a single CPU node. Support for other storage services and compute frameworks (e.g. Spark) is on the roadmap.
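The star topology can be pictured with a small in-memory sketch. This is not the Synergos protocol: the message types (`start_training`, `ack`), the `Hub` and `Participant` classes, and the JSON envelope are all hypothetical, and plain method calls stand in for the WebSocket connections. The sketch only shows the shape of a star, in which every link runs between the central node and one participant.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Participant:
    """A spoke in the star: it only ever talks to the hub."""
    name: str
    received: List[str] = field(default_factory=list)

    def handle(self, raw: str) -> str:
        msg = json.loads(raw)
        self.received.append(msg["type"])
        # Reply with an acknowledgement addressed back to the hub.
        return json.dumps({"type": "ack", "sender": self.name, "re": msg["type"]})


@dataclass
class Hub:
    """The centre of the star: sends coordination messages to every spoke."""
    spokes: Dict[str, Participant] = field(default_factory=dict)

    def join(self, participant: Participant) -> None:
        self.spokes[participant.name] = participant

    def broadcast(self, msg_type: str) -> List[dict]:
        raw = json.dumps({"type": msg_type})
        return [json.loads(p.handle(raw)) for p in self.spokes.values()]


hub = Hub()
hub.join(Participant("party_a"))
hub.join(Participant("party_b"))
replies = hub.broadcast("start_training")  # hypothetical message type
print([r["sender"] for r in replies])
```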
In Synergos, as is typical in machine learning, multiple models are usually built, and one of them is eventually chosen as the model to be deployed into production. The training of different models is usually configured with different training datasets, model types, and/or hyperparameters. Model Lifecycle Management is responsible for tracking the training of different models, and for recording and comparing their results. It also serves as a model registry to manage the lifecycle of federated models, including model versioning and stage transitions.
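The responsibilities described here (tracking runs, comparing results, versioning, stage transitions) can be sketched as a tiny in-memory registry. Everything in it is an assumption made for illustration: the `ModelRegistry` class, the stage names, the rule that only one version is in production at a time, and the accuracy figures are not taken from Synergos.

```python
from dataclasses import dataclass, field
from typing import Dict, List

STAGES = ("none", "staging", "production", "archived")  # assumed stage names


@dataclass
class ModelVersion:
    version: int
    params: Dict[str, object]
    metrics: Dict[str, float]
    stage: str = "none"


@dataclass
class ModelRegistry:
    versions: List[ModelVersion] = field(default_factory=list)

    def log_run(self, params: Dict[str, object], metrics: Dict[str, float]) -> ModelVersion:
        """Record one training run as a new, auto-numbered model version."""
        mv = ModelVersion(len(self.versions) + 1, dict(params), dict(metrics))
        self.versions.append(mv)
        return mv

    def best(self, metric: str) -> ModelVersion:
        """Compare runs and return the one with the highest value of `metric`."""
        return max(self.versions, key=lambda v: v.metrics[metric])

    def transition(self, version: int, stage: str) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        if stage == "production":
            # Assumption: at most one production model; archive the previous one.
            for v in self.versions:
                if v.stage == "production":
                    v.stage = "archived"
        self.versions[version - 1].stage = stage


registry = ModelRegistry()
registry.log_run({"lr": 0.1}, {"accuracy": 0.81})   # illustrative numbers only
registry.log_run({"lr": 0.01}, {"accuracy": 0.88})  # illustrative numbers only
winner = registry.best("accuracy")
registry.transition(winner.version, "production")
```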
As mentioned earlier, a Federated Grid is where the federated training actually happens. In Synergos, a grid is not a persistent setup; it is typically destroyed when federated learning finishes. To run multiple experiments, Orchestration starts multiple Federated Grids and configures them with different sets of data and hyperparameters. The training runs are then tracked by Model Lifecycle Management. When all training runs complete and a model is selected to transition to the production stage, Model Serving makes sure the model is up and running and is able to receive inference requests from users.
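The pattern of short-lived grids, one per experiment configuration, might be sketched as follows. The names `federated_grid` and `run_experiments`, the search-space keys, and the dataset labels are hypothetical, and the sketch runs experiments one after another for simplicity, which is not necessarily how Synergos schedules them.

```python
import itertools
from contextlib import contextmanager
from typing import Dict, Iterator, List

active_grids: List[str] = []  # grids that currently exist


@contextmanager
def federated_grid(name: str) -> Iterator[str]:
    """Create a grid for one experiment and destroy it afterwards."""
    active_grids.append(name)
    try:
        yield name
    finally:
        active_grids.remove(name)  # the grid is not persistent


def run_experiments(search_space: Dict[str, list]) -> List[dict]:
    """Run one experiment per combination of values in the search space."""
    keys = sorted(search_space)
    combos = itertools.product(*(search_space[k] for k in keys))
    runs = []
    for i, values in enumerate(combos):
        config = dict(zip(keys, values))
        with federated_grid(f"grid_{i}") as grid:
            # Placeholder: federated training on this grid would happen here.
            runs.append({"grid": grid, "config": config})
    return runs


runs = run_experiments({"lr": [0.1, 0.01], "dataset": ["tabular_v1", "tabular_v2"]})
print(len(runs), active_grids)
```

Wrapping the grid in a context manager guarantees teardown even if a training run raises, which matches the non-persistent nature of a grid.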
Usually, different parties incur non-negligible costs in acquiring and cleaning their data. They rarely share their data altruistically, since doing so risks losing their competitive edge. These parties are more motivated to share their data when given enough incentives, such as a guaranteed benefit from the collaboration and a fair reward that grows with the value of the data they contribute. Without parties motivated to contribute data, Federated Learning is not sustainable.
Synergos makes Federated Learning sustainable by building the Contribution & Reward component to evaluate contributions and reward different parties fairly based on their contributions. The reward does not have to be monetary. It could also take the form of a model reward, in which participating parties receive models of different quality based on their contributions.
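One well-known way to quantify each party's contribution is the Shapley value from cooperative game theory. The text does not say which valuation method Synergos uses, so the sketch below is only an example of the general idea; `shapley_values`, `split_reward`, and the toy utility function are assumptions. The exact computation enumerates every joining order, so it is only practical for a handful of parties.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, List


def shapley_values(parties: List[str],
                   utility: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Average each party's marginal gain over every possible joining order."""
    totals = {p: 0.0 for p in parties}
    orders = list(permutations(parties))
    for order in orders:
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            joined = coalition | {p}
            totals[p] += utility(joined) - utility(coalition)
            coalition = joined
    return {p: totals[p] / len(orders) for p in parties}


def split_reward(contributions: Dict[str, float], budget: float) -> Dict[str, float]:
    """Share a reward budget in proportion to (non-negative) contributions."""
    total = sum(contributions.values())
    return {p: budget * c / total for p, c in contributions.items()}


# Toy utility (made up): model quality is the sum of per-party data "value".
value = {"party_a": 1.0, "party_b": 2.0, "party_c": 3.0}
phi = shapley_values(list(value), lambda s: sum(value[p] for p in s))
print(split_reward(phi, budget=60.0))
```

In practice the utility of a coalition would be something like the validation score of a model trained on that coalition's data, and the budget need not be money; it could be mapped to model quality tiers.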
In Synergos, although different parties do not expose data to one another, they still need to “declare” the data and compute resources that they are contributing. These declarations are the “meta-data” that need to be managed. We envision that as the Federated Learning ecosystem grows further, an open marketplace will form, with multiple parties “advertising” the data they have. They will need to “register” their data in the marketplace’s data catalog system so that others can search for and discover what data is available. Meta-data Management acts as an interface to the data catalog system.
Dashboard is the place for different parties to interact with the Synergos platform and build federated models jointly. It provides a GUI that gives a one-stop view of all the information generated by the different functional blocks.
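A declaration of this kind might look like the sketch below, in which a party registers a description of its dataset while the records themselves stay local. The `DataDeclaration` fields, the `Catalog` class, and the sample entries are hypothetical and do not reflect the actual Synergos schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class DataDeclaration:
    """What a party advertises about a dataset; the records stay with the party."""
    party: str
    name: str
    tags: Tuple[str, ...]
    n_records: int


@dataclass
class Catalog:
    entries: List[DataDeclaration] = field(default_factory=list)

    def register(self, declaration: DataDeclaration) -> None:
        self.entries.append(declaration)

    def search(self, tag: str) -> List[DataDeclaration]:
        """Let other parties discover datasets by tag."""
        return [e for e in self.entries if tag in e.tags]


catalog = Catalog()
# Sample entries are invented for illustration.
catalog.register(DataDeclaration("hospital_a", "icu_stays", ("healthcare", "tabular"), 1200))
catalog.register(DataDeclaration("bank_b", "loans", ("finance", "tabular"), 5000))
print([e.party for e in catalog.search("healthcare")])
```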