Introduction to MLOps and Best Practices for Resource-Efficient Machine Learning Pipelines
What MLOps is and how machine learning pipelines can be set up in a resource-efficient way: a review of three papers.
Given the progress made in machine learning models in recent years, tools with machine learning in the background are being used more and more frequently. Machine learning and deep learning technologies, with all their advantages and disadvantages, have arrived in wider society. To use ML models productively, processes are also required to develop, deploy, monitor and, if necessary, adapt the models efficiently. This is where the term MLOps comes into play: it builds on the principles of DevOps but is specifically adapted to machine learning. At the same time, we live in a time in which sustainability matters. ML models require immense computing resources, especially for training and updating, which leads to considerable energy consumption and thus contributes to CO2 emissions.
Table of Contents
1. What is MLOps and why does it need sustainability?
2. Approaches for sustainable MLOps
   Article 1: Sustainability through self-adaptation in MLOps
   Article 2: Sustainability across the entire life cycle
   Article 3: Systematic MLOps strategies
3. Sustainability dimensions in MLOps
4. Practical implementation: First steps for sustainable MLOps pipelines
5. Final Thoughts
I have reviewed three recent papers on this topic and share the most important points in this article. What are the best practices and techniques to make MLOps as sustainable as possible?
1. What is MLOps and why does it need sustainability?
MLOps is the combination of ‘Machine Learning’ and ‘Operations’ and extends the concepts of DevOps. It encompasses all the processes and tools required to develop, deploy, monitor and, if necessary, adapt ML models efficiently and reliably. In other words, it optimizes the entire life cycle of the models.
The challenges in terms of sustainability arise primarily from the intensive computational requirements that go hand in hand with high energy and resource consumption. In particular, the training, retraining and provision of machine learning models require considerable amounts of computing power and storage resources. This in turn can lead to high operating costs and a large ecological footprint.
Deep learning models in particular often require enormous amounts of data and intensive computing resources to be trained well. A study by Strubell et al. (2019) shows that the training process of large language models such as BERT or GPT-2 alone can cause CO2 emissions ranging from hundreds of kilograms to several tonnes. For example, the authors estimate that a single training run of BERT on GPUs emits roughly as much CO2 as one passenger's share of a trans-American flight, and that GPT-2 was trained for about a week on 32 TPU v3 chips.
Furthermore, ML models require continuous monitoring and maintenance, as they often lose accuracy and reliability when the underlying data changes. This requires periodic retraining, which in turn consumes resources and incurs costs. You can read more about this in the study by Sculley et al. (2015): the authors describe the concept of hidden technical debt in machine learning systems and examine the long-term maintenance problems and hidden costs that arise from developing and operating ML models. Companies also need a reliable infrastructure to use ML models productively, which means appropriate GPUs as well as storage and management for the enormous amounts of data these models require. This is particularly relevant for real-time applications such as fraud detection in the financial sector, where models analyze transaction data continuously and in real time to detect suspicious patterns and flag or directly block potentially fraudulent activity. The Stanford AI Index Report 2021 notes that these requirements lead to sharply rising storage and processing costs, as the models constantly depend on up-to-date data to work effectively and quickly.
Sustainable practices in MLOps now aim to utilize the necessary resources as efficiently as possible. Three important measures that I would like to emphasize right at the beginning are:
Optimize training processes: use techniques such as transfer learning and incremental learning to improve existing models instead of training from scratch. This way, the entire training process does not have to be repeated.
Adapt and select models automatically: with self-adaptive mechanisms such as the MAPE-K loop (Monitor, Analyse, Plan, Execute, over a shared Knowledge base), changes in the data can be detected automatically and models adapted dynamically. This minimizes unnecessary resource consumption.
Automated monitoring and efficient infrastructure: allocate resources according to demand. Resource-saving models can handle routine predictions; as soon as demand or accuracy requirements increase, the system can switch to more computationally intensive models.
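The incremental-learning idea from the first measure can be sketched in plain Python: a minimal online classifier that is updated on new samples only, instead of being retrained from scratch. The data and model below are hypothetical stand-ins, not taken from the reviewed papers.

```python
import random

class OnlinePerceptron:
    """Tiny linear classifier that supports incremental updates."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        s = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if s >= 0 else -1

    def partial_fit(self, X, y):
        """Update on the given samples only -- no retraining from scratch."""
        for xi, yi in zip(X, y):
            if self.predict(xi) != yi:
                self.w = [w + self.lr * yi * f for w, f in zip(self.w, xi)]
                self.b += self.lr * yi

random.seed(0)
# Initial training data: class -1 around (0, 0), class +1 around (4, 4)
X0 = [[random.gauss(0, 0.5), random.gauss(0, 0.5)] for _ in range(50)] + \
     [[random.gauss(4, 0.5), random.gauss(4, 0.5)] for _ in range(50)]
y0 = [-1] * 50 + [1] * 50

model = OnlinePerceptron(n_features=2)
for _ in range(5):                      # a few passes over the initial data
    model.partial_fit(X0, y0)

# Later, new data arrives: update on the new batch only instead of
# re-running the whole training process on X0 + X1.
X1 = [[random.gauss(0, 0.5), random.gauss(0, 0.5)] for _ in range(10)] + \
     [[random.gauss(4, 0.5), random.gauss(4, 0.5)] for _ in range(10)]
y1 = [-1] * 10 + [1] * 10
model.partial_fit(X1, y1)

accuracy = sum(model.predict(x) == t for x, t in zip(X0 + X1, y0 + y1)) / 120
print(f"accuracy after incremental update: {accuracy:.2f}")
```

Frameworks offer the same pattern natively (for example, scikit-learn exposes `partial_fit` on several estimators), so the incremental update touches only the new samples rather than the full history.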
2. Approaches for sustainable MLOps
Article 1: Sustainability through self-adaptation in MLOps
→ → → Towards Architecting Sustainable MLOps: A Self-Adaptation Approach
A central strategy in this article is the MAPE-K loop for self-adaptation in the MLOps pipeline. The authors present this method to enable the pipeline to react efficiently to uncertainties and changing conditions, and to show how MLOps can use the integrated loop to make operations more sustainable. The loop is integrated in such a way that it can detect changes in real time and make dynamic adjustments. It consists of four steps:
Monitor: Data on the status and performance of the MLOps pipeline is collected here. The monitoring focuses, for example, on energy consumption and model accuracy.
Analyse: The collected data is evaluated to detect deviations or anomalies. The analysis recognizes changes in the models and the data. One such change is model drift, which occurs when the data the model processes in production differs from the training data. Consider a model that predicts customer behavior based on historical data: due to seasonal or economic changes, the model suddenly becomes less accurate, which reduces the quality of its predictions. The MLOps pipeline has to recognize and adapt to such a change.
Plan: Based on the results of the analysis, the system develops a strategy for adapting the pipeline. Rules and threshold values are defined for this in the MLOps pipeline. The system then selects suitable tactics from a knowledge database, for example, to achieve the goal of reducing energy consumption.
Execute: This is where the planned adjustments are implemented. The loop accesses a knowledge database that stores historical data, adaptation strategies and target metrics. This knowledge database is continuously updated to improve future adaptations. For example, the system could switch to a more energy-efficient model if energy consumption becomes too high, or use a more precise model if prediction quality is particularly important and the data is stable.
The authors of the article describe how this loop enables the MLOps pipeline to react automatically and flexibly to changes. In other words, it can react flexibly to model shifts or changing energy requirements. The authors emphasize that this not only ensures the performance of the pipeline but also optimizes its resource consumption.
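As an illustration only, the four steps can be sketched as a small control loop in Python. The model names, thresholds and energy figures below are invented for the sketch and are not taken from the paper:

```python
# Minimal MAPE-K sketch: switch between an energy-efficient and a
# high-accuracy model based on monitored metrics (illustrative values).

KNOWLEDGE = {
    "models": {"efficient": {"energy_per_req": 1.0},
               "accurate":  {"energy_per_req": 4.0}},
    "min_accuracy": 0.90,     # quality goal
    "energy_budget": 2.0,     # sustainability goal
    "history": [],
}

def monitor(active_model, observed_accuracy):
    """Collect current energy use and model accuracy."""
    return {"energy": KNOWLEDGE["models"][active_model]["energy_per_req"],
            "accuracy": observed_accuracy}

def analyse(metrics):
    """Detect violations of the sustainability and quality goals."""
    issues = []
    if metrics["energy"] > KNOWLEDGE["energy_budget"]:
        issues.append("energy_exceeded")
    if metrics["accuracy"] < KNOWLEDGE["min_accuracy"]:
        issues.append("accuracy_too_low")
    return issues

def plan(active_model, issues):
    """Pick an adaptation tactic based on the detected issues."""
    if "accuracy_too_low" in issues and active_model == "efficient":
        return "accurate"
    if "energy_exceeded" in issues and active_model == "accurate":
        return "efficient"
    return active_model

def execute(active_model, new_model):
    """Apply the adaptation and record it in the knowledge base."""
    KNOWLEDGE["history"].append((active_model, new_model))
    return new_model

model = "efficient"
# Simulated observations: accuracy drifts downward, forcing a switch
for observed_accuracy in [0.93, 0.91, 0.87]:
    metrics = monitor(model, observed_accuracy)
    model = execute(model, plan(model, analyse(metrics)))

print("active model:", model)
```

In a real pipeline the monitored metrics would come from production telemetry and the tactics from the maintained knowledge base; the sketch only shows the shape of the loop.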
The authors use a practical example, a smart city project for air quality monitoring, to show how this sustainable self-adaptation works. In this project, data on air quality in India is collected in real time and analyzed by ML models. The sustainability challenges arise from fluctuating data quality and the need for regular model updates. In the case study, the self-adaptive pipeline switches between a more energy-efficient, faster model and a more powerful model that consumes more resources to predict air quality. The MAPE-K loop continuously monitors environmental conditions and model accuracy. As soon as a model no longer fulfills the prediction requirements or the energy conditions change, the pipeline automatically switches models. The authors report that energy consumption was reduced by 32% compared to traditional approaches in which the models were regularly retrained.
Article 2: Sustainability across the entire life cycle
→ → → MLOps Spanning Whole Machine Learning Life Cycle: A Survey
This article examines the entire life cycle of machine learning in terms of sustainability. The authors provide a comprehensive overview of the key steps and technologies required in the MLOps process, with a focus on optimizing each phase. The article describes how technological developments and large amounts of data have brought machine learning to a level of maturity that allows it to be used in real applications. However, transferring prototypes into production brings challenges that need to be overcome with MLOps. The authors define an MLOps model consisting of eight steps that span the whole life cycle, from data collection and preparation through model training and evaluation to deployment and monitoring.
The article describes that each of these phases has a significant impact on resource consumption and efficiency and how important it is to use resource-saving methods and techniques to reduce the high energy and resource consumption in ML pipelines.
Data collection and preparation
This phase can require a lot of energy and storage space. Resource consumption can be reduced by minimizing redundant data or using smaller, more representative data sets. The authors also emphasized the role of Exploratory Data Analysis (EDA) in identifying patterns and anomalies in the data at an early stage.
Model training
Training large models is often computationally intensive and energy-intensive. The authors describe methods such as transfer learning, the use of smaller models or the use of cloud computing services with sustainable energy sources as optimization options.
Transfer learning is a method in machine learning in which an already trained model is applied to a similar, new task. This saves computing resources and time, as there is no need to retrain from scratch.
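A minimal sketch of this idea, assuming a stand-in "pretrained backbone": here the backbone is just a fixed, frozen feature map rather than a real pretrained network, and only a small linear head is trained on the new task.

```python
import math
import random

# "Pretrained backbone": in a real system this is a network trained on a
# large source task. Here it is a stand-in: a fixed, frozen feature map.
def backbone(x):
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1]), 1.0]  # incl. bias

def train_head(X, y, epochs=200, lr=0.5):
    """Train only a small logistic-regression head on the frozen features."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            f = backbone(xi)                 # backbone weights never change
            z = sum(wi * fi for wi, fi in zip(w, f))
            p = 1 / (1 + math.exp(-z))
            g = p - yi                       # gradient of the logistic loss
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
    return w

random.seed(1)
# Small target-task dataset: label 1 if x0 + x1 > 0, else 0
X = [[random.uniform(-2, 2), random.uniform(-2, 2)] for _ in range(80)]
y = [1 if a + b > 0 else 0 for a, b in X]

w = train_head(X, y)
pred = [1 if sum(wi * fi for wi, fi in zip(w, backbone(x))) > 0 else 0
        for x in X]
acc = sum(p == t for p, t in zip(pred, y)) / len(y)
print(f"accuracy with head-only training: {acc:.2f}")
```

Because only the three head weights are updated, the expensive part of training (the backbone) is reused, which is exactly the resource saving transfer learning aims for.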
Deployment and monitoring
Once the model has been evaluated, it is transferred to the production environment. To maintain the accuracy and reliability of the model in the long term, continuous monitoring and maintenance is necessary. The article describes approaches such as automated monitoring and the detection of problems through early drift and anomaly detection. These methods recognize deviations or performance losses in the model, such as those caused by changing data patterns or model aging, before they have a significant impact on quality. In this way, resources can be utilized more efficiently and the reliability of the entire ML pipeline can be ensured.
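One common drift signal, used here purely as an illustration (the article does not prescribe a specific metric), is the Population Stability Index (PSI), which compares a feature's distribution in production against the training data. The 0.1/0.25 thresholds below are a common rule of thumb, and the data is simulated:

```python
import math
import random

def psi(expected, observed, bins=10):
    """Population Stability Index between a reference sample (e.g. training
    data) and a live sample. Rule of thumb: < 0.1 stable, > 0.25 drift."""
    lo = min(min(expected), min(observed))
    hi = max(max(expected), max(observed))
    width = (hi - lo) / bins or 1.0

    def proportions(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
        return [max(c / len(xs), 1e-4) for c in counts]  # avoid log(0)

    e, o = proportions(expected), proportions(observed)
    return sum((oi - ei) * math.log(oi / ei) for ei, oi in zip(e, o))

random.seed(42)
train_feature = [random.gauss(0, 1) for _ in range(2000)]    # reference
live_same     = [random.gauss(0, 1) for _ in range(2000)]    # no drift
live_shifted  = [random.gauss(1.5, 1) for _ in range(2000)]  # simulated drift

print("PSI without drift:", round(psi(train_feature, live_same), 3))
print("PSI with drift:   ", round(psi(train_feature, live_shifted), 3))
```

Running such a check per feature on a schedule catches distribution shifts early, so retraining is triggered only when it is actually needed rather than on a fixed, resource-hungry cadence.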
The authors present various sustainable methods for the respective phases. For example, models can be scaled dynamically so that they only use computing resources when required. Models can also run on energy-efficient hardware, i.e. processors specialized for ML and AI workloads, such as TPUs, GPUs or ASICs. The authors also describe techniques such as automated data cleansing, predictive maintenance and on-demand model training to increase efficiency and reduce energy consumption.
Article 3: Systematic MLOps strategies
→ → → Sustainable Engineering of Machine Learning-Enabled Systems: A Systematic Mapping Study
This article describes the concept of Sustainable Engineering of Machine Learning-Enabled Systems and how this concept can be systematically promoted. The authors describe ways to integrate sustainability for each phase of the machine learning lifecycle.
The most important techniques include energy-saving algorithms and adaptive systems that utilize resources efficiently and reduce CO2 emissions:
Adaptive system management
This is a frequently used framework that dynamically replaces resource-intensive models with energy-saving models as and when required. This reduces the duration and intensity of training processes, thereby significantly reducing energy consumption.
Carbon-aware scheduling
This is a technique specifically aimed at reducing the energy consumption and CO2 impact of machine learning applications. It optimizes the geographical distribution and temporal use of computing resources. The technique ensures that ML processes are executed in regions and during time windows in which the CO2 intensity of the power supply is particularly low. A typical example is the execution of computing jobs in data centers located in regions with a high proportion of renewable energy. Furthermore, energy-intensive tasks are preferably carried out during periods of low electricity load, such as at night or on weekends, when fewer fossil fuels are used to generate electricity. This dynamic allocation not only ensures a direct reduction in CO2 emissions but also utilizes existing infrastructure more efficiently, thereby reducing the overall cost of energy consumption.
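A carbon-aware scheduler can be sketched as a sliding-window search over a carbon-intensity forecast. The hourly values below are invented for illustration; real systems would query a grid-data provider:

```python
# Hourly grid carbon-intensity forecast in gCO2/kWh (hypothetical values;
# the low night-time values simulate a wind-power surplus).
forecast = [420, 410, 380, 300, 220, 180, 170, 190,
            260, 340, 390, 430, 450, 440, 400, 360,
            310, 280, 250, 230, 240, 290, 350, 400]

def best_window(forecast, job_hours):
    """Return the start hour whose job_hours-long window has the lowest
    average carbon intensity (simple sliding-window search)."""
    best_start, best_avg = 0, float("inf")
    for start in range(len(forecast) - job_hours + 1):
        avg = sum(forecast[start:start + job_hours]) / job_hours
        if avg < best_avg:
            best_start, best_avg = start, avg
    return best_start, best_avg

start, avg = best_window(forecast, job_hours=3)
print(f"schedule the 3h training job at hour {start} "
      f"(avg {avg:.0f} gCO2/kWh)")
```

The same search generalizes to choosing among regions: compute the best window per data center and pick the overall minimum, subject to data-residency and latency constraints.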
Energy-efficient hardware selection
This method describes the selection of specific hardware. For example, TPUs or specialized GPUs are used, as these types of hardware are optimized for low energy consumption and still offer high computing power.
Model compression
The authors highlight this method as an important technique: the size and computational requirements of models can be reduced by means of pruning or quantization. The article describes that this compression enables a more resource-efficient deployment, which is particularly important for models in real-time operation.
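Both techniques can be illustrated on a plain list of weights, a toy stand-in for real model parameters: magnitude pruning zeroes out the smallest weights, and 8-bit quantization maps floats to integer codes with a single scale factor.

```python
import random

random.seed(0)
weights = [random.gauss(0, 1) for _ in range(1000)]  # toy model parameters

# --- Magnitude pruning: zero out the weights with the smallest magnitude
def prune(weights, sparsity=0.5):
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k]
    return [0.0 if abs(w) < threshold else w for w in weights]

pruned = prune(weights, sparsity=0.5)
zeros = sum(1 for w in pruned if w == 0.0)

# --- 8-bit quantization: map floats to integer codes with one scale factor
def quantize(weights):
    scale = max(abs(w) for w in weights) / 127
    codes = [round(w / scale) for w in weights]   # codes lie in [-127, 127]
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

codes, scale = quantize(weights)
restored = dequantize(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"pruned {zeros} of {len(weights)} weights; "
      f"max quantization error {max_err:.4f}")
```

In practice the pruned zeros enable sparse storage and skipped multiplications, and the int8 codes quarter the memory footprint of float32 weights, which is why both techniques matter for real-time inference.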
In addition, the authors emphasize that sustainable MLOps practices not only lead to cost savings but also ensure the long-term maintainability and adaptability of the systems. They describe that the systematic approach helps organizations to make informed and environmentally conscious decisions regarding their ML infrastructures.
3. Sustainability dimensions in MLOps
The development of sustainable machine learning pipelines requires a balance between the different dimensions of technology, ecology and economy:
Technology
A central factor for technical sustainability is the maintainability and adaptability of MLOps pipelines. Through model switching and automatic resource management, as described for example with the MAPE-K loop in Article 1, pipelines can switch dynamically between models with high energy consumption and resource-saving models. This enables more efficient utilization of computing power and memory resources. This in turn can extend the service life of the systems and at the same time ensure their performance.
Ecology
These technical measures contribute directly to ecological sustainability: By running models only when required or scaling them dynamically, the energy consumption of the system is reduced. Techniques such as the use of energy-efficient hardware and model-internal optimizations also help to reduce energy consumption. Energy-efficient hardware (e.g. TPUs, specialized GPUs or ASICs) makes it possible to operate models with less power while maintaining high computing performance. At model level, optimizations such as model compression can be performed, which means that models require less memory and computing power.
Economy
The provision of ML models also results in costs. By utilizing resources efficiently — for example by scaling computing power only when required or by using edge computing, where models run locally instead of in the cloud — companies can not only reduce their operating costs but also reduce energy consumption in the long term.
4. Practical implementation: First steps for sustainable MLOps pipelines
What concrete steps can now be taken to make MLOps pipelines as sustainable as possible in practice?
Optimization of data storage and processing
The authors of Articles 1 and 2 suggest optimizing data storage in a targeted manner by avoiding redundant data and storing only the information that is actually required. For example, system load and memory consumption can be reduced by compressing files or storing only selected data. The articles also describe self-adaptive mechanisms as suitable measures to keep processes continuously sustainable, meaning that data flows are adapted dynamically.
General tips in this area: ensure data quality in advance to avoid unnecessary computing, select the most important variables up front, and filter out irrelevant data before it ever enters the pipeline.
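These tips can be sketched in a few lines of Python with hypothetical records: select only the variables the model needs, filter irrelevant rows, and compress what must be stored.

```python
import gzip
import json

# Hypothetical raw records; only a few fields are relevant for the model.
raw = [{"id": i, "amount": i * 1.5, "debug_blob": "x" * 200, "ts": i}
       for i in range(1000)]

# 1) Select only the variables the model needs (drop debug payloads early)
needed = [{"id": r["id"], "amount": r["amount"]} for r in raw]

# 2) Filter out irrelevant rows before they enter the pipeline
filtered = [r for r in needed if r["amount"] > 0]

# 3) Compress what must actually be stored
blob_full = json.dumps(raw).encode()
blob_lean = gzip.compress(json.dumps(filtered).encode())
print(f"raw: {len(blob_full)} bytes, "
      f"filtered + compressed: {len(blob_lean)} bytes")
```

The same pattern applies at scale with columnar formats and predicate pushdown: the earlier irrelevant data is dropped, the less every downstream stage has to store and compute.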
Continuous monitoring and adaptation
Article 1 emphasizes that the use of the MAPE-K loop provides a structured framework for monitoring and adaptation. This technique helps to continuously monitor models and data streams and adapt them if necessary. This makes it possible to adapt the models to current requirements and conditions — for example, switching to a more energy-efficient model if energy consumption is too high.
In general, best practice is to integrate regular tests and analyses into the MLOps pipeline. This allows you to recognize deviations in data quality and model performance at an early stage to make adjustments and promote the stability of the system.
Long-term model and data management
Articles 2 and 3 showed how important the management of model and data versions is, particularly to ensure traceability. Systematic versioning means that previous models are retained, which provides transparency on the one hand and flexibility for future optimizations on the other.
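A minimal sketch of such versioning, using a content hash as the version ID; this is an illustrative pattern, not a specific tool from the articles:

```python
import hashlib
import json

def register_version(registry, params, metadata):
    """Store a model version under a content hash so every training run is
    traceable and previous versions remain retrievable."""
    payload = json.dumps({"params": params, "metadata": metadata},
                         sort_keys=True)
    version_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
    registry[version_id] = {"params": params, "metadata": metadata}
    return version_id

registry = {}
v1 = register_version(registry, {"w": [0.1, -0.3]},
                      {"data_snapshot": "2024-01"})
v2 = register_version(registry, {"w": [0.2, -0.1]},
                      {"data_snapshot": "2024-02"})

# Both versions stay available: roll back or compare at any time
print("versions:", list(registry))
print("rollback to", v1, "->", registry[v1]["params"])
```

Pairing each model version with the snapshot of data it was trained on (as in the `data_snapshot` field here) is what makes later audits and targeted retraining possible without guesswork.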
5. Final Thoughts
We can assume that machine learning technologies will continue to spread. For this very reason, it will most likely become important to integrate sustainable techniques into MLOps. In the articles, we have seen that techniques such as self-adaptive mechanisms and MAPE-K loops can reduce the energy consumption of ML models.
References:
Towards Architecting Sustainable MLOps: A Self-Adaptation Approach
MLOps Spanning Whole Machine Learning Life Cycle: A Survey
Sustainable Engineering of Machine Learning-Enabled Systems: A Systematic Mapping Study
Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and Policy Considerations for Deep Learning in NLP
Sculley, D., et al. (2015). Hidden Technical Debt in Machine Learning Systems
