Machine learning operations, or MLOps, is all about streamlining the process of developing, deploying, and maintaining machine learning models. This involves automating tasks like data preparation, model training, and model deployment, making it easier to deliver high-quality models quickly.
Continuous delivery is a key concept in MLOps, allowing developers to automate the process of delivering new or updated models to production. This involves a pipeline that includes automated testing, validation, and deployment.
By automating these tasks, organizations can reduce the risk of errors and improve the overall efficiency of their machine learning operations.
What Is MLOps?
Machine learning operations, or MLOps, is the practice of creating new machine learning and deep learning models and running them through a repeatable, automated workflow that deploys them to production.
An MLOps pipeline provides a variety of services to data science teams, including model version control, continuous integration and continuous delivery (CI/CD), model service catalogs for models in production, infrastructure management, monitoring of live model performance, security, and governance.
MLOps is the marriage between machine learning and operations, aiming to make machine learning models reliable and scalable to a large number of users.
The field of MLOps attempts to develop machine learning pipelines that meet four key goals: the pipeline should follow a templated approach, the models should be reproducible and iterable, the pipeline should be scalable, and the pipeline should be automated from end-to-end.
MLOps pipelines should follow a templated approach, making it easier to reproduce and iterate on models.
Characteristics and Benefits
In a typical MLOps level 0 process, every step is manual, including data analysis, data preparation, model training, and validation. This means that data scientists have to manually execute each step and transition from one step to another.
The process is usually driven by experimental code written and executed in notebooks by data scientists until a workable model is produced. This can be a time-consuming and labor-intensive process.
A manual process like this can lead to disconnection between ML and operations, where data scientists create the model and engineers serve it as a prediction service. This handoff can result in training-serving skew, where the model is not optimized for production.
Infrequent release iterations are also a characteristic of MLOps level 0. This means that new model versions are deployed only a couple of times per year, which can lead to outdated models being used in production.
The lack of continuous integration (CI) and continuous deployment (CD) in MLOps level 0 means that testing and deployment are not automated, which can lead to errors and delays. At this level, deployment means deploying the trained model as a prediction service rather than deploying the entire ML system.
Here are the characteristics of MLOps level 0:
- Manual, script-driven, and interactive process
- Disconnection between ML and operations
- Infrequent release iterations
- No CI
- No CD
- Deployment refers to the prediction service
- Lack of active performance monitoring
MLOps, on the other hand, is a cyclical, highly automated approach that reduces the time and complexity of moving models into production. It also enhances communications and collaboration across teams and streamlines the interface between R&D processes and infrastructure.
Implementing MLOps
Implementing MLOps requires a structured approach to machine learning pipeline development. This can be achieved by using frameworks such as MLflow or Kubeflow, which help manage the details of pipeline development.
A templated approach to pipeline design is essential, allowing each stage to interact with the next without friction. The AWS ecosystem offers a wide range of ML services to support this approach.
To implement MLOps, you can use Azure Machine Learning pipelines to stitch together all the steps in your model training process. The pipeline stages can include data preparation, feature extraction, hyperparameter tuning, and model evaluation.
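As an illustration, here is a minimal sketch of such a pipeline using the azure-ai-ml package (Python SDK v2); the script names, environment, compute target, and workspace identifiers are placeholders rather than a prescribed setup.

```python
from azure.ai.ml import MLClient, Input, Output, command, dsl
from azure.identity import DefaultAzureCredential

# Connect to the workspace (identifiers are placeholders).
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Each stage is a self-contained command component; prep.py and train.py
# are hypothetical scripts under ./src.
prep = command(
    code="./src",
    command="python prep.py --raw ${{inputs.raw}} --out ${{outputs.prepared}}",
    inputs={"raw": Input(type="uri_folder")},
    outputs={"prepared": Output(type="uri_folder")},
    environment="azureml:sklearn-env@latest",  # placeholder environment
)

train = command(
    code="./src",
    command="python train.py --data ${{inputs.data}} --model ${{outputs.model}}",
    inputs={"data": Input(type="uri_folder")},
    outputs={"model": Output(type="uri_folder")},
    environment="azureml:sklearn-env@latest",
)

@dsl.pipeline(description="prepare data, then train", default_compute="cpu-cluster")
def training_pipeline(raw_data):
    # Wire the output of data preparation into model training.
    prep_step = prep(raw=raw_data)
    train_step = train(data=prep_step.outputs.prepared)
    return {"model": train_step.outputs.model}

job = training_pipeline(
    raw_data=Input(type="uri_folder", path="azureml://datastores/workspaceblobstore/paths/raw/")
)
ml_client.jobs.create_or_update(job, experiment_name="mlops-demo")
```

Additional stages such as feature extraction, hyperparameter tuning, and model evaluation can be added as further command components in the same way.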
Here are the five stages of a traditional ML pipeline:
- Scoping: Defining the problem and design requirements.
- Data Engineering: Collecting, cleaning, and transforming data.
- Modeling: Training the model and performing error analysis.
- Deployment: Packaging the final model for production.
- Monitoring: Tracking model and serving performance.
Why Do We Need MLOps?
We need MLOps because it's not just about the model code, but also about the infrastructure that serves model predictions to client applications or users.
The MLOps approach is necessary because large-scale ML systems require more than just adding more compute power.
In fact, the ML code is often just a small part of the greater ecosystem, and without proper frameworks and management processes, these systems can quickly get unwieldy.
Throwing more hardware at large-scale ML systems doesn't solve the problem: we need MLOps to handle the complexity of serving model predictions, storing common predictions in a database or cache, and addressing data security concerns.
Implementing MLOps in Your Organization
To implement MLOps in your organization, you can follow Google Cloud's framework, which consists of moving from "MLOps Level 0" to "MLOps Level 2" through a fully automated MLOps pipeline.
The MLOps approach is necessary because it's not just about storing models in larger computing platforms, but also about considering the entire ecosystem, including serving infrastructure, data security, and more.
Azure Machine Learning publishes key events to Azure Event Grid, allowing you to set up event-driven processes and automate on events in the machine learning lifecycle.
You can use Git and Azure Pipelines to create a continuous integration process that trains a machine learning model, making it easier to work with Azure Pipelines through the Machine Learning extension.
A traditional ML pipeline consists of five general stages: Scoping, Data Engineering, Modeling, Deployment, and Monitoring.
To orchestrate machine learning pipelines, you can use frameworks such as MLflow or Kubeflow together with a templated pipeline design, which reduces unnecessary headaches and ensures that each stage can interact with the next without friction.
Here's a brief overview of the stages in a machine learning pipeline:
- Scoping: Scoping the problem and defining design requirements
- Data Engineering: Establishing data collection methods, data cleaning, EDA, and feature engineering/transformation steps
- Modeling: Training the model(s), performing error analysis, and comparing against baselines
- Deployment: Packaging up the final model to serve model predictions in production
- Monitoring: Monitoring and tracking model and serving performance, and detecting errors or data drift
By following these stages and using a templated approach, you can create a machine learning pipeline that meets the goals of MLOps and helps you iterate quickly and effectively.
Automation and Pipelines
Automating model deployment is essential for MLOps as it streamlines the process of integrating trained machine learning models into production environments, ensuring consistency and reducing the risk of errors.
Automating model deployment also shortens the time it takes to deploy a model from development to production, enabling businesses to benefit from the insights generated by the model more quickly.
Automating model deployment allows for more frequent and seamless updates, ensuring that production models are always using the latest trained versions, particularly important when dealing with dynamic data or rapidly evolving business needs.
To automate model deployment, you can use a loosely coupled architecture, which enables teams to easily test and deploy individual components or services without relying on other teams for support and services.
A loosely coupled architecture can be achieved by structuring the machine learning project, using dedicated templates such as the Cookiecutter Data Science Project Template, The Data Science Lifecycle Process Template, or PyScaffold.
In addition to automating model deployment, automating hyperparameter optimization (HPO) is also crucial for MLOps, as it helps find the best set of hyperparameters for a given machine learning model.
There are several approaches to HPO, including grid search, random search, Bayesian optimization, genetic algorithms, and gradient-based optimization.
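For instance, here is a brief sketch of the first two approaches using scikit-learn's built-in search utilities on a synthetic dataset; the model choice and parameter grid are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

# Grid search: exhaustively evaluates every combination in the grid.
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)
print("grid search best params:", grid.best_params_)

# Random search: samples a fixed number of combinations, cheaper for large spaces.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0), param_grid, n_iter=4, cv=3, random_state=0
)
rand.fit(X, y)
print("random search best params:", rand.best_params_)
```

Bayesian optimization, genetic algorithms, and gradient-based methods follow the same pattern but require dedicated libraries.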
Automating HPO can have significant benefits, including improved model performance, increased efficiency, consistency and reproducibility, and continuous improvement.
To automate HPO, you can use a CI/CD pipeline, which automates the end-to-end machine learning lifecycle by continuously testing, updating, and rolling out new machine learning models.
A CI/CD pipeline consists of several stages, including development and experimentation, pipeline continuous integration, pipeline continuous delivery, automated triggering, model continuous delivery, and monitoring.
Here are the three levels of automation in MLOps:
- MLOps level 0: Manual process
- MLOps level 1: ML pipeline automation
- MLOps level 2: CI/CD pipeline automation
Data Management
Data Management is a crucial aspect of machine learning, and it's essential to get it right to ensure the success of our models. Data labeling should be strictly controlled by establishing clear guidelines, training annotators, and using multiple annotators to reduce individual biases.
To ensure high-quality labeled data, develop comprehensive and unambiguous labeling instructions, train and assess annotators, and use consensus or other techniques to aggregate their inputs (a minimal aggregation sketch follows the list below). This can be achieved by following these best practices:
- Develop clear labeling guidelines
- Train and assess annotators
- Use multiple annotators
- Monitor and audit the labeling process
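Here is the aggregation sketch referenced above: a majority vote over labels from multiple annotators, one simple consensus technique. The label matrix is purely illustrative.

```python
from collections import Counter

def majority_vote(labels_per_item):
    """Return the most common label for each item, given one list of annotator labels per item."""
    consensus = []
    for labels in labels_per_item:
        label, _count = Counter(labels).most_common(1)[0]
        consensus.append(label)
    return consensus

# Three annotators labeled four items.
annotations = [
    ["cat", "cat", "dog"],
    ["dog", "dog", "dog"],
    ["cat", "dog", "cat"],
    ["dog", "cat", "dog"],
]
print(majority_vote(annotations))  # ['cat', 'dog', 'cat', 'dog']
```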
Data sanity checks are also essential for external data sources, including data validation, detecting anomalies, and monitoring data drift. This helps prevent issues related to data quality, inconsistencies, and errors, and improves model performance.
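A small sketch of what such sanity checks might look like, assuming pandas and SciPy are available; the expected schema, the significance level, and the sample data are illustrative assumptions.

```python
import pandas as pd
from scipy.stats import ks_2samp

EXPECTED_COLUMNS = {"age": "int64", "income": "float64"}  # assumed schema

def validate_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of schema/range problems found in the incoming data."""
    problems = []
    for col, dtype in EXPECTED_COLUMNS.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            problems.append(f"{col} has dtype {df[col].dtype}, expected {dtype}")
    if "age" in df.columns and df["age"].lt(0).any():
        problems.append("negative ages found")
    return problems

def drift_detected(reference: pd.Series, current: pd.Series, alpha: float = 0.01) -> bool:
    """Kolmogorov-Smirnov test: a small p-value suggests the distributions differ."""
    _, p_value = ks_2samp(reference, current)
    return p_value < alpha

reference = pd.DataFrame({"age": [25, 32, 40, 51], "income": [40e3, 52e3, 61e3, 75e3]})
current = pd.DataFrame({"age": [24, 33, 39, 70], "income": [41e3, 50e3, 60e3, 200e3]})

print(validate_schema(current))
print("income drift:", drift_detected(reference["income"], current["income"]))
```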
Data preparation is a time-consuming process that involves data cleaning, transformation, and merging from multiple sources. Writing reusable scripts for these tasks can improve efficiency and maintain consistency across projects by modularizing code, standardizing data operations, automating data preparation, and using version control for scripts.
Reusable Data Cleaning and Merging Scripts
Reusable data cleaning and merging scripts are a game-changer for data management. They improve efficiency, maintain consistency, and reduce errors.
Breaking down data preparation tasks into smaller, independent functions makes them easier to reuse and combine, enabling faster development and simplifying debugging. This is known as modularizing code.
Standardizing data operations with functions and libraries for common tasks like data cleansing, imputation, and feature engineering promotes reusability and reduces duplication. This ensures consistent data handling across projects.
Automating data preparation pipelines minimizes manual intervention and reduces the potential for errors, making it easier to maintain and update data processes. Version control systems help manage changes in data preparation scripts, ensuring the latest and most accurate version is always used.
The short sketch below pulls these practices together into a few reusable cleaning and merging helpers.
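This is only an illustrative sketch: the column names, the median-imputation choice, and the join key are assumptions for the example.

```python
import pandas as pd

def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Lower-case and normalize column names so every source uses the same schema."""
    return df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))

def impute_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing numeric values with the column median."""
    out = df.copy()
    numeric = out.select_dtypes("number").columns
    out[numeric] = out[numeric].fillna(out[numeric].median())
    return out

def merge_sources(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    """Join two cleaned sources on a shared key."""
    return customers.merge(orders, on="customer_id", how="left")

# The same small functions can be reused and recombined across projects.
customers = standardize_columns(pd.DataFrame({"Customer ID": [1, 2], "Age": [34, None]}))
orders = standardize_columns(pd.DataFrame({"Customer ID": [1, 1, 2], "Amount": [10.0, 20.0, None]}))
clean = merge_sources(impute_numeric(customers), impute_numeric(orders))
print(clean)
```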
Metadata Management
Metadata management is a crucial aspect of data management. It helps with data and artifacts lineage, reproducibility, and comparisons.
Each time you execute a machine learning pipeline, the metadata store records the pipeline and component versions that were executed. The start and end date, time, and how long the pipeline took to complete each of the steps are also recorded.
The executor of the pipeline, parameter arguments, and pointers to the artifacts produced by each step are tracked. This includes the location of prepared data, validation anomalies, computed statistics, and extracted vocabulary from the categorical features.
Tracking these intermediate outputs helps you resume the pipeline from the most recent step if the pipeline stopped due to a failed step, without having to re-execute the steps that have already completed.
A pointer to the previous trained model is also recorded, allowing you to roll back to a previous model version or produce evaluation metrics for a previous model version when the pipeline is given new test data during the model validation step.
The model evaluation metrics produced during the model evaluation step for both the training and the testing sets are recorded. These metrics help you compare the performance of a newly trained model to the recorded performance of the previous model during the model validation step.
Here are the key metadata recorded for each pipeline execution (a minimal tracking sketch follows the list):
- Pipeline and component versions
- Start and end date, time, and execution time
- Executor of the pipeline
- Parameter arguments
- Pointers to artifacts produced by each step
- Pointer to the previous trained model
- Model evaluation metrics for training and testing sets
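One way to record this kind of per-execution metadata is with MLflow tracking, one of the frameworks mentioned earlier; the tag names, parameter values, metrics, and artifact below are illustrative, not a required schema.

```python
import time
import mlflow

mlflow.set_experiment("pipeline-metadata-demo")

with mlflow.start_run(run_name="training-pipeline"):
    start = time.time()

    # Pipeline/component versions, executor, and parameter arguments.
    mlflow.set_tag("pipeline_version", "1.4.0")
    mlflow.set_tag("executor", "ci-runner")
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 200)

    # ... data preparation, training, and evaluation would run here ...

    # Evaluation metrics for the training and test sets.
    mlflow.log_metric("train_accuracy", 0.94)
    mlflow.log_metric("test_accuracy", 0.91)

    # Pointer to an artifact produced by a step (the file must exist locally).
    with open("prepared_data_stats.json", "w") as f:
        f.write('{"rows": 10000, "anomalies": 0}')
    mlflow.log_artifact("prepared_data_stats.json")

    mlflow.log_metric("pipeline_duration_seconds", time.time() - start)
```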
Testing and Monitoring
Testing and monitoring are crucial components of MLOps continuous delivery and automation pipelines in machine learning. Continuous monitoring of deployed models ensures that machine learning models maintain their performance and reliability in production environments.
Continuous monitoring involves collecting and analyzing key performance metrics, such as precision, recall, or F1 score, at regular intervals to evaluate the model's effectiveness. It also includes monitoring input data for anomalies, missing values, or distribution shifts that could impact the model's performance.
Tests for reliable model development include routines that verify algorithms make decisions aligned to business objectives, model staleness tests, and assessing the cost of more sophisticated ML models. These tests help detect ML-specific errors and ensure the model's quality and fairness.
The "ML Test Score" system measures the overall readiness of the ML system for production. The final ML Test Score is computed by taking the minimum of the scores aggregated for each of the sections: Data Tests, Model Tests, ML Infrastructure Tests, and Monitoring. The following table provides the interpretation ranges:
Monitoring activities in production include tracking dependency changes, data invariants, and numerical stability of the ML model. It also involves monitoring computational performance, feature generation, and predictive quality of the model.
Testing
Testing is a crucial aspect of machine learning (ML) development. It's essential to ensure that your ML system is reliable and accurate.
The complete development pipeline includes three essential components: data pipeline, ML model pipeline, and application pipeline. In accordance with this separation, we distinguish three scopes for testing in ML systems: tests for features and data, tests for model development, and tests for ML infrastructure.
Data validation is a critical aspect of testing. It involves automatic checks for data and features schema/domain. This ensures that the data is accurate and reliable.
Tests for features and data include feature importance tests to understand whether new features add predictive power. Features and data pipelines should also be policy-compliant, for example with GDPR. These requirements should be programmatically checked in both development and production environments.
Tests for reliable model development are also essential. This includes testing ML training to verify that algorithms make decisions aligned to business objectives. Model staleness tests are also necessary to ensure that the trained model includes up-to-date data and satisfies business impact requirements.
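As a hedged sketch, the pytest-style tests below show a simple quality gate (the new model must beat a trivial baseline) and a freshness check on the training data; the margin, the 30-day window, and the synthetic dataset are example assumptions.

```python
from datetime import datetime, timedelta, timezone

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def test_model_beats_baseline():
    # The candidate model must clearly outperform a most-frequent-class baseline.
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    assert model.score(X_test, y_test) >= baseline.score(X_test, y_test) + 0.05

def test_training_data_is_fresh():
    # In practice this timestamp would come from pipeline metadata.
    last_data_refresh = datetime.now(timezone.utc) - timedelta(days=3)
    assert datetime.now(timezone.utc) - last_data_refresh < timedelta(days=30)
```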
ML infrastructure tests are critical to ensure that the ML system is reliable and accurate. This includes verifying that model training is reproducible, testing ML API usage, and validating algorithmic correctness.
The "ML Test Score" system provides a framework for measuring the overall readiness of the ML system for production. The score is computed by awarding points for executing tests manually or automatically, and summing the scores for each section: Data Tests, Model Tests, ML Infrastructure Tests, and Monitoring.
Monitoring
Monitoring is a crucial aspect of ensuring your machine learning models perform as expected in production. Continuous monitoring involves tracking key performance metrics such as precision, recall, and F1 score at regular intervals.
Continuous monitoring helps detect model drift, which occurs when data distributions change over time, causing the model's performance to degrade. By monitoring data quality, you can identify anomalies, missing values, or distribution shifts that could impact the model's performance.
Monitoring input data for anomalies is essential to prevent errors or performance issues. You can track the usage of system resources, such as CPU, memory, or storage, to ensure the infrastructure can support the deployed model without issues.
Alerts and notifications are vital for informing relevant stakeholders when predefined thresholds are crossed, signaling potential issues or the need for intervention. This can be implemented by tracking precision, recall, and F1 score of the model prediction along with time.
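Here is a minimal monitoring sketch along those lines; the thresholds and the use of a log message as the "alert" are stand-ins for whatever alerting channel is actually in place.

```python
import logging

from sklearn.metrics import precision_score, recall_score, f1_score

logging.basicConfig(level=logging.INFO)
THRESHOLDS = {"precision": 0.80, "recall": 0.75, "f1": 0.78}  # illustrative values

def monitor_batch(y_true, y_pred):
    """Compute key metrics for a batch and warn when any threshold is crossed."""
    metrics = {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
    for name, value in metrics.items():
        if value < THRESHOLDS[name]:
            # In a real system this would page someone or post to a channel.
            logging.warning("ALERT: %s dropped to %.2f (threshold %.2f)",
                            name, value, THRESHOLDS[name])
    return metrics

# Ground truth collected with a delay, compared against logged predictions.
print(monitor_batch([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
```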
Here are some key aspects to monitor in your machine learning model:
- Dependency changes throughout the pipeline
- Data invariants in training and serving inputs
- Numerical stability of the ML model
- Computational performance of an ML system
- Staleness of the system in production
- Feature generation processes
- Predictive quality of the ML model
Azure Machine Learning publishes key events to Azure Event Grid, which can be used to notify and automate on events in the machine learning lifecycle. This allows for event-driven processes based on Azure Machine Learning events.
Reproducibility and Versioning
Reproducibility in machine learning means that every phase of the workflow should produce identical results given the same input. This is crucial for ensuring that our models are reliable and trustworthy.
To achieve reproducibility, we need to track our ML models and data sets with version control systems. This allows us to audit and reproduce our training processes. Every ML model specification should go through a code review phase and be versioned in a VCS.
We should also ensure that our data is reproducible, which means that the training data can be recreated exactly as it was during training. This can be achieved by backing up our data, saving snapshots of the data set, and using data versioning.
The challenges differ across phases of the workflow: code must be reviewed and versioned, data must be snapshotted and versioned so training inputs can be recreated exactly, and training itself must control every source of randomness. A minimal sketch of the last two habits follows.
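This sketch assumes the training data lives in a single local file; in practice a data versioning tool such as DVC plays the same role. The file path is a placeholder.

```python
import hashlib
import os
import random

import numpy as np

def set_seeds(seed: int = 42) -> None:
    """Pin the sources of randomness the training run depends on."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

def dataset_fingerprint(path: str) -> str:
    """Hash the data snapshot so the exact training input can be verified later."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

set_seeds(42)
# fingerprint = dataset_fingerprint("data/train.parquet")  # placeholder path
# The seed and fingerprint would be logged alongside the model version.
```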
By following these best practices, we can ensure that our machine learning pipelines are reproducible and versioned, which is essential for building reliable and trustworthy models.
Machine Learning Governance
Machine learning governance is crucial for ensuring the integrity and reliability of your models. Azure Machine Learning provides a robust metadata system to track the entire lifecycle of your machine learning assets.
You can track, profile, and version your data using Azure Machine Learning data assets. This helps you understand the origin and changes to your data over time.
Azure Machine Learning job history stores a snapshot of the code, data, and computes used to train a model. This provides a clear audit trail of your model's development.
Model interpretability is essential for meeting regulatory compliance and understanding how your models arrive at a result for a given input. Azure Machine Learning model registration captures all the metadata associated with your model.
Here's a summary of what you can track with Azure Machine Learning metadata:
- Data assets: track, profile, and version data
- Model interpretability: explain models, meet regulatory compliance
- Job history: store a snapshot of code, data, and computes used to train a model
- Model registration: capture metadata associated with a model
You can also use tags to add more information to your models and data assets. This allows you to filter and find specific models and data assets in your workspace more efficiently.
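As a sketch, registering a model with tags via the azure-ai-ml v2 SDK might look like the following; the workspace identifiers, model path, and tag values are placeholders.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Register the trained model along with descriptive tags for later filtering.
model = Model(
    path="./outputs/model.pkl",  # local path or job output (placeholder)
    name="churn-classifier",
    description="Gradient boosted churn model",
    tags={"team": "growth", "training_data": "2024-06-snapshot", "framework": "sklearn"},
)

registered = ml_client.models.create_or_update(model)
print(registered.name, registered.version)
```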
Automate Model Deployment
Automating model deployment is a crucial step in MLOps: it standardizes how trained models move into production environments rather than relying on one-off manual handoffs.
Automated deployment ensures that models are consistently released following predefined standards and best practices, reducing the risk of errors and inconsistencies that manual deployment can introduce.
It also shortens the time from development to production and makes frequent, seamless updates practical, so production always serves the latest trained version of a model. This matters most in dynamic data environments or when business needs evolve rapidly.
In short, the benefits of automated model deployment are:
- Consistency: deployments follow predefined standards and best practices
- Speed: models reach production sooner, so the business benefits from their insights more quickly
- Currency: frequent, seamless updates keep production models on the latest trained versions
When planning this automation, it's also essential to consider the infrastructure and data routing requirements for seamless deployment.
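One illustrative way to automate the final hand-off, assuming models are tracked in an MLflow model registry (registry stages are just one convention, and the model name is hypothetical):

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()
MODEL_NAME = "churn-classifier"  # hypothetical registered model

# Pick the most recently registered version and promote it.
versions = client.search_model_versions(f"name='{MODEL_NAME}'")
latest = max(versions, key=lambda v: int(v.version))

client.transition_model_version_stage(
    name=MODEL_NAME,
    version=latest.version,
    stage="Production",
    archive_existing_versions=True,  # retire the previously deployed version
)
print(f"Promoted {MODEL_NAME} version {latest.version} to Production")
```

A CI/CD job can run a script like this only after the model tests described earlier have passed.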
Challenges and Best Practices
Machine learning models often break in production due to changes in the environment or data. This is why actively monitoring model quality is crucial to detect performance degradation and model staleness.
To maintain model accuracy, you need to retrain your production models frequently. This involves updating your model with the most recent data to capture evolving patterns and emerging trends.
Retraining models regularly is essential, especially in applications like fashion product recommendations, where recommendations should adapt to the latest trends and products.
Continuous experimentation with new implementations is also vital to harness the latest ideas and advances in technology. This can involve trying out new techniques, such as feature engineering or model architecture, to improve detection accuracy.
To address the challenges of manual MLOps processes, practices like Continuous Integration/Continuous Deployment (CI/CD) and Continuous Testing (CT) are helpful. By deploying an ML training pipeline, you can enable CT and set up a CI/CD system to rapidly test, build, and deploy new implementations of the ML pipeline.
To succeed with MLOps, consider the following best practices:
- Actively monitor the quality of your model in production.
- Frequently retrain your production models.
- Continuously experiment with new implementations.
Sources
- Hidden Technical Debt in Machine Learning Systems (nips.cc)
- Continuous delivery (CD) (wikipedia.org)
- Continuous integration (CI) (wikipedia.org)
- holdout test set (wikipedia.org)
- Why Machine Learning Models Crash and Burn in Production (forbes.com)
- concept drift (wikipedia.org)
- Apache 2.0 License (apache.org)
- What is MLOps? - A Gentle Introduction (run.ai)
- SIG MLOps (cd.foundation)
- Model Management Frameworks (inovex.de)
- Weights and Biases (wandb) (wandb.com)
- DVC (dvc.org)
- Figure source: "The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction" by E. Breck et al., 2017 (googleusercontent.com)
- The Data Science Lifecycle Process Template (github.com)
- Cookiecutter Data Science Project Template (drivendata.github.io)
- Python SDK azure-ai-ml v2 (current) (aka.ms)
- Open Neural Network Exchange (onnx.ai)
- Machine Learning extension (visualstudio.com)
- Pratik D Sharma (pratikdsharma.com)
- AWS (amazon.com)