MLOps Tutorial: A Complete Guide to Machine Learning Operations

Machine learning operations, or MLOps, is the process of taking a machine learning model from development through deployment into production. It's a crucial step in making AI-powered applications a reality.

MLOps involves automating the entire lifecycle of a model, from training to testing to deployment. This includes tasks such as data preparation, model training, model evaluation, and model deployment.

To get started with MLOps, you'll need to understand the different stages of the MLOps pipeline, which include data ingestion, data processing, model training, model evaluation, model deployment, and model monitoring.

Data Preparation

Data Preparation is a crucial step in the MLOps process. Incomplete, inaccurate, and inconsistent data can lead to false conclusions and predictions.

Data cleaning and understanding is essential to ensure the quality of the results. This involves identifying and correcting errors, handling missing values, and transforming data into a suitable format for analysis.

Inaccurate data can have a significant impact on the model's performance. Incomplete data, on the other hand, can lead to biased results.

Understanding the data is just as important as cleaning it. This includes knowing the data types, distributions, and correlations between variables.
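
As a minimal sketch of these checks with pandas, assuming the data sits in a CSV file (the file name and the total_bedrooms column are illustrative, not taken from the article):

```python
import pandas as pd

# Assume the raw data has been loaded from a file; "housing.csv" is hypothetical.
df = pd.read_csv("housing.csv")

# Understand the data: types, distributions, and correlations.
print(df.dtypes)
print(df.describe())
print(df.corr(numeric_only=True))

# Identify missing values, then handle them (here: median imputation).
print(df.isna().sum())
df["total_bedrooms"] = df["total_bedrooms"].fillna(df["total_bedrooms"].median())
```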

Feature Engineering and Management

Feature Engineering and Management is a crucial step in the MLOps process. It involves transforming raw data into features that can be used for model training and prediction.

One key aspect of feature engineering is handling categorical data, as seen in the ocean_proximity feature with 5 categories. To address this, One Hot Encoding is applied.

Feature engineering also involves checking for multicollinearity among features. Pearson's correlation revealed a few features with high multicollinearity, which was addressed by creating new features such as rooms_per_bedroom and population_per_house.

To ensure reproducibility, it's essential to store features in a centralized repository. The Feature Store in Databricks is used for this purpose, where the clean dataset was registered, keeping 20% separate for inference after model development.
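
A hedged sketch of that registration step with the databricks.feature_store client might look like the following; the table name, primary key, and the clean_df DataFrame are assumptions, and the call requires a Databricks runtime:

```python
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# clean_df is assumed to be the cleaned Spark DataFrame from the steps above;
# the table name and primary key are illustrative.
fs.create_table(
    name="ml.housing_features",
    primary_keys=["house_id"],
    df=clean_df,
    description="Cleaned housing features for model training",
)
```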

Feature Engineering

Feature Engineering is a crucial step in the machine learning pipeline. It involves transforming raw data into features that can be used to train a model.

One of the key aspects of feature engineering is handling categorical variables. In the case of the ocean_proximity feature, which has 5 categories, One Hot Encoding is applied. This is a common technique used to convert categorical variables into numerical variables.
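
As a minimal illustration with pandas (the category labels below are the ones commonly found in the California housing dataset; treat them as assumptions):

```python
import pandas as pd

# Toy frame standing in for the housing data; ocean_proximity has five categories.
df = pd.DataFrame({"ocean_proximity": [
    "<1H OCEAN", "INLAND", "ISLAND", "NEAR BAY", "NEAR OCEAN"]})

# One Hot Encoding: one binary indicator column per category.
encoded = pd.get_dummies(df, columns=["ocean_proximity"])
print(encoded)
```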

To address multicollinearity, new features can be created by computing ratios of existing features. For example, rooms_per_bedroom and population_per_house were created by dividing total_rooms by total_bedrooms and population by households, respectively.
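
A short sketch of this transformation, using toy values in place of the real housing rows:

```python
import pandas as pd

# Toy values standing in for the housing columns named above.
df = pd.DataFrame({
    "total_rooms": [880, 7099, 1467],
    "total_bedrooms": [129, 1106, 190],
    "population": [322, 2401, 496],
    "households": [126, 1138, 177],
})

# Ratio features created to reduce multicollinearity.
df["rooms_per_bedroom"] = df["total_rooms"] / df["total_bedrooms"]
df["population_per_house"] = df["population"] / df["households"]

# Re-check Pearson's correlation after the transformation.
print(df.corr())
```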

Here are some key considerations when creating new features:

  • Identify correlations: Use techniques like Pearson's correlation to identify correlations between features.
  • Create new features: Use ratios or other transformations to create new features that can help address multicollinearity.

By following these best practices, you can create a robust set of features that will help your model perform well.

Notebook

To get started with feature engineering and management, open the Jupyter notebook: click the Jupyter icon at the top right corner of the Anyscale Workspace page to launch the JupyterLab instance in a new tab.

Navigate to the notebooks directory and open the madewithml.ipynb notebook to walk through the core machine learning workloads interactively.

Compute Configuration

Compute Configuration is a crucial aspect of Feature Engineering and Management. It determines what resources our workloads will be executed on.

We've already created a compute configuration for our workloads, but we can create one from scratch if needed. This involves defining the specifications for our computing environment.

The compute configuration is essentially a blueprint for our computing resources. We can customize it to suit our specific requirements and workload needs.

For instance, we can specify the type of hardware, the amount of memory, and the processing power required for our workloads. This ensures that our workloads are executed efficiently and effectively.
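
As a loose illustration, in Ray (which Anyscale Workspaces run on) a local compute configuration can pin these budgets explicitly; the numbers below are placeholders:

```python
import ray

# Start a local Ray runtime with explicit resource limits.
# These values are placeholders; size them to your workload.
ray.init(
    num_cpus=8,                       # processing power
    num_gpus=0,                       # hardware type: CPU-only here
    object_store_memory=2 * 1024**3,  # 2 GiB for Ray's object store
)
print(ray.cluster_resources())
```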

Machine Learning

Machine Learning is a subset of artificial intelligence that enables systems to learn from data without being explicitly programmed. This is achieved through algorithms that can improve their performance on a task over time.

One of the key characteristics of machine learning is its ability to handle high-dimensional data, such as images and text, which can be difficult to work with using traditional programming methods. Models trained on large datasets can learn complex patterns and relationships.

In the context of MLOps, Machine Learning models are often deployed in production environments, where they can be used to make predictions or decisions in real-time. This requires careful consideration of issues such as model serving, model monitoring, and model maintenance.

Machine Learning Hyperparameter Tuning

Machine learning hyperparameter tuning is a crucial step in developing an accurate model. It involves adjusting the model's hyperparameters (settings chosen before training rather than learned from the data) to optimize its performance.

To measure the accuracy of a LightGBM regression model, the adjusted R-squared is used. This statistical measure represents the proportion of the variance in the dependent variable that's explained by the independent variables, adjusted for the number of predictors.

Hyperparameter tuning can significantly improve the selected performance metrics. In the example, Bayesian Optimization using the hyperopt library was used to perform hyperparameter tuning.

The Root Mean Squared Error (RMSE) is another important metric used in regression tasks to measure the error of a model. It has the advantage of being expressed in the same units as the target variable.
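
Both metrics are straightforward to compute; here is a sketch with scikit-learn on toy values (the predictor count p is chosen purely for illustration):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy predictions; not the article's data.
y_true = np.array([2.5, 0.8, 3.1, 4.0, 1.9, 2.2])
y_pred = np.array([2.3, 1.0, 2.9, 4.2, 1.7, 2.5])

# RMSE: error in the same units as the target variable.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# Adjusted R-squared: R-squared penalized by the number of predictors p.
n, p = len(y_true), 2  # p = 2 predictors, chosen for illustration
r2 = r2_score(y_true, y_pred)
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"RMSE={rmse:.3f}, adjusted R2={adjusted_r2:.3f}")
```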

For model training, features registered in the Feature Store are used to build the training dataset. The dataset is then split into training (70%) and test (30%) sets for modeling.

Hyperparameter tuning often yields only small improvements in the performance metrics, but it's still a crucial step in developing an accurate model. In the example, the adjusted R² improved from 0.83 to 0.84 after tuning.
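
A hedged sketch of this tuning setup with hyperopt and LightGBM, substituting synthetic data for the Feature Store dataset; the search space and evaluation budget are illustrative:

```python
import numpy as np
import lightgbm as lgb
from hyperopt import fmin, tpe, hp, Trials
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training dataset built from the Feature Store.
X, y = make_regression(n_samples=1000, n_features=8, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)  # 70/30 split, as in the article

def objective(params):
    # Train a LightGBM regressor with the sampled hyperparameters.
    model = lgb.LGBMRegressor(
        num_leaves=int(params["num_leaves"]),
        learning_rate=params["learning_rate"],
        n_estimators=100,
    )
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    return np.sqrt(mean_squared_error(y_test, preds))  # minimize RMSE

# Illustrative search space for Bayesian optimization (TPE).
space = {
    "num_leaves": hp.quniform("num_leaves", 16, 128, 1),
    "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.3)),
}

best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=25, trials=Trials())
print(best)
```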

Machine Learning Operations

Machine Learning Operations (MLOps) refers to a set of practices to efficiently and reliably deploy and maintain ML models in the production environment.

MLOps began as a set of best practices, but it has since evolved into an independent approach to managing the machine learning lifecycle.

The goal of MLOps is to ensure that ML models are deployed and maintained in a way that is efficient, reliable, and scalable.

MLOps involves a range of activities, including model training, testing, deployment, and monitoring.

To implement MLOps, DevOps engineers, ML engineers, and Data Scientists work together to transition algorithms to production systems.

Key features of MLOps include the ability to configure scheduled runs or event-driven runs, set up continuous training pipelines, and maintain and store older runs.

Here are some key features of training operationalization:

  1. The ability to configure scheduled or event-driven runs, initiated when new data arrives or the model starts decaying (a toy trigger is sketched below).
  2. Setting up continuous training pipelines with custom hyperparameter settings.
  3. Access to a model registry that contains the ML artifact repository.
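
As a toy illustration of the first capability, an event-driven trigger can be reduced to a simple predicate; everything here is hypothetical scaffolding rather than an API from the article:

```python
import datetime

# Hypothetical decay threshold; in practice it comes from monitoring.
DECAY_THRESHOLD = 0.75

def should_retrain(latest_metric: float, new_data_available: bool) -> bool:
    """Trigger retraining when new data arrives or the model decays."""
    return new_data_available or latest_metric < DECAY_THRESHOLD

if should_retrain(latest_metric=0.71, new_data_available=False):
    print(f"{datetime.datetime.now()}: launching continuous training pipeline")
```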

Together, these practices help organizations keep their ML models efficient, reliable, and scalable in production.

Deployment and Monitoring

Model deployment is a complex process that involves multiple components such as Continuous Integration (CI), Continuous Delivery (CD), online experimentation, and production deployment. This process is crucial for making the model available for use in the actual production environment.

In Databricks, you can visit the section of registered models to enable serving and start the serving process. The status will be visible on the top left, and this process will create a cluster for you where the current registered model will be deployed.
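
Once serving is up, the endpoint can be queried over REST. A hedged sketch follows; the workspace URL, model name, token, and feature names are placeholders, and the exact request schema varies across Databricks and MLflow serving versions:

```python
import requests

# Placeholders: substitute your workspace URL, model name/version, and token.
url = "https://<databricks-instance>/model/housing_model/1/invocations"
headers = {"Authorization": "Bearer <personal-access-token>"}
payload = {"dataframe_records": [
    {"median_income": 8.3, "rooms_per_bedroom": 6.8, "population_per_house": 2.6}
]}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
```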

The key features of model deployment include continuous integration, continuous delivery, and different strategies of production deployment such as Canary deployment, Shadow deployment, and Blue/green deployment. These strategies help ensure that the model is properly tested and validated before being deployed to production.

Online experimentation, such as smoke testing, A/B testing, and multi-armed bandit (MAB) testing, is carried out to check whether the new model performs better than the old one. When a new model is considered for deployment, the old model keeps running in parallel, and a subset of traffic is gradually routed to the newer version.

Here are the different strategies of production deployment:

  • Canary deployment: the new model serves a small slice of live traffic, which grows as confidence increases.
  • Shadow deployment: the new model receives a copy of production traffic, but its predictions are not returned to users.
  • Blue/green deployment: two identical environments run side by side, and traffic is switched to the new one once it's validated.

Model monitoring is an essential task after deployment: it ensures the deployed model remains effective. This involves comparing the prediction schemas against the ideal schemas and checking for anomalies.

The key features of model monitoring include protection against data drift and concept drift, and tracking of operational metrics such as memory utilization, resource utilization, latency, and throughput.
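
As one simple, hedged example of a data-drift check (not described in the article), a two-sample Kolmogorov-Smirnov test from SciPy can compare a feature's training distribution with recent production values:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_values = rng.normal(loc=0.0, scale=1.0, size=5000)  # training distribution
live_values = rng.normal(loc=0.4, scale=1.0, size=1000)   # drifted production data

stat, p_value = ks_2samp(train_values, live_values)
if p_value < 0.01:
    print(f"Possible data drift detected (KS={stat:.3f}, p={p_value:.4g})")
```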

Continuous Integration and Deployment

Continuous Integration and Deployment is a crucial step in the MLOps process. It ensures that changes to the code are verified automatically, which helps to prevent bugs and errors from making it into production.

In Databricks, you can use GitHub Actions to automate the CI/CD process. This involves triggering workflows upon a pull request to the main branch; the pipeline then runs through three stages: DEV, TEST, and PROD. The DEV stage runs a workflow called 'eda_modeling' that performs exploratory data analysis, modeling, and promotion of the best model to the Model Registry.

The CI/CD process involves multiple components, including Continuous Integration, Continuous Delivery, and online experimentation. Continuous Integration deals with reading the source code and the model from the model registry to check the correctness of the model's input-output format. Continuous Delivery involves three basic phases: deployment to staging, acceptance testing, and deployment to production, followed by progressive delivery.

Here are the key components of the CI/CD process:

  • DEV: Runs a workflow called 'eda_modeling' that does exploratory data analysis, modeling, and promoting of the best model to the Model Registry.
  • TEST: Runs a workflow called 'job-model-tests' that includes the model tests for the transitions in the 'Staging' and 'Production' stages in the Model Registry.
  • PROD: Runs a workflow 'inference' for batch inference against new data.

By automating the CI/CD process, you can ensure that your ML models are deployed quickly and reliably, and that any errors or bugs are caught early on. This helps to improve the overall quality and performance of your models, and reduces the risk of errors or downtime in production.

CI/CD

CI/CD is a crucial part of any software development process, and for MLOps projects, it's essential to automate the deployment of ML models in production.

CI/CD can be achieved using tools like GitHub Actions, which can trigger workflows upon a pull request to the main branch.

These workflows can include multiple stages, such as DEV, TEST, and PROD, each running a specific workflow. For example, the DEV stage can run a workflow called ‘eda_modeling’ that does exploratory data analysis, modeling, and promoting of the best model to the Model Registry.

The stages break down as described in the previous section: DEV handles exploratory data analysis and modeling, TEST runs the model tests for the registry transitions, and PROD runs batch inference against new data.

To set up CI/CD, you'll need to add credentials to the /settings/secrets/actions page of your GitHub repository. This includes a personal access token, which can be generated by following these steps: New GitHub personal access token → Add a name → Toggle repo and workflow → Click Generate token (scroll down) → Copy the token and paste it when prompted for your password.

Once you've set up your credentials, you can make changes to your code and push them to GitHub, which will trigger the workloads workflow. If the workflow succeeds, it will produce comments with the training and evaluation results directly on the pull request.

Versioning

Versioning is a crucial aspect of Continuous Integration and Deployment. It helps keep track of different model versions created during the process.

Model versioning includes storing model files, source code, training settings, and data split information. This allows for easy analysis of performance across different versions.

Having multiple versions of a model is essential, especially when something breaks in the current system: you can roll back to a previous stable version while you fix the issue.

Key features of model versioning include:

  1. Model tracking and storage for different versions.
  2. Easy access to each version along with its parameter settings.
  3. Automatic creation of an MLflow model object after each run (see the sketch below).
  4. Provision of the complete project environment, including the conda.yaml and requirements.txt files.
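
A minimal MLflow sketch of logging and registering a model version; the model name is illustrative, and registering requires a configured tracking server:

```python
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

# Toy data and model standing in for the real training run.
X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.0, 6.0]

with mlflow.start_run():
    model = LinearRegression().fit(X, y)
    mlflow.log_param("fit_intercept", True)
    # Logging the model stores the artifact together with environment
    # files such as conda.yaml and requirements.txt.
    mlflow.sklearn.log_model(
        model, artifact_path="model",
        registered_model_name="housing_model",  # illustrative name
    )
```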

Cluster

When setting up your cluster, you have several options to choose from. You can set up your cluster locally or via Anyscale, or use cloud providers like AWS and GCP, which have community-supported integrations.

To set up your cluster, you'll need to configure the environment and compute settings. You can also use a cluster environment that's already been created for you, like the one we used when setting up our Anyscale Workspace.

You can create or update a cluster environment yourself, which determines where your workloads will be executed, including the operating system and dependencies.

Here are some options for creating a cluster:

  • On AWS and GCP
  • On Kubernetes via the KubeRay project
  • Deploy Ray manually on-prem or onto platforms not listed here

Each of these options has its own benefits and requirements, so be sure to choose the one that best fits your needs.

Anyscale Services

Once you've executed your ML workloads, you're ready to launch your model to production using Anyscale Services. This is where you'll serve your model to users.

To launch your service, make sure to change the $GITHUB_USERNAME in serve_model.yaml. This will allow you to save results from your workloads to S3 for retrieval later.

After updating the config, you're ready to launch your service.
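
Anyscale Services run Ray Serve applications, so a minimal deployment looks roughly like the following sketch; the Predictor class and its logic are placeholders, not the tutorial's model:

```python
from ray import serve
from starlette.requests import Request

@serve.deployment
class Predictor:
    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        # Placeholder logic; a real service would call the trained model here.
        return {"prediction": sum(payload.get("features", []))}

serve.run(Predictor.bind())  # serves on http://localhost:8000 by default
```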
