Automated machine learning (AutoML) is a set of techniques that automates much of the work of creating and deploying machine learning models.
It uses algorithms to select the best model for a given task and dataset, making it easier and faster for developers to get started with machine learning.
AutoML can handle tasks such as data preparation, feature engineering, and model selection, which are often time-consuming and require expertise.
By automating these tasks, AutoML can save developers a significant amount of time and effort, allowing them to focus on higher-level tasks such as model interpretation and deployment.
AutoML can also match or outperform manually built models in some cases, because it can try many different models and techniques to find the best one for a particular problem.
What is AutoML
AutoML automates the process of building and training machine learning models.
It uses search algorithms to select the most suitable model for a specific problem, reducing the need for manual model selection and hyperparameter tuning.
This can significantly cut the time and effort required to develop and deploy models, which makes machine learning more practical for businesses and organizations of all sizes.
Because it tries multiple models and techniques to find the best fit, AutoML can also improve the accuracy and reliability of the resulting models.
AutoML has been shown to be particularly effective in tasks such as image and speech recognition, natural language processing, and predictive modeling.
Some reports put the reduction in model development time as high as 90%, although the actual savings depend heavily on the task and dataset.
When to Use AutoML
AutoML is well suited to classification, regression, forecasting, computer vision, and NLP tasks. It democratizes the machine learning model development process, helping users identify an end-to-end machine learning pipeline for their problem.
You can use AutoML to implement ML solutions without extensive programming knowledge, save time and resources, apply data science best practices, and solve problems in an agile way. It is especially useful for small to medium-sized datasets, which train faster than large ones.
AutoML is particularly effective for structured data, such as when columns are clearly labeled and the data is well-formatted. It can easily handle missing values or skewness in the dataset, thanks to its ability to perform imputation and normalization.
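As a rough illustration of the preprocessing mentioned above, the following sketch shows mean imputation followed by standardization in plain Python. The function names and the sample column are hypothetical, and real AutoML frameworks pick their own imputation and scaling strategies, so treat this as a minimal sketch rather than what any particular tool does.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [20, None, 40, 60]          # hypothetical column with one missing value
filled = impute_mean(ages)         # None -> 40 (mean of 20, 40, 60)
scaled = standardize(filled)       # centered on 0 after scaling
```

Imputation runs first so that the scaling statistics are computed on a complete column.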
Here are some scenarios where AutoML shines:
- Image classification
- Customer churn prediction
- Process automation
- Fraud detection
- Product & services personalization
- Digital marketing & advertising
- Anomaly detection
When to Use: Classification, Regression, Forecasting
For classification, regression, and forecasting, AutoML trains and tunes a model for you using the target metric you specify, which makes it a good option if you want to implement ML solutions without extensive programming knowledge.
AutoML performs well with structured data and can handle missing values or skewness in the dataset through built-in preprocessing. It is also a good choice for small to medium-sized datasets, which train faster than large ones.
For forecasting, AutoML can combine techniques and approaches to get a recommended, high-quality time-series forecast. It treats time-series forecasting as a multivariate regression problem, which allows it to naturally incorporate multiple contextual variables and their relationship to one another during training.
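Here are some advanced forecasting configurations that AutoML supports:
- Holiday detection and featurization
- Time-series and DNN learners
- Grouping support across many models
- Configurable lags
- Rolling window aggregate features
- Rolling-origin cross validation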
AutoML can also save time and resources, and provide agile problem-solving capabilities, making it a great option for those who want to quickly assess a model and get a high-quality forecast.
Usage and Availability
AutoML is designed to make machine learning tasks easier and more accessible to non-experts. Software engineers can use it to develop applications without knowing the details of ML algorithms.
It does this by packaging processes and techniques such as model selection and hyperparameter tuning behind a simpler interface.
AutoML applies to a range of real-world challenges, such as image classification and customer churn prediction.
Data scientists can use AutoML to build ML pipelines in a low-code environment, which speeds up their work.
AI enthusiasts can also use AutoML to explore machine learning and its applications.
AutoML Types
AutoML approaches fall into several types, including model-based AutoML and non-model-based AutoML.
Model-based AutoML uses machine learning algorithms to automatically design and train models, often with the help of human expertise.
Non-model-based AutoML, on the other hand, focuses on automating the process of feature engineering and data preparation.
These two types of AutoML can be used in various applications, including image classification, natural language processing, and time series forecasting.
How It Works
AutoML works by creating many pipelines in parallel that try different algorithms and parameters for you. It iterates through ML algorithms paired with feature selections, producing a model with a training score each time.
The better the score for the metric you want to optimize for, the better the model is considered to "fit" your data. AutoML stops once it hits the exit criteria defined in the experiment.
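The loop described above can be sketched in a few lines of Python. This is a sequential toy version, not any framework's actual implementation: `toy_score` is a hypothetical stand-in for training a pipeline and returning its validation score, and the two exit criteria (a trial budget and a target score) are assumptions chosen for illustration. Real tools run many trials in parallel.

```python
import random


def toy_score(algorithm, learning_rate):
    """Hypothetical stand-in for 'train a pipeline and return a validation score'."""
    base = {"linear": 0.70, "tree": 0.80, "boosting": 0.85}[algorithm]
    # The score peaks when learning_rate is near 0.1 and falls off on either side.
    return base - abs(learning_rate - 0.1)


def run_search(max_trials, target_score, seed=0):
    """Try random (algorithm, parameter) pairs until an exit criterion is met."""
    rng = random.Random(seed)
    best = None
    history = []
    for _ in range(max_trials):                   # exit criterion 1: trial budget
        candidate = {
            "algorithm": rng.choice(["linear", "tree", "boosting"]),
            "learning_rate": rng.uniform(0.01, 0.5),
        }
        score = toy_score(**candidate)
        history.append((candidate, score))
        if best is None or score > best[1]:
            best = (candidate, score)             # keep the best-scoring pipeline
        if best[1] >= target_score:               # exit criterion 2: good enough
            break
    return best, history


best, history = run_search(max_trials=50, target_score=0.84)
print(best)
```

The candidate with the highest score is kept as the current best, which mirrors the idea that a better metric value means a better "fit".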
You can design and run your automated ML training experiments with these steps: Identify the ML problem to be solved, choose between a code-first experience or a no-code studio web experience, specify the source of the labeled training data, configure the automated machine learning parameters, submit the training job, and review the results.
AutoML frameworks begin by connecting to the provided dataset. The selected dataset should contain enough data to develop a supervised machine learning model for classification or regression, including the target variable and any other data used as features for the model.
Users also need to specify the target column when using an AutoML tool. As part of data profiling, the tool determines whether each variable is numeric or categorical and counts its missing values.
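A data profiling pass of this kind might look like the sketch below. The column names and the rule for what counts as "missing" (None or an empty string) are assumptions made for the example, and production tools use more elaborate type inference.

```python
def profile_columns(rows):
    """Classify each column as numeric or categorical and count missing values."""
    report = {}
    for col in rows[0].keys():
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        # A column is numeric only if every observed value is an int or float.
        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        report[col] = {
            "type": "numeric" if is_numeric else "categorical",
            "missing": len(values) - len(present),
        }
    return report


# Hypothetical churn dataset; "churned" would be the target column.
rows = [
    {"age": 34, "plan": "basic", "churned": "no"},
    {"age": None, "plan": "premium", "churned": "yes"},
    {"age": 51, "plan": "", "churned": "no"},
]
report = profile_columns(rows)
```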
AutoML tools experiment with multiple models and perform optimization, often starting with random sampling and then refining samples intelligently. A trained and optimized model can then be deployed in a production environment using REST APIs.
Time-Series Forecasting
Time-Series Forecasting is a key application of AutoML, allowing you to build forecasts for business-critical metrics like revenue, inventory, and sales. You can use automated ML to combine techniques and approaches and get a recommended, high-quality time-series forecast.
Automated time-series experiments are treated as multivariate regression problems, which naturally incorporate multiple contextual variables and their relationship to one another during training. This approach has an advantage over classical time-series methods, as it allows for the estimation of model parameters using more data.
Advanced forecasting configuration in AutoML includes features like holiday detection and featurization, time-series and DNN learners, and many models that support grouping. You can also configure lags, rolling window aggregate features, and rolling-origin cross validation.
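To make lags, rolling window aggregates, and rolling-origin cross validation more concrete, here is a small plain-Python sketch. The function names, the sales series, and the expanding-window split scheme are illustrative assumptions; AutoML services configure these through their own settings rather than through code like this.

```python
def add_lag_and_rolling(series, lag=1, window=3):
    """Build rows of (target, lag feature, rolling mean) from past values only."""
    rows = []
    start = max(lag, window)
    for t in range(start, len(series)):
        past_window = series[t - window:t]        # excludes the current value
        rows.append({
            "y": series[t],
            f"lag_{lag}": series[t - lag],
            f"rolling_mean_{window}": sum(past_window) / window,
        })
    return rows


def rolling_origin_splits(n_samples, n_folds, horizon):
    """Yield (train_indices, test_indices) with an expanding training window."""
    for fold in range(n_folds):
        test_end = n_samples - (n_folds - 1 - fold) * horizon
        test_start = test_end - horizon
        yield list(range(test_start)), list(range(test_start, test_end))


sales = [10, 12, 13, 15, 14, 18, 20, 21]          # hypothetical weekly sales
features = add_lag_and_rolling(sales, lag=1, window=3)
splits = list(rolling_origin_splits(len(sales), n_folds=2, horizon=2))
# First fold:  train on indices 0..3, test on [4, 5]
# Second fold: train on indices 0..5, test on [6, 7]
```

Each fold trains only on observations that precede its test window, which is what distinguishes rolling-origin validation from ordinary shuffled cross validation.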
Here are some of the advanced forecasting algorithms supported by AutoML:
- Auto-ARIMA
- Prophet
- ForecastTCN
These algorithms can be used to build high-quality time-series forecasts that support data-driven decisions. Using AutoML for time-series forecasting can also save time and resources.
Ensemble Models
Ensemble models are a crucial part of automated machine learning, and they're enabled by default. This means you can take advantage of their power without having to do any extra work.
Ensemble learning improves machine learning results and predictive performance by combining multiple models instead of using single models. The ensemble iterations appear as the final iterations of your job.
Automated machine learning uses both voting and stacking ensemble methods for combining models. Voting predicts based on the weighted average of predicted class probabilities or predicted regression targets.
Stacking combines heterogeneous models and trains a meta-model based on the output from the individual models. The current default meta-models are LogisticRegression for classification tasks and ElasticNet for regression/forecasting tasks.
The Caruana ensemble selection algorithm with sorted ensemble initialization is used to decide which models to use within the ensemble. This algorithm initializes the ensemble with up to five models with the best individual scores.
If a new model improves the existing ensemble score, the ensemble is updated to include the new model. This process continues until no further improvements are made.
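The greedy selection idea can be sketched as follows. This is a simplified illustration in the spirit of Caruana-style ensemble selection, not the implementation used by any AutoML service: it assumes a regression task scored by mean squared error on a validation set, averages member predictions with equal weight, and uses hypothetical `init_size` and `max_size` parameters.

```python
def mse(pred, truth):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def average(preds_list):
    """Element-wise mean of several prediction vectors."""
    return [sum(col) / len(col) for col in zip(*preds_list)]


def greedy_ensemble(model_preds, truth, init_size=5, max_size=10):
    """Start from the best individual models, then add models while error drops."""
    ranked = sorted(model_preds, key=lambda name: mse(model_preds[name], truth))
    ensemble = ranked[:init_size]                 # sorted initialization
    best_err = mse(average([model_preds[m] for m in ensemble]), truth)
    while len(ensemble) < max_size:
        # Find the single addition that lowers validation error the most.
        candidates = [
            (mse(average([model_preds[m] for m in ensemble + [name]]), truth), name)
            for name in ranked
        ]
        err, name = min(candidates)
        if err >= best_err:
            break                                 # no model improves the ensemble
        ensemble, best_err = ensemble + [name], err
    return ensemble, best_err


truth = [1.0, 2.0, 3.0, 4.0]
model_preds = {                                   # hypothetical validation predictions
    "a": [1.25, 2.25, 3.25, 4.25],                # consistently too high
    "b": [0.75, 1.75, 2.75, 3.75],                # consistently too low
    "c": [3.0, 3.0, 3.0, 3.0],                    # ignores the input
}
ensemble, err = greedy_ensemble(model_preds, truth, init_size=1)
```

In this toy data the errors of models "a" and "b" cancel, so the selection keeps both and leaves out "c".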
Here are the ensemble methods used by automated machine learning:
- Voting: Predicts based on the weighted average of predicted class probabilities (for classification tasks) or predicted regression targets (for regression tasks).
- Stacking: Combines heterogeneous models and trains a meta-model based on the output from the individual models.
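As a small worked example of the voting method, the sketch below computes a weighted average of class probabilities for a single sample. The three models, their probabilities, and the weights are made up for illustration.

```python
def soft_vote(prob_rows, weights):
    """Weighted average of per-model class probabilities for one sample."""
    total = sum(weights)
    n_classes = len(prob_rows[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]


# Three hypothetical models scoring one sample for classes 0 and 1.
model_probs = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]
weights = [2, 1, 1]                               # the first model counts double
avg = soft_vote(model_probs, weights)             # approximately [0.625, 0.375]
predicted_class = max(range(len(avg)), key=avg.__getitem__)
```

Two of the three models favor class 1, but the higher weight on the first model tips the averaged probabilities toward class 0.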
Tutorials and How-Tos
If you're new to AutoML, tutorials are a great place to start. You can follow the "Tutorial: Train an object detection model with AutoML and Python" for a code-first experience, or "Tutorial: Train a classification model with no-code AutoML in Azure Machine Learning studio" for a low or no-code experience.
For more detailed instructions, how-to articles are available, covering topics like configuring settings for automatic training experiments and learning how to train computer vision models with Python.
If you're looking for specific tutorials, here are some examples:
- Tutorial: Train an object detection model with AutoML and Python
- Tutorial: Train a classification model with no-code AutoML in Azure Machine Learning studio
- Configuring settings for automatic training experiments
- Learning how to train computer vision models with Python
- Learning how to view the generated code from your automated ML models (SDK v1)
AutoML Features
Azure automated ML allows you to customize featurization with techniques such as encoding and transforms.
You can enable this setting in the Azure Machine Learning studio by going to the View additional configuration section.
In the Python SDK, you can specify featurization in your AutoML Job object.
Customizing featurization can help improve the accuracy of your models.
AutoGluon, another AutoML framework, supports use cases including image prediction, object detection, text prediction, and multimodal prediction.
Each of these use cases can be handled with a simple 'fit()' command that automatically generates high-quality models.
Computer Vision
Computer vision is a powerful feature in AutoML that allows you to easily generate models trained on image data for scenarios like image classification and object detection.
With AutoML, you can seamlessly integrate with the Azure Machine Learning data labeling capability, use labeled data for generating image models, and optimize model performance by specifying the model algorithm and tuning the hyperparameters.
You can download or deploy the resulting model as a web service in Azure Machine Learning, and operationalize at scale, leveraging Azure Machine Learning MLOps and ML Pipelines capabilities.
AutoML for images supports the following computer vision tasks:
- Image classification (multi-class and multi-label)
- Object detection
- Instance segmentation
You can use the Azure Machine Learning Python SDK to author AutoML models for vision tasks, and access the resulting experimentation jobs, models, and outputs from the Azure Machine Learning studio UI.
Interface
The H2O AutoML interface is designed to be user-friendly and easy to navigate, with a focus on simplicity.
The interface has as few parameters as possible, allowing you to point to your dataset and identify the response column, with the option to specify a time constraint or limit on the number of total models trained.
In both the R and Python API, AutoML uses the same data-related arguments, such as x, y, training_frame, and validation_frame, which are also used by other H2O algorithms.
You can configure values for max_runtime_secs and/or max_models to set explicit time or number-of-model limits on your run, giving you more control over the process.
By specifying the data arguments, you can get started with AutoML quickly and easily, without needing to worry about a lot of complex parameters.
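A minimal sketch of this interface in Python might look like the following. It assumes the `h2o` package is installed and a local H2O cluster can be started. The helper names, the CSV path, and the decision to convert the target to a factor (for classification) are illustrative assumptions.

```python
def build_automl_args(max_models=None, max_runtime_secs=None, seed=1):
    """Collect the stopping limits for H2OAutoML, dropping any left unset."""
    args = {"max_models": max_models, "max_runtime_secs": max_runtime_secs, "seed": seed}
    return {k: v for k, v in args.items() if v is not None}


def run_h2o_automl(csv_path, target, **limits):
    """Train H2O AutoML on a CSV file and return the AutoML object."""
    import h2o                                    # imported lazily; needs the h2o package
    from h2o.automl import H2OAutoML

    h2o.init()                                    # start or connect to a local cluster
    frame = h2o.import_file(csv_path)
    frame[target] = frame[target].asfactor()      # treat the target as categorical
    predictors = [c for c in frame.columns if c != target]

    aml = H2OAutoML(**build_automl_args(**limits))
    aml.train(x=predictors, y=target, training_frame=frame)
    return aml
```

A call such as `aml = run_h2o_automl("train.csv", "response", max_models=20)` (with a hypothetical file and column name) would then expose the ranked models through `aml.leaderboard`.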
Scikit-Learn Compatibility
H2OAutoML can interact with the h2o.sklearn module, which exposes two wrappers for H2OAutoML: H2OAutoMLClassifier and H2OAutoMLRegressor.
These wrappers provide a standard API familiar to sklearn users, including fit, predict, fit_predict, score, get_params, and set_params.
Benefits of Using
AutoML can automate time-consuming operations such as algorithm selection, code writing, pipeline development, and hyperparameter tuning. This lets data scientists focus on resolving business challenges faster.
AutoML frameworks evaluate multiple machine learning algorithms, such as random forest, k-nearest neighbors, and SVMs, and select among them. This selection process is a significant part of the AutoML pipeline.
AutoML performs data preprocessing steps like missing value imputation, feature scaling, feature selection, etc. These steps are crucial for preparing data for machine learning models.
The AutoML framework optimizes or performs hyperparameter tuning for all models. This is a complex task that requires expertise in machine learning.
AutoML can be used by various professionals, including software engineers, data scientists, ML engineers, and AI enthusiasts. This makes machine learning tasks easier and more accessible.
Some real-world challenges where AutoML can deliver value include image classification, customer churn prediction, process automation, fraud detection, and product & services personalization.
Training
Training with H2O AutoML is a breeze. The h2o.automl() function in R and the H2OAutoML class in Python are the core components of the training process.
You can specify the x argument to name the predictor columns, but it is not always required. The default value of x is "all columns, excluding y", so omitting it produces the same result as listing every other column.
AutoML training is designed to be fast and efficient, with the ability to configure explicit time or number-of-model limits on your run. This is achieved through the max_runtime_secs and/or max_models parameters.
With H2O AutoML, you can train models on small to medium size datasets quickly, making it ideal for a quick assessment of the model. However, larger and complex datasets may require more resources or time for multiple experiments for hyperparameter tuning and model optimization.
Model Deployment
Model deployment is a crucial step in automating machine learning.
Once a model is trained, it needs to be deployed to a production environment where it can be used to make predictions or classify data. This is typically done through a process called model serving, which involves packaging the model into a container that can be easily deployed to a cloud or on-premises server.
The goal of model deployment is to ensure that the model is running smoothly and efficiently, and that it can handle a high volume of requests without breaking down. This requires careful planning and testing to ensure that the model is scalable and can handle different types of data inputs.
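The request-handling core of a model serving endpoint can be sketched with the standard library alone. Everything here is hypothetical: `dummy_model` stands in for a trained model, and the `instances` / `predictions` payload shape is an assumption for illustration rather than a fixed standard. A real deployment would wrap this logic in a web server and a container.

```python
import json


def dummy_model(features):
    """Hypothetical stand-in for a trained model: flags large transactions."""
    return 1 if features["amount"] > 1000 else 0


def handle_predict(request_body, model=dummy_model):
    """Turn a JSON request into a (status code, JSON response) pair."""
    try:
        payload = json.loads(request_body)
        predictions = [model(row) for row in payload["instances"]]
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON or unexpected input shape: reject instead of crashing.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"predictions": predictions})


status, body = handle_predict('{"instances": [{"amount": 50}, {"amount": 5000}]}')
```

Returning an error status for bad input, rather than raising, is one way to keep the service stable when it receives unexpected data.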
ONNX
You can use Azure Machine Learning to convert a Python model to the ONNX format, which allows it to run on a variety of platforms and devices.
The ONNX runtime supports C#, so you can use the model in your C# apps without recoding it or incurring the network latency that REST endpoints introduce.
The Azure Machine Learning documentation includes a Jupyter notebook example of the conversion.
You can run inference on ONNX models with the ONNX runtime C# API, or use an AutoML ONNX model in a .NET application with ML.NET.
Vertex AI
Vertex AI is a unified platform that helps you build, deploy, and scale AI models.
You can use it to prepare and store your datasets, which is a crucial step in building accurate AI models. This lets you manage data and models in one place.
With Vertex AI, you can also experiment with and deploy models faster, because it provides access to the ML tools that power Google.
Industry solutions built on Vertex AI aim to reduce cost, increase operational agility, and capture new market opportunities.
Here are some key industry solutions that Vertex AI supports:
- Healthcare and Life Sciences: Advance research at scale and empower healthcare innovation.
- Manufacturing: Migration and AI tools to optimize the manufacturing value chain.
- Document AI: Document processing and data capture automated at scale.
- Vision AI: Custom and pre-trained models to detect emotion, text, and more.
By using Vertex AI, you can manage your models with confidence, knowing that they're built on a robust and scalable platform.
Prediction
Using the predict() function with AutoML generates predictions on the leader model from the run. The order of the rows in the results is the same as the order in which the data was loaded, even if some rows fail.
You can generate test set predictions by calling predict() on the AutoML object, or on the leader model directly. Both return predictions from the best trained model.
AutoGluon's TabularPrediction can handle both classification and regression problems. For example, in the Stroke prediction dataset, AutoGluon correctly identified the type of problem as binary classification based on the two unique labels '0' & '1' in the outcome column.
The predictor can be set up to train for classifying whether an individual with a given set of conditions will probably be at risk of a stroke. This is done by specifying the outcome column as ‘stroke’ and asking the predictor to fit the algorithms on the train dataset.
AutoGluon trained 24 models for the Stroke prediction dataset, but we would be more interested to find out which is the best model as selected by AutoGluon. To display this, simply use the leaderboard() command which ranks the trained models in order.
The best model selected by AutoGluon can be used to make predictions on the test data. This can be done by feeding the test data to the classifier for prediction and storing it in a DataFrame.
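The workflow described above could be sketched as follows, assuming the `autogluon` package is installed. The function name and CSV paths are hypothetical, and the sketch uses the `TabularPredictor` class from `autogluon.tabular`, which is the entry point in current AutoGluon releases.

```python
def train_and_predict(train_csv, test_csv, label="stroke"):
    """Fit AutoGluon on tabular data, then rank models and predict on test data."""
    # Imported lazily so that defining this function does not require autogluon.
    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset(train_csv)
    test_data = TabularDataset(test_csv)

    # AutoGluon infers the problem type from the label column.
    predictor = TabularPredictor(label=label).fit(train_data)

    leaderboard = predictor.leaderboard(test_data)    # trained models, ranked
    features = test_data.drop(columns=[label])
    predictions = predictor.predict(features)         # uses the best model by default
    return leaderboard, predictions
```

Both return values are pandas objects, so the predictions can be stored in a DataFrame alongside the test features for inspection.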
AutoGluon's TabularPrediction also handles regression problems, such as the 'Boston prices' dataset from the scikit-learn datasets module. In this case, AutoGluon correctly identified the type of problem as regression based on the dtype=float for the column and the presence of multiple unique values.
For the regression problem, AutoGluon trained 11 models and recommended kNN (KNeighborsDist_BAG_L1) as the best model followed by XGBoost (XGBoost_BAG_L1).
Frequently Asked Questions
What is the difference between automation and ML?
Automation follows strict rules, while machine learning adapts and improves based on the data it processes. This key difference makes machine learning a powerful tool for complex tasks and continuous improvement.
Will AutoML replace data scientists?
No, AutoML will not replace data scientists, but rather augment their work by automating routine tasks and freeing up time for more strategic and creative endeavors. AutoML is designed to assist, not replace, data scientists.
What is the difference between MLflow and AutoML?
MLflow is an open-source platform for tracking experiments, packaging models, and managing the model lifecycle, whereas AutoML automates model selection and tuning. The two are complementary rather than alternatives: AutoML offers a quicker start at a higher level of abstraction, while MLflow helps you record and manage whatever models you train.
What is the difference between MLOps and AutoML?
MLOps focuses on deploying and managing machine learning models in production, while AutoML streamlines the model creation and optimization process. Together, they bridge the gap between model development and real-world application.
What is the best AutoML tool?
There is no single "best" AutoML tool, as the choice depends on specific project needs and requirements. Popular AutoML tools include Dataiku, DataRobot, Google Cloud AutoML, H2O, Enhencer, MLJAR, Akkio, and JADBio AutoML, each with its own strengths and use cases.
Sources
- https://learn.microsoft.com/en-us/azure/machine-learning/concept-automated-ml?view=azureml-api-2
- https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html
- https://cloud.google.com/automl/
- https://sagemaker.readthedocs.io/en/stable/api/training/automl.html
- https://www.analyticsvidhya.com/blog/2021/10/beginners-guide-to-automl-with-an-easy-autogluon-example/