AutoML Vision on Google Cloud is a powerful tool that automates machine learning tasks, specifically image classification and object detection.
To get started, you'll need to create a dataset, which can be done by uploading images to the Google Cloud Console.
This dataset should include a balanced mix of images with and without the object of interest.
For example, if you're training a model to detect cars, your dataset should include images of cars and images without cars.
Once your dataset is ready, you can upload it to the Google Cloud Console and create a new AutoML Vision project.
This project will serve as the central hub for your AutoML Vision tasks, and you can manage it from the Google Cloud Console.
Preparation
To prepare your data for AutoML Vision, you can import it from your computer or Cloud Storage in CSV or JSON Lines format. This format should include labels and bounding boxes if necessary. You can also upload unlabeled images and apply annotations using the Google Cloud console, which can be managed in multiple annotation sets for the same set of images.
You can specify the splits in your CSV or JSON Lines import file if you want to split your dataset manually. The data should be ready for training, free of bias and of missing or erroneous values, as these affect the quality of the model.
You can upload a zip file containing training images in different folders corresponding to the respective labels, or a CSV file with Google Cloud Storage (GCS) filepaths, labels, and the data partition for the training, validation, and test sets.
Assess Your Use Case
To assess your use case, start by identifying the outcome you want to achieve. This will help you determine the type of model you need to build.
Begin with the question: "What is the outcome you're trying to achieve?" This will guide your dataset preparation and model selection. For example, do you want to predict a binary outcome, such as whether a customer will buy a subscription or not?
Consider the kinds of categories or objects you need to recognize to achieve your outcome. If it's possible for humans to recognize those categories, then Vertex AI can likely handle them as well. However, if a human can't recognize a specific category, Vertex AI will have a hard time too.
Think about the kinds of examples that would best reflect the type and range of data your system will see and try to classify. This will help you create a dataset that accurately represents your use case.
The type of model you need depends on your outcome: image data supports objectives such as classification, object detection, and segmentation, while video data supports objectives such as classification, action recognition, and object tracking. Choose the model type that best fits your outcome, and then select the appropriate model objective. For example, if you want to detect action moments in a video, use the action recognition objective.
Auto Hyperparameter Tuning
Auto Hyperparameter Tuning can be a game-changer for computer vision tasks, allowing you to automatically determine the best model architecture and hyperparameters for your dataset.
Predicting the best model architecture and hyperparameters can be a challenging task, especially when human time allocated to tuning hyperparameters is limited.
You can specify any number of trials, and the system will automatically determine the region of the hyperparameter space to sweep.
Launching automatic sweeps via the UI is not supported at this time, so you'll need to launch them through the SDK or CLI instead.
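As a rough illustration of how this looks with the Azure Machine Learning Python SDK (v2), the sketch below submits an image classification job and lets the service sweep hyperparameters automatically by setting max_trials greater than one; the compute name, data paths, and experiment name are placeholders rather than values from this article.

```python
from azure.ai.ml import automl, Input
from azure.ai.ml.constants import AssetTypes

# Placeholders: substitute your own compute cluster and MLTable folders.
image_classification_job = automl.image_classification(
    compute="gpu-cluster",
    experiment_name="automl-vision-automode",
    training_data=Input(type=AssetTypes.MLTABLE, path="./data/training-mltable-folder"),
    validation_data=Input(type=AssetTypes.MLTABLE, path="./data/validation-mltable-folder"),
    target_column_name="label",
)

# With max_trials > 1 and no explicit search space, the service chooses the
# region of the hyperparameter space to sweep on its own.
image_classification_job.set_limits(max_trials=10, max_concurrent_trials=2)
```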
Goals
In this tutorial, we'll be working with a labeled dataset that will be uploaded to Cloud Storage. We'll use a CSV file to link it to AutoML Vision for image classification.
We'll be training a model with AutoML Vision, which will allow us to evaluate its accuracy. This is a crucial step in the process.
We'll be using the created model to classify new images, which will be a key outcome of our tutorial.
Preparing the Data
Preparing the data for your AutoML Vision project is crucial for getting accurate results. You can import data from your computer or Cloud Storage in CSV or JSON Lines format, with labels and bounding boxes (if necessary) inline.
To ensure your data is ready for training, check for bias, missing, or erroneous values, as these can affect the quality of your model. You can upload unlabeled images and apply annotations using the Google Cloud console.
There are two ways to ingest data into AutoML: uploading a zip file with training images in different folders corresponding to labels, or uploading a CSV file with GCS filepaths, labels, and data partition. If you choose the CSV file, you can define the data partition to control your experiment.
Here are the required columns for the CSV file:
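To make the format concrete, here's a minimal sketch that writes a hypothetical import CSV for single-label classification; the bucket name, file paths, and labels are placeholders. Each row holds an optional split (TRAIN, VALIDATION, or TEST), the Cloud Storage path of the image, and its label.

```python
# A minimal, hypothetical import CSV for single-label image classification.
rows = [
    "TRAIN,gs://your-bucket/images/car_01.jpg,car",
    "VALIDATION,gs://your-bucket/images/street_07.jpg,no_car",
    "TEST,gs://your-bucket/images/car_99.jpg,car",
]
with open("data.csv", "w") as f:
    f.write("\n".join(rows) + "\n")
```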
You can also use a Python script to create images, export them, and upload them to GCP, as shown in Example 4.
Data and Annotation
Data and Annotation is a crucial step in automating machine learning with AutoML Vision. To get started, you'll need to prepare your data by adding bounding boxes and labels to your videos. This is especially important for object tracking, as it helps the model learn to identify patterns.
For object tracking, you'll need to draw bounding boxes around objects of interest in your example videos and assign labels like "person" and "ball." If your data hasn't been labeled yet, you can use the Google Cloud console to apply bounding boxes and labels.
A good rule of thumb is to include at least 100 image examples per category/label for classification. The more high-quality examples you provide, the better your model will be. In fact, targeting at least 1000 examples per label is a good starting point.
The required annotation fields depend on the computer vision task type; the JSONL schema samples later in this article describe them.
Understanding Video Annotation Requirements
To annotate videos effectively, you need to draw bounding boxes around objects and assign labels to them, which can be time-consuming.
This process is essential for object tracking, where a Vertex AI model learns to identify patterns by looking at labeled examples.
As with classification, it's recommended to have at least 100 image examples per category/label.
A good rule of thumb is to have a roughly equal distribution of examples across labels: the label with the fewest examples should have at least 10% as many examples as the label with the most examples.
However, this may not always be possible, especially when sourcing high-quality, unbiased examples for some categories is challenging.
In such cases, you can use data augmentation techniques to amplify the data size and variability of a dataset, which helps to prevent overfitting and improve the model's generalization ability on unseen data.
Data augmentation techniques include random resize and crop, horizontal flip, color jitter, and normalization using ImageNet's channel-wise mean and standard deviation.
Which of these augmentations is applied depends on the computer vision task type.
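As an illustration, here's a minimal torchvision pipeline combining the transforms named above (random resize-and-crop, horizontal flip, color jitter, and ImageNet normalization); it's a generic sketch, not the exact pipeline AutoML applies internally for each task.

```python
from torchvision import transforms

# Generic training-time augmentation pipeline for image classification.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),                 # random resize and crop
    transforms.RandomHorizontalFlip(),                 # horizontal flip
    transforms.ColorJitter(0.2, 0.2, 0.2),             # color jitter
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # channel-wise ImageNet mean
                         std=[0.229, 0.224, 0.225]),   # channel-wise ImageNet std
])
```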
By following these guidelines, you can ensure that your video annotation requirements are met, and your model is trained on high-quality, diverse data.
JSONL Schema Samples
For computer vision tasks, the structure of the TabularDataset is pretty straightforward. It includes three main fields: image_url, image_details, and label.
The image_url field contains the filepath as a StreamInfo object. This is where the image file is stored.
The image_details field provides metadata information about the image, such as its height, width, and format. However, this field is optional, so it may or may not exist.
The label field is a JSON representation of the image label, based on the task type. This is what the model will use to learn from the data.
Here are the details of the TabularDataset fields for computer vision tasks:
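As a concrete sketch, here's what a single JSON Lines record might look like for single-label image classification; the path, dimensions, and label are placeholders rather than values from a real dataset.

```python
import json

# Hypothetical JSONL record for single-label image classification.
record = {
    "image_url": "azureml://datastores/workspaceblobstore/paths/images/cat_01.jpg",
    "image_details": {"format": "jpg", "width": "400px", "height": "300px"},  # optional
    "label": "cat",
}
print(json.dumps(record))  # one record per line in the .jsonl file
```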
Model Configuration
Configuring experiments is a crucial step in AutoML Vision, and there are three main workflows to choose from: individual trials, manual sweeps, and automatic sweeps. Automatic sweeps are a great starting point, as they can yield competitive results for many datasets without requiring advanced knowledge of model architectures.
Automatic sweeps also take into account hyperparameter correlations and work seamlessly across different hardware setups, making them a strong option for the early stage of your experimentation process.
To get the most out of automatic sweeps, you can define the parameter search space by specifying a single model architecture or multiple ones, depending on your needs. This will allow you to explore different models and hyperparameter configurations.
Here are some key things to keep in mind when defining the parameter search space:
- Supported model architectures vary by task type.
- Hyperparameters for computer vision tasks are specific to each task type.
- Distributions for discrete and continuous hyperparameters are supported.
By following these steps and considering the characteristics of automatic sweeps, you can set yourself up for success in the experimentation process and get closer to achieving your AutoML Vision goals.
Define Search Space
Defining the search space is a crucial step in the model configuration process. You can specify a single model architecture or multiple ones to sweep in the parameter space.
To get started, review the list of supported model architectures for each task type; this is also the list you choose from when launching individual trials. Hyperparameters also play a significant role, so familiarize yourself with the hyperparameters available for each computer vision task type.
You can define the model architectures and hyperparameters to sweep in the parameter space. This involves specifying a range of values for each hyperparameter.
Discrete hyperparameters are specified with distributions such as Choice (and Randint), while continuous hyperparameters support distributions such as Uniform, LogUniform, Normal, and LogNormal, plus their quantized (Q) variants.
You can use this information to define the search space for your model configuration process.
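As a rough sketch of how a manual search space can be defined with the Azure Machine Learning Python SDK (v2), the snippet below sweeps two object detection architectures with different hyperparameter distributions; the job variable, architecture names, and value ranges are illustrative assumptions, not recommendations.

```python
from azure.ai.ml.automl import SearchSpace
from azure.ai.ml.sweep import Choice, Uniform, BanditPolicy

# image_object_detection_job is an AutoML image job created earlier.
image_object_detection_job.extend_search_space(
    [
        SearchSpace(
            model_name=Choice(["yolov5"]),
            learning_rate=Uniform(0.0001, 0.01),
            model_size=Choice(["small", "medium"]),
        ),
        SearchSpace(
            model_name=Choice(["fasterrcnn_resnet50_fpn"]),
            learning_rate=Uniform(0.0001, 0.001),
            optimizer=Choice(["sgd", "adam", "adamw"]),
        ),
    ]
)

# Random sampling over the space, with early termination of poorly performing trials.
image_object_detection_job.set_sweep(
    sampling_algorithm="random",
    early_termination=BanditPolicy(evaluation_interval=2, slack_factor=0.2, delay_evaluation=6),
)
```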
Select Your Task Type
Selecting the right task type is a crucial step in configuring your model. You have three options: image classification, object detection, and instance segmentation.
Each task type has its own specific syntax, which you can use to create an AutoML image job. For example, for image classification, you can use the `image_classification` task type, while for object detection, you can use `image_object_detection`.
Here are the specific syntax options for each task type:
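For instance, with the Python SDK (v2) each task type has its own factory function; this is a minimal sketch with placeholder compute and data values.

```python
from azure.ai.ml import automl, Input
from azure.ai.ml.constants import AssetTypes

training_data = Input(type=AssetTypes.MLTABLE, path="./data/training-mltable-folder")

# Swap in automl.image_classification or automl.image_object_detection for the
# other task types; the arguments stay the same.
job = automl.image_instance_segmentation(
    compute="gpu-cluster",            # hypothetical compute target name
    training_data=training_data,
    target_column_name="label",
)
```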
By selecting the right task type, you can ensure that your model is trained for the specific task you need it to perform.
Configure Experiments
To configure experiments, you can launch individual trials, manual sweeps, or automatic sweeps. Automatic sweeps are a strong option for the early stage of your experimentation process, as they can yield competitive results for many datasets without requiring advanced knowledge of model architectures.
Automatic sweeps take into account hyperparameter correlations and work seamlessly across different hardware setups. This makes them a convenient choice for getting a first baseline model.
You can define the model architectures and hyperparameters to sweep in the parameter space. The parameter search space can be specified with a single model architecture or multiple ones.
The supported model architectures vary by task type; see the HuggingFace and MMDetection section later in this article for the curated list.
To run an experiment, you need to provide a compute target. Automated ML models for computer vision tasks require GPU SKUs and support NC and ND families. A compute target with a multi-GPU VM SKU uses multiple GPUs to speed up training.
Multi-GPU and Multi-Node Training
Using multiple GPUs can significantly speed up model training, with the time to train a model decreasing in roughly linear proportion to the number of GPUs used.
For instance, a model should train roughly twice as fast on a VM with two GPUs as on a VM with one GPU. If you're training on a large dataset, using a VM with multiple GPUs can be a game-changer.
To take advantage of multi-GPU training, make sure to use a compute SKU that supports InfiniBand for best results. This will ensure that your model trains as efficiently as possible.
By default, each model trains on a single VM, but you can increase the number of VMs used to train each model by setting the node_count_per_trial property of the AutoML job. This can be done using task-specific automl functions, such as for object detection.
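As a sketch (assuming the SDK v2 property name described in the Azure documentation), multi-node training is enabled by setting node_count_per_trial on the job's properties; the value of 2 here is just an example.

```python
# Train each trial across two VMs instead of one (illustrative value).
image_object_detection_job.properties["node_count_per_trial"] = 2
```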
Create a Dataset for Training
To create a dataset for training, you need to have a CSV file with a URL to a model training image and the label associated with that image. This CSV file should be copied to a Cloud Storage bucket and updated with the name of the bucket.
The CSV file should have at least 10 images in order to submit an AutoML job. You can create an MLTable from training data in JSONL format, or use helper scripts to convert data from formats like Pascal VOC or COCO.
When creating a dataset, choose "Select a CSV file on Cloud Storage" and add the URL of the data CSV file. You can also use the Browse function to find the CSV file. The dataset should include at least 100 images for each category for an accurate model.
Here are some key considerations for creating a dataset:
- Use a CSV file with a URL to a model training image and the label associated with that image.
- Copy the CSV file to a Cloud Storage bucket and update it with the name of your bucket (see the upload sketch after this list).
- Have at least 10 images for an AutoML job.
- Choose "Select a CSV file on Cloud Storage" when creating a dataset.
- Use at least 100 images for each category for an accurate model.
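For the upload step, a small sketch using the google-cloud-storage client library is shown below; the bucket name and file names are placeholders (you could equally copy the file with gsutil from the command line).

```python
from google.cloud import storage

# Upload the import CSV to your Cloud Storage bucket ("your-bucket" is a placeholder).
client = storage.Client()
bucket = client.bucket("your-bucket")
bucket.blob("data.csv").upload_from_filename("data.csv")
```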
Note: If you're working with a dataset that's not already labeled, AutoML Vision provides an internal manual labeling service.
Training and Evaluation
To train an AutoML Vision model, you need a minimum of 10 images in your training data. The data can be in JSONL format or converted from other formats like Pascal VOC or COCO using helper scripts.
You can create an MLTable from your training data in JSONL format, which is then used for model training. The training data needs to have at least 10 images to submit an AutoML job.
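If you're following the SDK-based flow, a minimal sketch of connecting to a workspace and submitting the AutoML image job looks like this; the subscription, resource group, and workspace values are placeholders.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholders: substitute your own subscription, resource group, and workspace.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Submit the AutoML image job defined earlier (e.g. image_classification_job).
returned_job = ml_client.jobs.create_or_update(image_classification_job)
print(returned_job.name)
```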
AutoML Vision handles all the model training automatically, without requiring any code from you. You can train the model by clicking on the Train tab, entering a name for the model, and selecting the Deploy model to 1 node after training option.
Training and Validation
To get started with training and validation, you'll need to bring labeled image data as input for model training in the form of an MLTable. This data can be created from training data in JSONL format, or you can use helper scripts to convert data from other formats like Pascal VOC or COCO.
You'll need at least 10 images in your training data to submit an AutoML job. This is a requirement for the job to be processed.
AutoML Vision can handle all of the model training automatically, without requiring you to write any lines of code. This makes it easy to get started with computer vision tasks.
Model training can take more than an hour to complete, and the process is fully automated, so you can let AutoML run in the background. As noted above, the training data must contain at least 10 images before you can submit an AutoML job, and AutoML Vision recommends at least 100 labeled examples per class, which is a good starting point for most computer vision tasks.
After importing your data, you'll receive an email from GCP informing you that the import of the dataset is completed. This is a helpful feature that lets you know when the import process is finished.
The import process can take a while, so be patient and let AutoML do its thing. You can close the browser window and let the process run in the background.
Primary Metric
In AutoML training jobs, a primary metric is used for model optimization and hyperparameter tuning. This metric is crucial in determining the performance of the model.
For image classification, the primary metric used is accuracy. This is the most straightforward way to measure how well the model is doing.
The primary metric for image classification multilabel is intersection over union (IoU), which measures the overlap between the predicted and actual label sets.
In image object detection and instance segmentation, the primary metric used is mean average precision. This measure evaluates the model's ability to detect and classify objects accurately.
Here are the primary metrics for each task type:

- image_classification: accuracy
- image_classification_multilabel: intersection over union (IoU)
- image_object_detection: mean average precision (mAP)
- image_instance_segmentation: mean average precision (mAP)
Evaluation
Once the training is complete, it's time to evaluate the model. This is where you can see the Precision and Recall information for the model, as well as the confusion matrix.
The Evaluate tab provides a clear overview of the model's performance, including Precision and Recall information. You can also scroll down to see the confusion matrix.
AutoML models can converge at different rates, with the free model taking around 30 minutes to converge, while the paid model may take longer.
The overall model metrics of the free model look pretty decent, with an average precision of 96.4% on the test set and a recall of 87.7%. The paid model, on the other hand, achieved an average precision of 98.5% on the test set.
Detailed metrics such as precision / recall curves and classification cutoffs are also available in the Evaluate tab. You can also check images of false positives and negatives per class to understand why and when your model is doing something wrong.
The confusion matrix provides a clear visual representation of correct and misclassified examples, with relative frequencies of each.
Model Deployment
To deploy your model, you can use it in different ways depending on your use case, whether that means production-scale usage or a one-time prediction request.
You can register and deploy your model after the job completes, by specifying the AzureML path with the corresponding job ID or by downloading the model and changing the settings.json file before registering it.
Creating a deployment in the workspace is the next step, which can be done using the MLClient you created earlier, and this will start the deployment creation and return a confirmation response.
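As a sketch of that step with the Azure Machine Learning Python SDK (v2), assuming the MLClient from earlier and placeholder endpoint, deployment, and model names:

```python
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# Create an online endpoint, then a deployment that serves the registered model.
endpoint = ManagedOnlineEndpoint(name="my-vision-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="my-vision-endpoint",
    model="azureml:my-automl-vision-model:1",  # hypothetical registered model
    instance_type="Standard_NC6s_v3",          # GPU SKU for vision models
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```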
You can then use your model to make predictions, such as by uploading new images from your local machine using the AutoML user interface, or by using the API exposed by AutoML vision through a Python script or by curling the API in the command line.
Job Limits
Job Limits are a crucial aspect of AutoML Image training jobs. You can control the resources spent on your job by specifying the timeout_minutes, max_trials, and max_concurrent_trials in the limit settings.
To set the maximum number of trials to sweep, use the max_trials parameter. This must be an integer between 1 and 1000. If you're just exploring the default hyperparameters for a given model architecture, set this parameter to 1.
The default value for max_trials is 1. You can also specify the maximum number of trials that can run concurrently using the max_concurrent_trials parameter. This must be an integer between 1 and 100, and the default value is 1.
Here's a summary of the job limit parameters:

- timeout_minutes: time limit for the job before it terminates
- max_trials: maximum number of trials to sweep, an integer between 1 and 1000 (default 1)
- max_concurrent_trials: maximum number of trials that can run concurrently, an integer between 1 and 100 (default 1)
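In the Python SDK (v2), these limits are set on the job itself (using the image_object_detection_job from earlier as an example); the values below are illustrative.

```python
# Cap total runtime, number of trials, and parallelism for the sweep.
image_object_detection_job.set_limits(
    timeout_minutes=120,
    max_trials=10,
    max_concurrent_trials=2,
)
```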
Remember, the number of concurrent trials is gated on the resources available in the specified compute target, so ensure that the compute target has the available resources for the desired concurrency.
Deploy Your Model
Deploying your model is the final step in the machine learning process. You can use your model in different ways depending on your use case.
Once you're satisfied with your model's performance, you can deploy it for production-scale usage or make one-time prediction requests. To deploy your model, you'll need to register and deploy it, which involves creating a deployment in your workspace.
To create a deployment, you'll use the MLClient created earlier. This will start the deployment creation and return a confirmation response while the deployment creation continues. You can also use the graphical interface to upload new images and generate predictions.
After fitting and evaluating your model, you can use several methods to predict new images. You can use the AutoML user interface to upload new images from your local machine or use the API exposed by AutoML Vision.
Here are the four possible prediction outcomes:
- True positive: The model correctly predicts the positive class.
- False positive: The model incorrectly predicts the positive class.
- True negative: The model correctly predicts the negative class.
- False negative: The model incorrectly predicts the negative class.
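These counts are what precision and recall, discussed in the evaluation section, are computed from; here is a tiny worked example with made-up counts.

```python
# Illustrative outcome counts for one class.
tp, fp, tn, fn = 90, 10, 85, 15

precision = tp / (tp + fp)  # 0.90  - share of positive predictions that are correct
recall = tp / (tp + fn)     # ~0.857 - share of actual positives that are found
print(f"precision={precision:.3f}, recall={recall:.3f}")
```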
Note that the accuracy of your model will depend on the quality of your training data and the complexity of your model. It's essential to evaluate your model thoroughly before deploying it to ensure it performs well in different scenarios.
Model Interpretation
Model Interpretation is a crucial aspect of AutoML Vision, allowing users to gain insights into how their models make predictions. By leveraging Explainable AI (XAI), users can improve transparency in complex vision model predictions.
The deployed endpoint returns base64 encoded image strings if both model_explainability and visualizations are set to True. This allows users to decode and visualize the image strings in the prediction.
Four image sections are presented within a 2 x 2 grid. The top-left corner image is the cropped input image, while the top-right corner image is a heatmap of attributions on a color scale (blue, green, yellow, white) where white pixels contribute the most to the predicted class and blue pixels contribute the least.
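To view those explanation images, you can decode the base64 strings from the prediction response; a minimal sketch follows, where the variable holding the string is a placeholder for a value copied from the response.

```python
import base64
import io

from PIL import Image

# Placeholder: one of the base64-encoded image strings returned by the endpoint
# when model_explainability and visualizations are both set to True.
b64_image_string = "<base64-string-from-the-prediction-response>"

image = Image.open(io.BytesIO(base64.b64decode(b64_image_string)))
image.save("explanation_visualization.png")
```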
To generate explanations, use the exact valid_resize_size and valid_crop_size values of the selected model.
The advantages of using XAI with AutoML for images are numerous. It improves transparency in complex vision model predictions, helps users understand important features/pixels in the input image, facilitates troubleshooting, and aids in discovering bias.
Here are some key points to keep in mind when working with model interpretation:
- Model Explainability is only supported for multi-class classification and multi-label classification.
- Attributions give users more control to generate custom visualizations or scrutinize pixel-level attribution scores.
- Users can utilize any library to generate visualizations, such as Captum visualization functionality.
Advanced Topics
AutoML Vision can handle complex tasks such as object detection and segmentation, which can be useful for applications like self-driving cars.
In object detection, AutoML Vision can identify and classify objects within an image, with a high accuracy rate of 95% when trained on a large dataset.
AutoML Vision can also handle tasks like image classification, where it can classify images into different categories, such as animals, vehicles, or buildings, with an accuracy rate of 92% when trained on a large dataset.
Moving Forward
Now that we've covered the basics, let's move forward with using AutoML Vision to achieve our desired outcome.
We've learned how to use AutoML Vision to train a machine learning model to recognise illegal amber mining, which is a crucial step in our project.
The rest of the course will focus on applying this knowledge to real-world problems, using AutoML Vision as our tool of choice.
We'll dive deeper into the specifics of training models to detect and classify objects, and explore the various applications of this technology.
By the end of this course, you'll be equipped with the skills and knowledge to tackle complex projects like ours, and make a real impact in the world.
Supported Architectures - HuggingFace and MMDetection
You can use a wide range of models from HuggingFace and MMDetection to suit your computer vision tasks. These models include image classification models like BEiT, ViT, DeiT, and SwinV2, as well as object detection and instance segmentation models like Sparse R-CNN, Deformable DETR, VFNet, YOLOF, and Swin.
To use these models, you can specify their string literal syntax, which can be found in the Azure Machine Learning registry. For example, you can use the model "microsoft/beit-base-patch16-224-pt22k-ft22k" for image classification.
The Azure Machine Learning registry also offers a list of curated models from HuggingFace and MMDetection, which have been thoroughly tested and use default hyperparameters selected from extensive benchmarking. These curated models include BEiT, ViT, DeiT, and SwinV2 for image classification, as well as Sparse R-CNN, Deformable DETR, VFNet, YOLOF, and Swin for object detection and instance segmentation.
You can get the most up-to-date list of curated models for a given task from the Azure Machine Learning registry using the Python SDK.
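A rough sketch of doing that with the Python SDK (v2) follows; connecting to the shared "azureml" registry and filtering model names client-side is an assumption about how you might browse the list, not an official query API.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Registry-scoped client for the shared "azureml" registry of curated models.
registry_client = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")

# Simple client-side filter on model names (families listed here are illustrative).
for model in registry_client.models.list():
    name = model.name.lower()
    if any(family in name for family in ("beit", "vit", "yolof")):
        print(model.name)
```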
Frequently Asked Questions
What is AutoML Vision?
AutoML Vision is a tool that allows you to train machine learning models to classify images with minimal technical expertise. It enables you to define your own labels and automate the image classification process.
What is the difference between Google Vision API and AutoML Vision?
Google Vision API provides pre-trained machine learning models through APIs, while AutoML Vision automates the training of custom models for tailored solutions.
What is the purpose of AutoML?
AutoML simplifies machine learning model development for non-experts through an intuitive interface. It enables easy training and deployment of models, making AI more accessible to a broader audience.
Sources
- https://cloud.google.com/vertex-ai/docs/beginner/beginners-guide
- https://newsinitiative.withgoogle.com/resources/trainings/hands-on-machine-learning/google-cloud-automl-vision/
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models?view=azureml-api-2
- https://flowygo.com/en/blog/automl-vision-image-classification/
- https://towardsdatascience.com/a-performance-benchmark-of-google-automl-vision-using-fashion-mnist-a9bf8fc1c74f