PyTorch offers a wide range of pretrained models, including ResNet, VGG, and DenseNet, that can be used for transfer learning.
These models have already been trained on large datasets such as ImageNet; most torchvision weights are trained on the ImageNet-1k subset, which contains roughly 1.28 million training images across 1,000 categories, while the full ImageNet database spans over 14 million images across 21,841 categories.
You can leverage these models to fine-tune them on your own dataset, saving you time and computational resources.
Pretrained models are particularly useful for image classification tasks, where they can be used as a starting point for your own model.
Preparing the Dataset
To prepare a dataset for fine-tuning a pretrained model, you need to download a dataset and process it for training. This involves applying a preprocessing function over the entire dataset using the 🤗 Datasets map method.
You can create a smaller subset of the full dataset to fine-tune on to reduce the time it takes. For example, the Yelp Reviews dataset can be used, which contains a large number of text reviews that can be processed using a tokenizer and padding/truncation strategy.
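As a minimal sketch of this workflow (assuming the 🤗 Hub's yelp_review_full dataset and a BERT tokenizer, since the article doesn't pin down either), the preprocessing might look like this:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("yelp_review_full")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize_function(examples):
    # Pad/truncate every review to the model's maximum input length.
    return tokenizer(examples["text"], padding="max_length", truncation=True)

# Apply the preprocessing function over the entire dataset with map().
tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Optionally fine-tune on a smaller subset to reduce the time it takes.
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
```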
The Chessman image dataset from Kaggle is another example of a smaller dataset that can be used for fine-tuning. This dataset contains 556 images distributed over 6 classes, with a significant imbalance in the number of images per class.
Here's a breakdown of the classes and the number of images in each class:
- Bishop: 87
- King: 76
- Knight: 106
- Pawn: 107
- Queen: 78
- Rook: 102
To create the datasets for training, you'll need to split the dataset into training and validation sets. This can be done by creating subsets of the data, as shown in the code snippet below.
The dataset preparation involves defining constants for the data root directory path, validation split ratio, image size for resizing, batch size, and number of parallel processes for data preparation. This is typically done in a file called datasets.py.
To use a pretrained model from torchvision.models, you'll need to prepare a specific transform for your images. This can be done using the create_dataloaders function from the data_setup.py script, which calculates the means and standard deviations across a subset of images.
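The exact create_dataloaders implementation isn't reproduced here, but a rough sketch of the overall flow, with an assumed data directory and the standard ImageNet normalization statistics in place of the ones computed by the script, could look like this:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

ROOT_DIR = "input/chessman_dataset"  # hypothetical data root directory
VALID_SPLIT = 0.15                   # fraction of images held out for validation
IMAGE_SIZE = 224                     # resize target for torchvision pretrained models
BATCH_SIZE = 32
NUM_WORKERS = 4                      # parallel processes for data loading

# Standard ImageNet statistics; the article's data_setup.py computes its own
# means and standard deviations from a subset of images instead.
transform = transforms.Compose([
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

dataset = datasets.ImageFolder(root=ROOT_DIR, transform=transform)
valid_size = int(VALID_SPLIT * len(dataset))
train_size = len(dataset) - valid_size
train_dataset, valid_dataset = torch.utils.data.random_split(
    dataset, [train_size, valid_size],
    generator=torch.Generator().manual_seed(42))

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE,
                          shuffle=True, num_workers=NUM_WORKERS)
valid_loader = DataLoader(valid_dataset, batch_size=BATCH_SIZE,
                          shuffle=False, num_workers=NUM_WORKERS)
```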
Training with PyTorch
Training with PyTorch can be a straightforward process. You can create a Trainer object with your model, training arguments, training and test datasets, and evaluation function, then fine-tune your model by calling train().
The Trainer class takes care of the training loop and lets you fine-tune a model in a single line of code. However, if you prefer to write your own training loop in native PyTorch, you'll need to manually postprocess the tokenized dataset to prepare it for training.
Here's a step-by-step guide to preparing the dataset for training:
1. Remove the text column because the model does not accept raw text as an input.
2. Rename the label column to labels because the model expects the argument to be named labels.
3. Set the format of the dataset to return PyTorch tensors instead of lists.
Once you've prepared the dataset, you can create a smaller subset of the dataset to speed up the fine-tuning process.
To keep track of your training progress, you can use the tqdm library to add a progress bar over the number of training steps. This will help you monitor your model's performance during training.
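Here is a minimal sketch of such a loop, assuming a 🤗 Transformers model and a DataLoader named train_dataloader built from the postprocessed dataset (both are covered later in this article); the learning rate and epoch count are only illustrative:

```python
from tqdm.auto import tqdm
import torch
from torch.optim import AdamW

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

optimizer = AdamW(model.parameters(), lr=5e-5)   # illustrative learning rate

num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
progress_bar = tqdm(range(num_training_steps))   # progress bar over training steps

model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)   # 🤗 Transformers models compute the loss when labels are passed
        loss = outputs.loss
        loss.backward()

        optimizer.step()
        optimizer.zero_grad()
        progress_bar.update(1)
```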
The training script is the final piece of the puzzle before you start training. You'll need to write the training script in the train.py file, which will include the imports and building of the argument parser. The argument parser will have flags such as --epochs, --pretrained, and --learning-rate, which you can use to control the training process.
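A sketch of that argument parser might look like the following; the default values here are assumptions, not the article's:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=10,
                    help="number of epochs to train for")
parser.add_argument("--pretrained", action="store_true",
                    help="use pretrained ImageNet weights")
parser.add_argument("--learning-rate", type=float, default=0.001,
                    help="learning rate for the optimizer")
args = parser.parse_args()
```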
Here's an overview of the training process:
- Scheduling the learning rate
- Saving the best model
PyTorch provides a range of features that make it easy to train models, including dynamic computational graphs, tensor computation, automatic differentiation, and neural network building blocks. With PyTorch, you can easily prototype and experiment with different models and architectures.
Here's a summary of PyTorch's features:
- Dynamic Computational Graph: PyTorch builds the computation graph on the fly as operations are executed, rather than requiring it to be defined up front.
- Tensor Computation: PyTorch provides a powerful, NumPy-like tensor library with GPU acceleration.
- Automatic Differentiation: PyTorch can compute and handle gradients automatically, even for custom operations over tensors.
- Neural Network Building Blocks: the torch.nn module provides layers, losses, and other components for building neural networks.
- Dynamic Neural Networks: the structure of a network can change from one forward pass to the next.
Using Pretrained Models
Pretrained models can significantly reduce training time and improve performance on deep learning tasks. You can find pretrained models in various places, including the PyTorch domain libraries, the HuggingFace Hub, the timm (PyTorch Image Models) library, and Papers with Code.
The PyTorch domain libraries, such as torchvision, torchtext, and torchaudio, come with pretrained models of some form that work right within PyTorch. You can access these models by importing the relevant libraries and exploring their documentation.
HuggingFace Hub offers a series of pretrained models on many different domains, including vision, text, and audio, from organizations around the world. You can find plenty of different datasets too.
Some popular pretrained models include ResNets, VGG, EfficientNets, Vision Transformers (ViTs), and ConvNeXt. You can find these models in torchvision.models and use them as a starting point for your own tasks.
Here are some common architecture backbones and their corresponding code in torchvision.models:
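- ResNet: torchvision.models.resnet18(), torchvision.models.resnet50()
- VGG: torchvision.models.vgg16()
- EfficientNet: torchvision.models.efficientnet_b0() through torchvision.models.efficientnet_b7()
- Vision Transformer (ViT): torchvision.models.vit_b_16()
- ConvNeXt: torchvision.models.convnext_tiny(), torchvision.models.convnext_small()

This is only a representative selection; torchvision.models includes many more variants of each architecture.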
By starting from a pretrained model, you significantly reduce the training time needed for a new task and improve performance on that task, even with a limited amount of task-specific data.
Implementing a Model
To start implementing a model, you'll need to choose a pre-trained model. In the case of the EfficientNet_B0 model, it's been trained on millions of images and has achieved ~77.7% accuracy across ImageNet's 1000 classes.
The pre-trained model can be used as a starting point for your own image classification task, such as classifying pizza, steak, and sushi images. To set up the EfficientNet_B0 model, you can use the torchvision.models.efficientnet_b0() function.
Here's a breakdown of the pre-trained model's architecture:
- Features: a collection of convolutional layers and other activation layers to learn a base representation of vision data.
- Avgpool: takes the average of the output of the features layer(s) and turns it into a feature vector.
- Classifier: turns the feature vector into a vector with the same dimensionality as the number of required output classes.
This pre-trained model has already been trained on a large dataset, so you can leverage its knowledge to improve your own model's performance.
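As a sketch (assuming torchvision 0.13+ and the three-class pizza/steak/sushi setup mentioned above), setting up EfficientNet_B0 as a frozen feature extractor might look like this:

```python
import torch
import torchvision
from torch import nn

weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT  # pretrained ImageNet weights
model = torchvision.models.efficientnet_b0(weights=weights)

# Freeze the "features" section so only the new classifier head is trained.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the classifier head so its output matches the number of target classes.
model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features=1280, out_features=3),  # 3 classes: pizza, steak, sushi
)
```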
Training Hyperparameters
You can create a TrainingArguments class to store hyperparameters and flags for training options. This class is where you can tune hyperparameters to find the optimal settings.
To save checkpoints from your training, you'll need to specify a location in the TrainingArguments class.
The Trainer object requires several inputs, including your model, training arguments, training and test datasets, and an evaluation function.
You can create a Trainer object by calling the Trainer class with these inputs, and then fine-tune your model by calling the train() method.
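Putting those pieces together, a minimal sketch might look like the following, assuming the tokenized Yelp subsets and the compute_metrics function discussed elsewhere in this article, and a recent transformers release where the argument is named eval_strategy:

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

# Assumed: a 5-class sequence classification head for the Yelp review stars.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

training_args = TrainingArguments(output_dir="test_trainer", eval_strategy="epoch")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

trainer.train()
```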
Train in Native PyTorch
You can fine-tune a model in a single line of code using Trainer, which takes care of the entire training loop.
If you prefer more control over the training process, you can instead fine-tune a 🤗 Transformers model in native PyTorch by writing the training loop yourself.
To manually postprocess tokenized_dataset, you'll need to remove the text column because the model doesn't accept raw text as an input.
Remove the text column by running the following command: tokenized_datasets = tokenized_datasets.remove_columns(["text"])
Rename the label column to labels because the model expects the argument to be named labels.
Rename the label column by running the following command: tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
Set the format of the dataset to return PyTorch tensors instead of lists.
Set the format of the dataset by running the following command: tokenized_datasets.set_format("torch")
Here's a quick summary of the steps to postprocess tokenized_dataset:
- Remove the text column: tokenized_datasets = tokenized_datasets.remove_columns(["text"])
- Rename the label column: tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
- Set the format to PyTorch tensors: tokenized_datasets.set_format("torch")
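Collected together, and with an assumed next step of wrapping the subsets in DataLoaders for a native training loop, the postprocessing might look like this:

```python
from torch.utils.data import DataLoader

# The three postprocessing steps from above.
tokenized_datasets = tokenized_datasets.remove_columns(["text"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")

# Smaller subsets to speed up fine-tuning, then DataLoaders for the manual loop.
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))

train_dataloader = DataLoader(small_train_dataset, shuffle=True, batch_size=8)
eval_dataloader = DataLoader(small_eval_dataset, batch_size=8)
```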
ConvNet as Feature Extractor
We can use a ConvNet as a feature extractor by freezing all of the network except the final layer. This means we set requires_grad=False to freeze the parameters, so the gradients are not computed in backward(). On CPU, this takes about half the time of fine-tuning the entire network.
Freezing the parameters allows us to reuse the knowledge the ConvNet has learned from another problem, which is a key concept in transfer learning.
To freeze the parameters, we need to set requires_grad=False for the entire network except the final layer. This will significantly reduce the training time and computational resources required for the new task.
Here's a summary of the steps to use a ConvNet as a feature extractor:
- Freeze all the network except the final layer
- Set requires_grad=False to freeze the parameters
- Reuse the knowledge the ConvNet has learned from another problem
By using a ConvNet as a feature extractor, we can leverage the knowledge it has learned from another problem and apply it to our new task, significantly reducing the training time and computational resources required.
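A minimal sketch of this setup, assuming a ResNet-18 backbone and a two-class problem (both the backbone and the class count are assumptions):

```python
import torch
import torchvision
from torch import nn, optim

# Load a pretrained backbone and freeze every parameter.
model_conv = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
for param in model_conv.parameters():
    param.requires_grad = False   # gradients will not be computed in backward()

# Parameters of newly constructed modules have requires_grad=True by default,
# so only this replacement final layer is trained.
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)   # assumed two output classes

# Only the final layer's parameters are given to the optimizer.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
```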
ResNet 50 Implementation
Implementing a ResNet 50 model involves several key steps.
First, you need to choose a pre-trained model, which in this case is ResNet-50. This model is used for image classification on the QMNIST digit dataset (a reconstruction and extension of MNIST).
The pre-trained ResNet-50 model is then combined with data augmentation techniques such as RandomResizedCrop(224), which crops a random region of the image and resizes it to 224x224.
RandomHorizontalFlip() is also used, flipping the image horizontally with a probability of 0.5. Another augmentation method is RandomRotation(10), which rotates the image by a random angle of up to 10 degrees.
The image is then converted to a PyTorch tensor using ToTensor(). Additionally, Grayscale(num_output_channels=3) replicates the single grayscale channel across three channels so the input matches the three-channel format that ResNet-50 expects.
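A sketch of that augmentation pipeline, with assumed Normalize statistics, might look like this:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),            # random crop resized to 224x224
    transforms.RandomHorizontalFlip(),            # horizontal flip with probability 0.5
    transforms.RandomRotation(10),                # rotate by up to 10 degrees
    transforms.Grayscale(num_output_channels=3),  # replicate 1 channel to the 3 ResNet-50 expects
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # assumed statistics
])
```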
The ResNet-50 model is then trained and evaluated on the QMNIST dataset. The images are first preprocessed by converting them to tensors and normalizing them.
The trained model is then run on the QMNIST data to check whether it works properly. The model's accuracy is checked by comparing its predictions with the ground-truth labels.
Here are the key steps in deploying the ResNet 50 model:
- Preprocess the QMNIST images by converting them to tensors and normalizing them.
- Take samples from the QMNIST test dataset via a DataLoader to evaluate model performance.
- Run the trained model on the QMNIST data to check whether it works properly.
- Check the model's accuracy by comparing its predictions with the ground-truth labels (a sketch follows this list).
- Report the model's accuracy on the QMNIST test set.
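A minimal sketch of the accuracy check, assuming a trained model, a test_loader built from the QMNIST test split, and a device are already defined:

```python
import torch

model.eval()
correct, total = 0, 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        predictions = outputs.argmax(dim=1)            # predicted class per image
        correct += (predictions == labels).sum().item()
        total += labels.size(0)

print(f"Accuracy on the QMNIST test set: {100 * correct / total:.2f}%")
```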
Executing the Training Script
Executing the training script is a crucial step in the PyTorch transfer learning process. To see the effect of transfer learning, run the script twice: once without pretrained weights and again with them.
You'll need to set the learning rate accordingly, as using 0.001 without pretrained weights might be too slow to train or may not train at all. The training script will output the results, including the validation accuracy and loss.
As you monitor the training, you might see fluctuations in the validation metrics. In the last epoch without pretrained weights, the validation accuracy was 61.818% and the validation loss was 1.153.
On the other hand, when using pretrained weights, the final epoch's validation results are much better. You can expect a validation accuracy of more than 98% and a validation loss of 0.098.
Training times will vary depending on your system's specifications. It's also worth noting that using pretrained weights can significantly improve the results, as seen above, where the validation accuracy increased to more than 98%.
Executing the Inference Script
To execute the inference script, you'll need to run the inference.py script. This script contains the code to run inference using the trained model. All the test images are located in the input/test_images directory.
The inference code loads the trained weights from the model checkpoint saved from training and fine-tuning the pretrained EfficientNetB0 model. The next code block iterates over all the test images and runs the inference on each one of them.
You'll see the output on the terminal screen, and you can also inspect the results saved to disk. The model was able to correctly predict the King, Queen, and Knight, but not the other three classes.
Execute Inference.py Script
To execute the inference.py script, all you need to do is run it. The script reads its test images from the input/test_images directory.
The script will run inference on all the test images and save the results to disk. You'll also see the output on the terminal screen.
The model is trained on a dataset of around 500 images, which is a relatively small amount of data. Despite this, the model is able to correctly predict the class of some images, such as the King, Queen, and Knight.
However, the model's performance is not perfect, and it makes mistakes on other images. This might seem like a bad performance, but it's actually not surprising given the limited amount of training data.
To improve the model's performance, you could try applying more data augmentation techniques or training the model for a few more epochs. You could also try using a larger EfficientNet model, such as EfficientNetB1.
Visualizing the Predictions
You can use the trained model to make predictions on custom images and visualize the predicted class labels along with the images.
The inference script can display predictions for a few images, making it easier to understand the model's output.
To visualize the predictions, you can use the generic function to display the predicted class labels and images side by side.
This function is particularly useful when working with a small dataset, allowing you to quickly assess the model's performance.
The predicted class labels will be displayed along with the images, giving you a clear understanding of the model's output.
By visualizing the predictions, you can identify any patterns or biases in the model's output and make adjustments as needed.
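A rough sketch of such a helper is shown below; the model, transform, class_names, and device objects are assumed to come from the earlier training code, and the image paths are hypothetical:

```python
import glob
import matplotlib.pyplot as plt
import torch
from PIL import Image

def visualize_predictions(image_paths, model, transform, class_names, device):
    """Show each image alongside the class label the model predicts for it."""
    model.eval()
    for path in image_paths:
        image = Image.open(path).convert("RGB")
        batch = transform(image).unsqueeze(0).to(device)   # add a batch dimension
        with torch.no_grad():
            pred = model(batch).argmax(dim=1).item()
        plt.figure()
        plt.imshow(image)
        plt.title(f"Predicted: {class_names[pred]}")
        plt.axis("off")
        plt.show()

# Hypothetical usage with the test images directory mentioned above.
visualize_predictions(glob.glob("input/test_images/*.jpg"), model, transform,
                      class_names, device)
```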
Evaluate
Evaluating your model's performance is a crucial step in the transfer learning process. You'll need to pass a function to the Trainer to compute and report metrics.
The 🤗 Evaluate library provides a simple accuracy function you can load with the evaluate.load function. This function will help you calculate the accuracy of your predictions.
Before passing your predictions to compute, you need to convert the logits to predictions. This is because all 🤗 Transformers models return logits.
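A typical compute_metrics function along these lines, following the standard Trainer workflow, looks like this:

```python
import numpy as np
import evaluate

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)   # convert logits to predicted class ids
    return metric.compute(predictions=predictions, references=labels)
```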
To monitor your evaluation metrics during fine-tuning, specify the eval_strategy parameter in your training arguments. This will report the evaluation metric at the end of each epoch.
Evaluating your model can take significantly less time on GPU compared to CPU. On GPU, it takes less than a minute, whereas on CPU it takes around 15-25 minutes.
Important Concepts
In the world of PyTorch transfer learning, there are several important concepts to grasp. Pre-trained models are a great starting point, as they've already been trained on large datasets like ImageNet for vision tasks.
These pre-trained models can be used as a foundation for your own projects, saving you time and computational resources. Fine-tuning is a technique that allows you to re-train the pre-trained model with a new dataset, using a small learning rate to adapt it to your specific task.
Transfer learning can be approached in two main ways: fine-tuning and feature extraction. Fine-tuning involves re-training the entire model, while feature extraction uses the pre-trained model as a fixed feature extractor, only replacing the final classification layer during training.
Normalization is also a crucial step in the process, as it helps speed up model training by normalizing the input data: subtracting the mean and dividing by the standard deviation. This can be achieved using a transform like Normalize(mean=[0.5], std=[0.5]).
What Is Transfer Learning?
Learning is a fundamental concept in deep learning, and it's essential to understand how it works. Transfer learning is a technique that allows us to take the patterns learned by another model and apply them to our own problem.
A pre-trained model on a large dataset can be reused as a starting point for a new task, significantly reducing training time and improving performance. This approach is particularly useful when dealing with limited datasets.
Computer vision models can learn patterns on millions of images in datasets like ImageNet, and then use those patterns to infer on another problem. Language models can learn the structure of language by reading large amounts of text, like all of Wikipedia.
The premise of transfer learning is to find a well-performing existing model and apply it to our own problem, making it possible to leverage already trained models and adjust them to match new tasks.
Important Concepts
Pre-trained models are a crucial part of transfer learning. They're deep learning models that have been pre-trained on large datasets like ImageNet for vision tasks.
You can use these pre-trained models as a starting point for your own projects, saving you time and effort. Fine-tuning is another technique where you re-train the pre-trained model with a new dataset, but with a small learning rate.
Feature extraction is a key concept in transfer learning. You can use a pre-trained model as a fixed feature extractor, replacing only the final classification layer during training.
Normalizing input data is essential for efficient model training. By subtracting the mean and dividing by the standard deviation, you can speed up the training process.
Transforms are a vital data preprocessing stage in computer vision tasks. They help convert input data to a suitable form and scale, making it easier for models to process.
Frequently Asked Questions
Is transfer learning the same as fine-tuning?
Transfer learning and fine-tuning are related but distinct concepts in machine learning, with transfer learning capturing general patterns and fine-tuning adapting a model to a specific task. Fine-tuning builds upon transfer learning by further training the model on task-specific data.
What are the disadvantages of transfer learning?
Transfer learning can be limited by domain mismatches and overfitting, making it less effective for tasks with significantly different data distributions or requirements. Understanding these potential drawbacks can help you decide if transfer learning is the right approach for your project.