Finetuning your deep learning models can make a huge difference in their performance. By adjusting the weights of a pre-trained model to fit your specific task, you can achieve better results without starting from scratch.
A good starting point is to choose a model that's already been trained on a related task. For example, if you're trying to classify images of dogs and cats, you might use a model that's already been trained on a large dataset of images. This can save you a lot of time and computational resources.
The key is to make targeted adjustments to the model's weights rather than retraining the entire network from scratch. This is the core idea of transfer learning, and related techniques such as knowledge distillation serve a similar goal. By fine-tuning the model, you can adapt it to your specific task without losing the benefits of the original training.
By fine-tuning your deep learning models, you can unlock their full potential and achieve better results than you would with a model that's been trained from scratch.
Why Fine-Tune
Fine-tuning is a crucial step in the machine learning process. Think of it like adjusting the settings on your phone to get the best possible signal: the hardware is already capable, but it needs tuning for your situation.
Fine-tuning allows a model to adapt to a specific task or dataset, making it more accurate and efficient. This is especially true for models that have been pre-trained on large datasets, but still need to learn the nuances of a particular task.
By fine-tuning a model, you can significantly improve its performance on a specific task. For example, a model that's been pre-trained on images can be fine-tuned to recognize specific objects or scenes.
Starting from pre-trained weights can also help reduce overfitting, which occurs when a model becomes too specialized to the training data and fails to generalize to new data. This is a common problem in machine learning, especially when the task-specific dataset is small.
Fine-tuning can be done on a variety of tasks, including classification, regression, and object detection. It's a versatile technique that can be applied to many different types of models and datasets.
What is Fine-Tuning
Fine-tuning a model is a process that involves preparing the training data, starting the fine-tuning job with required parameters, and monitoring the training job. This workflow is crucial for getting the best results from your model.
To fine-tune a model, you first prepare the training data. The next step is to start the fine-tuning job with the required parameters, which is where you specify details such as the base model and hyperparameters.
Once the fine-tuning job is complete, you can use the model name provided by Cortex Fine-tuning to run inference on your model. This is a crucial step in putting your fine-tuned model to use.
Here's a high-level overview of the fine-tuning workflow:
- Prepare the training data.
- Start the fine-tuning job with the required parameters.
- Monitor the training job.
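To make the second step concrete, here is a minimal sketch of starting a job with Snowflake's Cortex Fine-tuning from Python. It assumes the documented SNOWFLAKE.CORTEX.FINETUNE('CREATE', ...) signature, and the account, database, table, and model names are all placeholders:

```python
import snowflake.connector

# Placeholder connection details; replace with your own account settings.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
)
cur = conn.cursor()

# Start a fine-tuning job: action, output model name, base model,
# and a query selecting the prompt/completion training data.
cur.execute("""
    SELECT SNOWFLAKE.CORTEX.FINETUNE(
        'CREATE',
        'my_db.public.my_tuned_model',
        'mistral-7b',
        'SELECT prompt, completion FROM my_db.public.train_data'
    )
""")
job_id = cur.fetchone()[0]  # the call returns a job ID you can monitor
print(job_id)
```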
Preparing Data
To fine-tune a model, you'll need to gather and prepare training data. This data must come from a Snowflake table or view and contain columns named prompt and completion.
Start with a few hundred examples; very large datasets increase tuning time with minimal improvement in performance. Each example must also fit within a portion of the allotted context window for the base model you're tuning, measured in tokens (a token is approximately four characters of text).
The share of the context window allocated to the prompt and completion varies by base model; check the Cortex Fine-tuning documentation for the exact token limits of the model you're tuning.
Save your structured data as either a JSONL file or a Parquet file (tokenized). The example packing strategy is used by default for training data if a JSONL file is provided, but you can disable it by providing a tokenized dataset in a Parquet file.
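As an illustration, here is a minimal sketch of writing training examples as JSONL, one JSON object per line, using the prompt/completion field names described above (the example texts are invented):

```python
import json

# Two toy prompt/completion pairs for a sentiment task.
examples = [
    {"prompt": "Classify the sentiment: I loved this movie.", "completion": "positive"},
    {"prompt": "Classify the sentiment: The plot was dull.", "completion": "negative"},
]

# JSONL: one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```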
Fine-Tuning Process
The process starts with preparing your training data, getting it into the shape the job expects. Once the data is ready, you start the fine-tuning job with the required parameters, which kicks off the adjustment of the model to better fit your specific needs.
Next comes monitoring the training job, where you keep an eye on how the fine-tuning is progressing and catch any issues that might arise.
After the training is complete, you can use the model name provided by Cortex Fine-tuning to run inference on your model. This is the final step, where you get to see the results of your fine-tuning efforts.
Platforms such as Together AI support both LoRA and full fine-tuning, so you can choose the method that works best for you.
Fine-Tuning Options
You have several base models to choose from for fine-tuning, including llama3-8b, llama3-70b, llama3.1-8b, llama3.1-70b, mistral-7b, and mixtral-8x7b.
Each model has its own strengths and weaknesses, with some being ideal for tasks like text classification, summarization, and sentiment analysis, while others are better suited for chat applications, content creation, and enterprise use cases.
Here are the base models you can fine-tune:
- llama3-8b
- llama3-70b
- llama3.1-8b
- llama3.1-70b
- mistral-7b
- mixtral-8x7b
Low-Rank Adaptation
Low-rank adaptation (LoRA) is an efficient technique for fine-tuning models, with performance that approaches full-model fine-tuning at a far smaller storage and memory cost.
A language model with billions of parameters can be LoRA fine-tuned by training only a few million parameters, making it a game-changer for large models.
The basic idea behind LoRA is to learn a low-rank update that is added to the original weight matrix: instead of modifying a large matrix W directly, you train two small matrices B and A whose product BA has the same shape as W, so the adapted weight is W + BA. These collections of low-rank matrices are called adapters, and adding them to a base model produces a fine-tuned model.
Support for LoRA was integrated into the Diffusers library from Hugging Face, making it easily accessible for users.
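To make the shapes concrete, here is a minimal LoRA sketch in PyTorch. The dimensions, scaling factor, and initialization are illustrative assumptions, not a reference implementation:

```python
import torch

# Assumed dimensions for illustration; r << d_in is what makes LoRA cheap.
d_out, d_in, r = 768, 768, 8
W = torch.randn(d_out, d_in)      # frozen pre-trained weight
A = torch.randn(r, d_in) * 0.01   # trainable low-rank factor
B = torch.zeros(d_out, r)         # trainable, zero-initialized so training starts at W
alpha = 16.0                      # common LoRA scaling hyperparameter

def lora_linear(x: torch.Tensor) -> torch.Tensor:
    # Base projection plus the scaled low-rank correction (alpha / r) * B @ A.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = torch.randn(4, d_in)
print(lora_linear(x).shape)  # torch.Size([4, 768])
# Trainable parameters: r * (d_in + d_out) = 12,288 vs. 589,824 in W itself.
```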
Loss Masking
Loss masking is a technique used in fine-tuning to prevent the model from learning to predict parts of the prompt that aren't relevant to your task. This can be particularly useful when an example pairs a short question with a long supporting context that the model shouldn't learn to reproduce.
By providing a custom labels field for your examples in the tokenized dataset, you can mask out the loss calculation for specified tokens. Set the label for tokens you don’t want to include in the loss calculation to -100.
Keep in mind that loss masking is separate from attention masking: even for tokens whose loss is masked out, set the corresponding attention_mask to 1 so the model still attends to them when predicting the unmasked tokens. Otherwise the model loses the context it needs, and training for your task becomes ineffective.
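Here is a minimal sketch of how the -100 convention plays out, assuming a PyTorch-style training loop; the tensor sizes and label values are invented for illustration:

```python
import torch
import torch.nn.functional as F

vocab_size = 32000
logits = torch.randn(1, 6, vocab_size)                   # model output for 6 tokens
labels = torch.tensor([[-100, -100, -100, 42, 7, 99]])   # first 3 prompt tokens masked

# PyTorch's cross_entropy skips positions whose label equals ignore_index
# (-100 is its default), so only the 3 completion tokens contribute.
loss = F.cross_entropy(
    logits.view(-1, vocab_size),
    labels.view(-1),
    ignore_index=-100,
)
print(loss)
```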
Fine-Tuned Models
You can fine-tune a model to suit your specific needs. A fine-tuned model is the result of a job that has finished; you can see the new model's name via retrieved_jobs.fine_tuned_model.
There are several base models available for fine-tuning, including llama3-8b, llama3-70b, llama3.1-8b, llama3.1-70b, mistral-7b, and mixtral-8x7b. These models are ideal for tasks such as text classification, summarization, sentiment analysis, chat applications, content creation, and question answering.
Each base model has its own strengths and weaknesses. For example, llama3-8b is suited to tasks that require low to moderate reasoning, with better accuracy than the llama2-70b-chat model, while mistral-7b is ideal for simple summarization, structuring, and question-answering tasks that need to be done quickly.
To run inference with your fine-tuned model, pass its name to the COMPLETE LLM function.
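Here is a minimal inference sketch using Snowflake's SNOWFLAKE.CORTEX.COMPLETE function; the connection details, model name, and prompt are placeholders:

```python
import snowflake.connector

# Placeholder credentials; replace with your own account settings.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
)
cur = conn.cursor()

# COMPLETE takes the model name and a prompt, and returns the completion.
cur.execute(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE("
    "'my_tuned_model', 'Classify the sentiment: I loved this movie.')"
)
print(cur.fetchone()[0])
```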
Once your fine-tune job completes, you can see your new model in your models dashboard. You can either host it on Together AI for an hourly usage fee or download your model and run it locally.
Managing Fine-Tuned Models
Managing fine-tuned models is a crucial part of the fine-tuning process. You can check the status of your tuning job by calling the SNOWFLAKE.CORTEX.FINETUNE function with 'SHOW' or 'DESCRIBE' as the first argument.
Fine-tuning jobs are long-running, which means they are not tied to a worksheet session. If you no longer need a job, call SNOWFLAKE.CORTEX.FINETUNE with 'CANCEL' as the first argument and the job ID as the second to terminate it.
To manage your fine-tuned models, you can use the following commands:
- SHOW: Displays the status of your fine-tuning job.
- DESCRIBE: Provides a detailed description of your fine-tuning job.
- CANCEL: Terminates your fine-tuning job.
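Here is a sketch of the three management calls from Python; the connection details and job ID are placeholders:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
)
cur = conn.cursor()

# List all fine-tuning jobs in the account.
cur.execute("SELECT SNOWFLAKE.CORTEX.FINETUNE('SHOW')")
# Get the detailed status of one job by its ID.
cur.execute("SELECT SNOWFLAKE.CORTEX.FINETUNE('DESCRIBE', 'ft_12345')")
# Terminate a job you no longer need.
cur.execute("SELECT SNOWFLAKE.CORTEX.FINETUNE('CANCEL', 'ft_12345')")
print(cur.fetchone()[0])
```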
Sharing
Sharing fine-tuned models is a great way to collaborate with others. Fine-tuned models can be shared with other accounts via Data Sharing, which grants the consuming account the USAGE privilege it needs to access and run the model.
Replicating
Replicating fine-tuned models can be a bit tricky. Inference must take place in the same region as the model object, so a model fine-tuned in one region can't be used from another region directly.
The workaround is database replication: if you've fine-tuned a model based on mistral-7b in your account in the AWS US West 2 region, you can replicate the database containing it to another account in your organization in a region that supports the same base model, such as AWS Europe West, and run inference from the replica there.
To replicate objects, see Replicating databases and account objects across multiple accounts.
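As a rough sketch under those docs, replicating the database that holds a fine-tuned model looks like this; the organization, account, and database names are placeholders:

```python
import snowflake.connector

# On the source account (e.g., AWS US West 2): allow replication to the target.
source = snowflake.connector.connect(account="myorg-account1", user="me", password="...")
source.cursor().execute(
    "ALTER DATABASE model_db ENABLE REPLICATION TO ACCOUNTS myorg.account2"
)

# On the target account (e.g., AWS Europe West): create and refresh the replica.
target = snowflake.connector.connect(account="myorg-account2", user="me", password="...")
tcur = target.cursor()
tcur.execute("CREATE DATABASE model_db AS REPLICA OF myorg.account1.model_db")
tcur.execute("ALTER DATABASE model_db REFRESH")  # pull the latest snapshot
```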
Evaluation and Monitoring
You can evaluate your fine-tuned model's performance using metrics like training loss, validation loss, and validation token accuracy. These metrics are essential indicators of the model's ability to generalize and make accurate predictions on new data.
The validation set is a held-out dataset used to evaluate the model's performance during training on unseen data. It can be created from the same data source as the training dataset or a mix of multiple data sources.
To use a validation set, provide the --validation-file parameter and set --n-evals to the number of evaluations to run over the entire job. This helps you assess whether the model preserves its general capability while being fine-tuned for a specific task.
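Here is a sketch of launching such a job with the Together AI Python client. The parameter names are assumed to mirror the --validation-file and --n-evals CLI flags, and the model and file IDs are placeholders; check the Together docs for the exact client signature:

```python
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

# Start a fine-tuning job with a held-out validation set,
# evaluating 10 times spread over the whole job.
job = client.fine_tuning.create(
    model="meta-llama/Meta-Llama-3-8B",
    training_file="file-train-id",    # placeholder uploaded-file IDs
    validation_file="file-val-id",
    n_evals=10,
)
print(job.id)
```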
Each of these metrics is described in more detail in the Analyze and Evaluate section below.
You can monitor a fine-tuning job's progress by passing your Job ID to retrieve the latest details about your job directly from your code. The job will go through several phases, including Pending, Queued, Running, Uploading, and Completed.
Analyze and Evaluate
You can analyze and evaluate a fine-tuned model by retrieving its metrics, which are logged every 10% of training progress with a minimum of 10 steps in between. These metrics include training loss, validation loss, and validation token accuracy.
Training loss indicates how well the model is learning from the training set, while validation loss provides insight into how well the model is generalizing to unseen data. Validation token accuracy is the percentage of tokens in the validation set that are correctly predicted by the model.
Both validation loss and validation token accuracy serve as essential indicators of the model's overall performance, helping to assess its ability to generalize and make accurate predictions on new data.
To evaluate a model, you can use a validation set, which is a held-out dataset to evaluate your model performance during training on unseen data. The validation set can be created from the same data source as the training dataset or a mix of multiple data sources.
Here are the key metrics you can expect to see when using a validation set:
- Validation loss: the error of the model on the validation data
- Validation token accuracy: the percentage of tokens in the validation set that are correctly predicted by the model
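As a concrete illustration of the second metric, here is a minimal sketch of computing validation token accuracy; the tensor shapes, vocabulary size, and masking are invented for illustration:

```python
import torch

logits = torch.randn(2, 5, 32000)        # (batch, sequence, vocab) model outputs
labels = torch.randint(0, 32000, (2, 5)) # validation token labels
labels[:, :2] = -100                     # masked prompt tokens, excluded from the metric

# Accuracy = fraction of non-masked tokens whose argmax prediction matches the label.
preds = logits.argmax(dim=-1)
valid = labels != -100
accuracy = (preds[valid] == labels[valid]).float().mean()
print(f"validation token accuracy: {accuracy:.2%}")
```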
Evaluations are spread evenly across training so that the job runs n_evals of them in total, and the final weights are also evaluated on the validation set. The evaluation cost is added to your final cost and depends on the size of your validation set and the number of evaluations.
Frequently Asked Questions
Is finetune one word or two?
Fine-tune is written as one hyphenated word: 'fine' and 'tune' joined by a hyphen into a single unit.
Does fine-tuning need a hyphen?
Yes, fine-tuning takes a hyphen. The hyphen is standard both when it's used as a verb ("we fine-tune the model") and when it's used as a compound modifier ("a fine-tuning job").
Sources
- https://en.wikipedia.org/wiki/Fine-tuning_(deep_learning)
- https://docs.mistral.ai/guides/finetuning/
- https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-finetuning
- https://docs.together.ai/docs/fine-tuning-overview
- https://medium.com/@amanatulla1606/fine-tuning-the-model-what-why-and-how-e7fa52bc8ddf