Hugging Face provides a simple way to save models through the `save_pretrained` method, which is available on the pretrained model, tokenizer, and configuration classes in the `transformers` library.
This method saves a model to a local directory, and the companion `push_to_hub` method uploads it to the Hugging Face model hub.
Publishing a model to the hub makes it easy to share and reuse with others, and anything saved locally or on the hub can be loaded back with the `from_pretrained` method.
When saving locally, you pass `save_pretrained` the directory you want the files written to; there is no default location, so the path is required.
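As a minimal sketch (assuming the `bert-base-uncased` checkpoint and a placeholder Hub repository name), saving and reloading might look like this:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load an existing pretrained model and its tokenizer.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Save both to a local directory of your choice.
model.save_pretrained("./my-model")
tokenizer.save_pretrained("./my-model")

# Optionally push to the Hugging Face Hub (requires a prior `huggingface-cli login`);
# "your-username/my-model" is a placeholder repository name.
model.push_to_hub("your-username/my-model")

# Reload later from either location.
reloaded = AutoModelForSequenceClassification.from_pretrained("./my-model")
```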
Saving the Model
Saving the model is a crucial step in your Hugging Face journey. When you train with the `Trainer` API, you can save your model using its `save_model` method.
This method writes the model and its configuration to a directory, so you can reload it later using `from_pretrained()`.
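A short sketch, assuming you already have a `Trainer` instance named `trainer` that has finished training:

```python
from transformers import AutoModelForSequenceClassification

# `trainer` is assumed to be an already-trained transformers.Trainer.
trainer.save_model("./finetuned-model")  # writes the weights and config to the directory

# Reload the saved model later.
model = AutoModelForSequenceClassification.from_pretrained("./finetuned-model")
```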
Model Configuration
When saving a model with Hugging Face, you'll want to consider the model configuration. This covers the model's architecture, which can come from a pre-trained checkpoint or be defined from scratch.
The architecture is captured in a configuration object with parameters such as the number of layers and the type of layers used. For example, BERT-base has 12 hidden layers.
The model configuration also goes hand in hand with the tokenizer, which converts text into the numerical inputs the model expects; without a matching tokenizer, the model can't interpret the input data.
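As an illustration (using `bert-base-uncased` as the example checkpoint), you can inspect the configuration and tokenizer like this:

```python
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("bert-base-uncased")
print(config.num_hidden_layers)  # 12 for BERT-base
print(config.hidden_size)        # 768 for BERT-base

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello, world!", return_tensors="pt")  # numerical inputs for the model
print(inputs["input_ids"])
```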
The transformers.TrainingArguments Class
The `transformers.TrainingArguments` class is a crucial part of configuring your model. It bundles the arguments that relate to the training loop itself, allowing you to customize various aspects of the training process.
Key components include `output_dir`, `do_train`, `do_eval`, `do_predict`, and `eval_strategy`. Boolean flags let you specify whether to overwrite the output directory, train, evaluate, or predict.
You can also set the batch size per device and the number of gradient accumulation steps. For example, you can set `per_device_train_batch_size` to 8, `per_device_eval_batch_size` to 8, and `gradient_accumulation_steps` to 1.
The `eval_delay` and `prediction_loss_only` parameters are optional, letting you delay the first evaluation and restrict evaluation to the loss, respectively. The `jit_mode_eval` parameter is also optional, enabling or disabling JIT-traced evaluation.
In terms of training configuration, you can specify the number of steps, the batch size, and accumulation steps; for instance, 500 steps with a batch size of 8 and a single accumulation step.
The class also includes parameters for evaluating the model, such as `eval_strategy`, `eval_steps`, and `eval_delay`, which control the evaluation strategy, how often evaluation runs, and how long to wait before the first evaluation.
Overall, `transformers.TrainingArguments` provides a range of options for configuring the training loop, allowing you to fine-tune the training process to suit your specific needs.
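A minimal sketch putting the values above together (argument names can vary slightly between `transformers` versions, e.g. older releases use `evaluation_strategy`):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",            # where checkpoints and outputs are written
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    eval_strategy="steps",             # evaluate every `eval_steps` steps
    eval_steps=500,
    eval_delay=0,                      # optional delay before the first evaluation
)
```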
Batch Size
Choosing a batch size is crucial for efficient model performance. Databricks recommends trying various batch sizes for the pipeline on your cluster to find the best performance.
The batch size should be large enough to drive full GPU utilization without causing CUDA out of memory errors. You can monitor GPU usage by viewing the live metrics for your cluster.
If CUDA out of memory errors do occur, detaching and reattaching the notebook releases the GPU memory held by the model and data.
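For example, a `transformers` pipeline accepts a batch size directly (the model name and the value 8 below are just placeholders to tune against your hardware):

```python
from transformers import pipeline

# Try several batch sizes and watch GPU utilization and memory on the cluster.
translator = pipeline(
    task="translation_en_to_fr",
    model="t5-small",   # example checkpoint; substitute your own
    device=0,           # first GPU
    batch_size=8,       # tune this value for your hardware
)

results = translator(["Hello, world!", "How are you?"])
```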
Import Pretrained
Importing a pretrained model can save you a lot of time and effort.
Specifically, you can import a Hugging Face pretrained model, such as the BERT-base model for uncased text, from the transformers library.
This model is designed for text classification tasks, including sentiment classification, which typically has two labels.
To create your model, you'll need to set `num_labels=2`, as in the sketch below.
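A brief sketch of that import, assuming the standard `bert-base-uncased` checkpoint:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Two labels for binary sentiment classification (e.g. positive / negative).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
```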
Convert to Composer
Converting a model to a ComposerModel is a crucial step in the model configuration process. This interface allows us to wrap a Hugging Face model in a ComposerModel.
To do this, we need to specify the Hugging Face model to wrap. The model parameter is required, so make sure to choose the right one.
The tokenizer parameter is also important, as it determines how the input data is created. We need to choose a Hugging Face tokenizer that matches our model.
We can also specify a list of torchmetrics to apply to the output of eval_forward. This helps us track our model's performance during validation.
Here are the key parameters to keep in mind when converting a model to a ComposerModel:
- model: The Hugging Face model to wrap.
- tokenizer: The Hugging Face tokenizer used to create the input data.
- metrics: A list of torchmetrics to apply to the output of eval_forward.
- use_logits: A boolean which, if True, flags that the model’s output logits should be used to calculate validation metrics.
By carefully selecting these parameters, we can ensure a smooth conversion process and get the most out of our ComposerModel.
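A minimal sketch of the wrapping step, assuming MosaicML Composer's `HuggingFaceModel` wrapper and a torchmetrics accuracy metric (check the Composer docs for the exact signature in your version):

```python
from composer.models import HuggingFaceModel
from torchmetrics.classification import MulticlassAccuracy
from transformers import AutoModelForSequenceClassification, AutoTokenizer

hf_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
hf_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

composer_model = HuggingFaceModel(
    model=hf_model,
    tokenizer=hf_tokenizer,
    metrics=[MulticlassAccuracy(num_classes=2, average="micro")],
    use_logits=True,  # use the model's output logits when computing validation metrics
)
```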
Cache Options
Caching your model can significantly reduce the time it takes to load on a new or restarted cluster, and lower your ingress costs.
To cache your model, set the `TRANSFORMERS_CACHE` environment variable in your code before loading the pipeline.
You can cache the model in the DBFS root volume or on a mount point, which is a great option if you frequently load the same model from different clusters.
In practice this just means choosing the path where the model should be cached and pointing `TRANSFORMERS_CACHE` at it before the pipeline is created.
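For example (the DBFS path below is a placeholder; use whatever location suits your workspace):

```python
import os

# Point the Transformers cache at persistent storage before loading anything.
os.environ["TRANSFORMERS_CACHE"] = "/dbfs/hugging_face_transformers_cache/"

from transformers import pipeline

# Weights downloaded here are now cached under the DBFS path above.
summarizer = pipeline("summarization")
```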
Requirements
To configure your model, you'll need to meet some specific requirements.
MLflow 2.3 is the minimum version you'll need to use. This ensures you have the necessary tools to manage and deploy your model.
Any cluster with the Hugging Face transformers library installed can be used for batch inference. This library is a crucial component for many NLP tasks.
The transformers library comes preinstalled on Databricks Runtime 10.4 LTS ML and above. This makes it easy to get started with GPU-based inference.
For the best performance, consider using recent GPU hardware. This is especially important if you're working with models that aren't optimized for CPU use.
Here's a quick rundown of the requirements:
- MLflow 2.3
- Any cluster with the Hugging Face transformers library installed
- Recent GPU hardware (for best performance)
Using Pandas UDFs on a Spark Cluster
You can use Pandas UDFs to distribute model computation on a Spark cluster.
This approach allows you to wrap pre-trained models and perform computation on worker CPUs or GPUs, distributing the model to each worker.
Pandas UDFs can be used with Hugging Face Transformers pipelines for machine translation, running the pipeline on the workers of a Spark cluster.
Constructing the pipeline with `device=0` ensures that GPUs are used if they are available on the cluster.
The Hugging Face pipelines for translation return a list of Python dict objects, each with a single key `translation_text` whose value is the translated text.
You can extract the translation from the results to return a Pandas series with just the translated text.
If your cluster has instances with multiple GPUs, Spark automatically reassigns GPUs on the worker nodes, as long as the pipeline was constructed to use GPUs with `device=0`.
You can use the UDF to translate a text column by calling it in a select statement.
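A sketch of such a UDF, assuming PySpark and an English-to-French translation pipeline (the model and column names are placeholders):

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf
from transformers import pipeline

# device=0 uses the first GPU on each worker that runs the UDF.
translation_pipeline = pipeline(
    task="translation_en_to_fr", model="t5-small", device=0, batch_size=8
)

@pandas_udf("string")
def translate_udf(texts: pd.Series) -> pd.Series:
    # The pipeline returns a list of dicts with a single "translation_text" key.
    results = translation_pipeline(texts.to_list())
    return pd.Series([r["translation_text"] for r in results])

# Usage: translate a text column by calling the UDF in a select statement, e.g.
# translated_df = df.select(df.texts, translate_udf(df.texts).alias("translation"))
```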
To Sanitized Dictionary
When logging model configurations, it's useful to serialize them as a sanitized dictionary: one whose values are reduced to simple types (strings, numbers, booleans) that logging tools can handle.
This sanitized dictionary is specifically designed for use with TensorBoard's hparams, allowing a clean and organized display of training parameters.
`to_sanitized_dict` is a method of `TrainingArguments` that builds this dictionary from the configured arguments.
Using it avoids serialization errors when logging hyperparameters and keeps the recorded configuration consistent with what was actually used for training.
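A quick sketch, reusing the `training_args` object from the earlier TrainingArguments example:

```python
# Returns a flat dict with TensorBoard-friendly value types (str, int, float, bool).
hparams = training_args.to_sanitized_dict()
print(hparams["per_device_train_batch_size"])  # 8
```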
Sources
- https://huggingface.co/docs/transformers/en/main_classes/trainer
- https://docs.databricks.com/ja/archive/machine-learning/train-model/model-inference-nlp.html
- https://docs.mosaicml.com/projects/composer/en/stable/examples/finetune_huggingface.html
- https://cloud.google.com/blog/products/ai-machine-learning/how-to-deploy-llama-3-2-1b-instruct-model-with-google-cloud-run
- https://huggingface.co/docs/hub/en/models-uploading