To run Hugging Face locally on your machine, you'll first need to install the necessary packages. This includes the Transformers library and the Hugging Face CLI.
The Transformers library can be installed using pip, and it's recommended to use a virtual environment to keep your dependencies organized.
You can create a new virtual environment using conda, and then install the library using pip. The command to install the library is `pip install transformers`.
With the library installed, you can then install the Hugging Face CLI, which is provided by the `huggingface_hub` package. The command to install it is `pip install -U huggingface_hub`.
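To confirm the installation worked, a quick version check from Python (assuming both installs above completed) is enough:

```python
# Quick sanity check that the core packages are importable.
import transformers
import huggingface_hub

print("transformers:", transformers.__version__)
print("huggingface_hub:", huggingface_hub.__version__)
```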
Set Up Environment
The first step is to set up your environment, which will allow you to use the Hugging Face Hub programmatically.
You'll need to open a terminal or command prompt to get started.
To install the Hugging Face libraries, run `pip install transformers`. This will install the core Transformers library along with its dependencies.
You should also install the `datasets` and `tokenizers` libraries, for example with `pip install datasets tokenizers`, to have the full capability.
This will give you the full functionality of the Hugging Face library.
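To confirm everything is wired up, you can run a short sanity check. The snippet below is only a sketch; the default sentiment-analysis model and the `imdb` dataset it pulls are arbitrary choices that download automatically on first run:

```python
# Sanity check: confirm transformers and datasets both work end to end.
from datasets import load_dataset
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a small default model on first use
print(classifier("Hugging Face is set up and running locally!"))

# Optional: load a tiny dataset slice to confirm the datasets library works too.
sample = load_dataset("imdb", split="train[:5]")
print(sample[0]["text"][:80])
```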
Running Locally
Running a powerful LLM locally is now feasible without expensive GPUs. Hugging Face offers an array of open-source models, which are benchmarked and presented on a leaderboard to help choose the best models available.
To run an LLM locally, you can use the Transformers library, which streamlines the process and allows for automatic model downloads. This library is ideal for experimentation and learning, but it requires a solid understanding of ML and NLP, as well as coding and configuration skills.
Here are some options for running LLMs locally:
- Ollama: a lightweight tool for running LLMs locally with a single command, for example "ollama run zephyr-local"
- Hugging Face Transformers: a Python library that streamlines running an LLM locally, with automatic model downloads and code snippets available
- Llamafile: a user-friendly alternative for running LLMs, developed by Mozilla, which offers portability and the ability to create single-file executables
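As a minimal illustration of the Transformers route (distilgpt2 here is just a small example model, not a recommendation; any causal language model on the Hub works the same way):

```python
# Sketch: download a small model from the Hub and generate text locally.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")  # fetched on first use, cached afterwards
output = generator("Running language models locally is", max_new_tokens=40)
print(output[0]["generated_text"])
```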
Run Directly
If you don't want to manage model files and servers yourself, you can use Hugging Face's Transformers library to fetch these models, send them prompts, and receive outputs directly on your machine.
Hugging Face offers an overwhelming array of open-source models, and they regularly benchmark the models to help choose the best ones available.
You can use the Transformers library to run an older GPT-2-based model, microsoft/DialoGPT-medium; the model is downloaded automatically, and the example script lets you have five interactions with it.
The script requires PyTorch to be installed, and it's ideal for experimentation and learning.
However, running the script requires a solid understanding of ML and NLP, and coding and configuration skills are necessary.
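A minimal sketch of such a script is shown below; the generation settings are illustrative, and it assumes PyTorch is installed:

```python
# Sketch: a five-turn chat loop with microsoft/DialoGPT-medium (downloads the model on first run).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

chat_history_ids = None
for step in range(5):
    # Encode the user's message and append the end-of-sequence token.
    user_ids = tokenizer.encode(input(">> You: ") + tokenizer.eos_token, return_tensors="pt")
    # Append the new message to the conversation so far.
    bot_input_ids = user_ids if chat_history_ids is None else torch.cat([chat_history_ids, user_ids], dim=-1)
    # Generate a reply and keep the full history for the next turn.
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
    reply = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print("Bot:", reply)
```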
Here are some key benefits of using the Transformers library:
- Automatic model downloads
- Code snippets available
- Ideal for experimentation and learning
But be aware that it also has some limitations:
- Requires solid understanding of ML and NLP
- Coding and configuration skills are necessary
Alternatively, you can use Llamafile, a user-friendly alternative for running LLMs developed by Mozilla, which offers portability and the ability to create single-file executables.
Llamafile has some pros, including the same speed benefits as Llama.cpp and the ability to build a single executable file with the model embedded.
Turn on Checkpointing
Turn on checkpointing to save your model's progress and easily resume training later. You can store up to 100GB of models and datasets for free using Weights & Biases' Artifacts.
To log your Hugging Face model checkpoints to Artifacts, you need to set the WANDB_LOG_MODEL environment variable. You can choose from three options: 'checkpoint', 'end', or 'false'. If you choose 'checkpoint', a checkpoint will be uploaded every 'args.save_steps' from the TrainingArguments.
Here are the three options for WANDB_LOG_MODEL:
- checkpoint: a checkpoint will be uploaded every args.save_steps from the TrainingArguments.
- end: the model will be uploaded at the end of training.
- false: no model will be uploaded.
If you want to upload the best model at the end of training, use WANDB_LOG_MODEL along with load_best_model_at_end. This will save your model to W&B Artifacts as 'model-{run_id}' when WANDB_LOG_MODEL is set to 'end' or 'checkpoint-{run_id}' when WANDB_LOG_MODEL is set to 'checkpoint'. However, if you pass a run_name in your TrainingArguments, the model will be saved as 'model-{run_name}' or 'checkpoint-{run_name}'.
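For example, one way to wire this up (the project name and the training-argument values below are illustrative, not required) is to set the environment variable before training starts:

```python
# Sketch: upload a checkpoint to W&B Artifacts every `save_steps` during training.
import os

os.environ["WANDB_PROJECT"] = "my-project"    # illustrative project name
os.environ["WANDB_LOG_MODEL"] = "checkpoint"  # or "end" / "false"

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    report_to="wandb",  # send Trainer logs to Weights & Biases
    save_steps=500,     # a checkpoint is saved (and uploaded) every 500 steps
    logging_steps=50,
)
# ... build your model and datasets, then pass `args` to a Trainer as usual.
```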
Training and Evaluation
Training and Evaluation is a crucial step in Hugging Face's Run Locally workflow.
To train a model with the Transformers library, you'll typically prepare a dataset and define your hyperparameters, such as the learning rate, batch size, and number of epochs, in a TrainingArguments object.
You then pass the model, the TrainingArguments, and your datasets to a Trainer and call `trainer.train()` to start training.
Evaluation assesses the performance of the trained model on a held-out dataset; with the Trainer, calling `trainer.evaluate()` runs the evaluation loop and returns the metrics.
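The sketch below shows the whole loop end to end; the `imdb` dataset, the `distilbert-base-uncased` checkpoint, and the hyperparameter values are placeholder choices, not requirements:

```python
# Sketch: fine-tune and evaluate a small text classifier with the Trainer API.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),  # small subset to keep it quick
    eval_dataset=dataset["test"].shuffle(seed=42).select(range(500)),
)

trainer.train()
print(trainer.evaluate())  # returns metrics such as eval_loss
```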
Saving the Best
You can save the best model by setting load_best_model_at_end=True in the TrainingArguments; with WANDB_LOG_MODEL enabled, the best-performing model checkpoint will be saved to Artifacts.
If you want to centralize all your best model versions across your team, make sure you're saving your model checkpoints to Artifacts. Once logged to Artifacts, these checkpoints can be promoted to the Model Registry.
To save checkpoints for later reuse, set WANDB_LOG_MODEL='checkpoint'; you can then resume training by loading the model from the saved checkpoint directory (for example by passing it as model_name_or_path) and passing resume_from_checkpoint=True to trainer.train().
Here's a summary of the best model saving options:
- load_best_model_at_end=True: the Trainer tracks and reloads the best-performing checkpoint at the end of training.
- WANDB_LOG_MODEL='end': the model is uploaded to Artifacts once, at the end of training.
- WANDB_LOG_MODEL='checkpoint': a checkpoint is uploaded every args.save_steps, so you can resume from any of them.
By using these options, you can easily save and manage your best models and make them available for further evaluation or deployment.
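As a rough sketch of pulling a logged checkpoint back down and resuming from it (the entity, project, and artifact names are placeholders):

```python
# Sketch: download a checkpoint from W&B Artifacts, then resume training from it.
import wandb

run = wandb.init(project="my-project")
artifact = run.use_artifact("my-entity/my-project/checkpoint-abc123:latest", type="model")
checkpoint_dir = artifact.download()

# With a Trainer already configured (see the training sketch above), resume from that directory:
# trainer.train(resume_from_checkpoint=checkpoint_dir)
```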
Visualize Evaluation Outputs During Training
Visualizing evaluation outputs during training is essential to understand how your model is training. This can be done by using the callbacks system in the Transformers Trainer.
You can log additional helpful data to W&B, such as your model's text generation outputs or other predictions, to W&B Tables. The callback pattern described below shows how to log evaluation outputs to a W&B Table while training.
Logging to Weights & Biases via the Transformers Trainer is taken care of by the WandbCallback. However, if you need to customize your Hugging Face logging, you can modify this callback by subclassing WandbCallback and adding additional functionality.
The general pattern is to subclass WandbCallback, add your custom logging logic (for example, decoding a sample of your model's predictions after each evaluation), and register the new callback with the Trainer via trainer.add_callback. This lets you leverage methods from the Trainer class while adding custom functionality to your logging.
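A hedged sketch of that pattern follows; the column names, the use of trainer.predict, and the assumption of a "text" field in the evaluation set are placeholders that will differ for your task:

```python
# Sketch: subclass WandbCallback to log sample predictions to a W&B Table at each evaluation.
import wandb
from transformers.integrations import WandbCallback


class SamplePredictionCallback(WandbCallback):
    def __init__(self, trainer, tokenizer, sample_dataset, num_samples=8):
        super().__init__()
        self.trainer = trainer
        self.tokenizer = tokenizer
        self.sample_dataset = sample_dataset.select(range(num_samples))

    def on_evaluate(self, args, state, control, **kwargs):
        super().on_evaluate(args, state, control, **kwargs)
        # Run the model on a handful of held-out examples.
        predictions = self.trainer.predict(self.sample_dataset)
        table = wandb.Table(columns=["text", "prediction"])
        for example, pred in zip(self.sample_dataset, predictions.predictions.argmax(-1)):
            table.add_data(example["text"], int(pred))
        # self._wandb is the wandb module the callback already initialized.
        self._wandb.log({"sample_predictions": table}, commit=False)


# Register the callback after the Trainer is constructed:
# trainer.add_callback(SamplePredictionCallback(trainer, tokenizer, eval_dataset))
```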
Frequently Asked Questions
How to use Hugging Face models offline?
To use Hugging Face models offline, download them ahead of time with `from_pretrained()` while you have a connection, save them to a local directory with `save_pretrained()`, and then reload them while offline by pointing `from_pretrained()` at that directory.
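As a rough illustration (the model name and the `./local-sst2` directory are arbitrary choices):

```python
# Sketch: download a model once while online, then reuse it fully offline.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# While online: download and save to a local directory of your choosing.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
AutoTokenizer.from_pretrained(model_id).save_pretrained("./local-sst2")
AutoModelForSequenceClassification.from_pretrained(model_id).save_pretrained("./local-sst2")

# Later, offline: load from the local directory instead of the Hub.
# (Setting the environment variable TRANSFORMERS_OFFLINE=1 also blocks any network lookups.)
tokenizer = AutoTokenizer.from_pretrained("./local-sst2")
model = AutoModelForSequenceClassification.from_pretrained("./local-sst2")
```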
Can you run Bert locally?
Yes, you can run a fine-tuned BERT model locally, and we've covered the process in our blog post. Learn how to set it up and run it on your local machine.
Sources
- https://docs.wandb.ai/guides/integrations/huggingface
- https://www.freecodecamp.org/news/get-started-with-hugging-face/
- https://semaphoreci.com/blog/local-llm
- https://www.mindfiretechnology.com/blog/archive/lm-studio-the-easiest-way-to-get-started-with-hugging-face-llms/
- https://otmaneboughaba.com/posts/local-llm-ollama-huggingface/