Uploading your model to Hugging Face is a straightforward process that can be completed in just a few steps.
First, create a Hugging Face account if you haven't already. This will give you access to their platform and allow you to upload your model.
To get started, you'll need a model that's ready to be uploaded, meaning it's saved in a format Hugging Face supports, such as a model built with the transformers library.
Hugging Face supports a wide range of models, including those built with popular libraries like PyTorch and TensorFlow.
Preparing Your Model
To prepare your model for uploading, you'll need to fine-tune it on your specific task, either by using the model directly in your own training loop or with the Trainer/TFTrainer class. The fine-tuned result is what you'll share on the model hub.
If you're sharing a custom PyTorch model, you can have it inherit from PyTorchModelHubMixin, which adds from_pretrained and push_to_hub capabilities along with download metrics. This mixin class is available in the huggingface_hub Python library.
Automated download metrics are available for your model, just like for models integrated natively in the Transformers, Diffusers or Timm libraries. This means you'll be able to see how many times your model is downloaded, and the count will go up by one each time a user calls from_pretrained to load the config.json.
Here's what's stored on the Hub for each separate checkpoint:
- a pytorch_model.bin or model.safetensors file containing the weights
- a config.json file which is a serialized version of the model configuration
These two files are stored in a single repository, making it easy to track and manage your model's versions.
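To make this concrete, here's a minimal sketch of a custom PyTorch model using the mixin, as supported by recent versions of huggingface_hub; the class name, layer size, and repo id are placeholders, not part of any real project:

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class MyModel(nn.Module, PyTorchModelHubMixin):
    """A toy model; the mixin serializes the weights and configuration for the Hub."""

    def __init__(self, hidden_size: int = 128):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        return self.linear(x)

model = MyModel(hidden_size=128)

# Pushes the weights and config.json to a repo on the Hub (placeholder repo id)
model.push_to_hub("your-username/my-model")

# Anyone can now reload it, and each load counts toward the download metrics
reloaded = MyModel.from_pretrained("your-username/my-model")
```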
Save Your Model Files
Once fine-tuning is complete, whether you used the Trainer/TFTrainer class or your own training loop, save the result so you can upload it later.
You'll want to save the folder containing the weights, tokenizer, and configuration of your model. This will make it easier to upload your model later.
Here are the essential files you'll need to save:
- weights (e.g. pytorch_model.bin or model.safetensors)
- tokenizer
- configuration (e.g. config.json)
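As a rough sketch (the base checkpoint and output path below are placeholders), saving all of these with save_pretrained looks like this:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Stand-ins for a model and tokenizer you have already fine-tuned
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# save_pretrained writes the weights, config.json, and tokenizer files
save_directory = "./my-finetuned-model"
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)
```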
Once you have these files saved, you're ready to upload your model to the model hub.
Check the Directory
Before you upload your model, take a few minutes to check the directory.
Make sure it's clean and only contains the necessary files.
A config.json file should be present, which stores the configuration of your model.
You'll also need a pytorch_model.bin (or model.safetensors) file if you trained with PyTorch, and a tf_model.h5 file, the TensorFlow checkpoint, if you trained with TensorFlow.
If you saved a tokenizer, include special_tokens_map.json, tokenizer_config.json, and vocab.txt.
An added_tokens.json file might also be part of your tokenizer save, but it's not required.
Here's a summary of the files to check for:
- config.json
- pytorch_model.bin or model.safetensors (PyTorch checkpoint)
- tf_model.h5 (TensorFlow checkpoint)
- special_tokens_map.json
- tokenizer_config.json
- vocab.txt
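If you'd like to double-check the directory programmatically, a quick sketch like the following can catch missing files before you upload; the path and the expected file list (here, a PyTorch model with a tokenizer) are just examples:

```python
import os

save_directory = "./my-finetuned-model"  # placeholder path from the previous step

# Files expected for a PyTorch model with a saved tokenizer
expected = {
    "config.json",
    "pytorch_model.bin",       # or model.safetensors
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.txt",
}

present = set(os.listdir(save_directory))
missing = expected - present
print("Missing files:", missing or "none")
```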
Uploading to Hugging Face
To upload your model to Hugging Face, you'll need to create an account and a model repository to push to. Hosting on the Hub gives you versioning, branches, discoverability, and sharing features, as well as integration with dozens of libraries.
You can upload models to the Hub in several ways, including using the Repository class's push_to_hub() function, which adds files, makes a commit, and pushes them to a repository. Alternatively, you can use the huggingface-cli upload command from the terminal to directly upload files to the Hub.
The Repository class's push_to_hub() function requires you to pull from a repository first before calling it, but it's a great way to upload your model with a single function call. If you're uploading a large folder, you can use the upload_large_folder() method, which is resumable, multi-threaded, and resilient to errors.
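For example, here's a minimal sketch using the HTTP-based HfApi client to push a whole folder in one commit; the local path and repo id are placeholders:

```python
from huggingface_hub import HfApi

api = HfApi()  # assumes you have already logged in

# Uploads every file in the folder as a single commit
api.upload_folder(
    folder_path="./my-finetuned-model",  # placeholder local path
    repo_id="your-username/my-model",    # placeholder repo id
    repo_type="model",
)
```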
Create Commit
If you want to work at the commit level, use the create_commit() function directly. It supports two types of operations: CommitOperationAdd and CommitOperationDelete.
You can use CommitOperationAdd to upload a file to the Hub, but be aware that if the file already exists, its contents will be overwritten. This operation takes two arguments: path_in_repo, the destination path in the repository, and path_or_fileobj, the local file (or file-like object) to upload.
Here are the supported operations:
- CommitOperationAdd: uploads a file to the Hub
- CommitOperationDelete: removes a file or a folder from the repository
The create_commit() function is also used under the hood by other functions, such as upload_file() and upload_folder().
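A minimal sketch of a commit-level upload might look like this; the file names and repo id are placeholders:

```python
from huggingface_hub import HfApi, CommitOperationAdd, CommitOperationDelete

api = HfApi()

operations = [
    # path_or_fileobj is the local source, path_in_repo is the destination on the Hub
    CommitOperationAdd(
        path_or_fileobj="./my-finetuned-model/model.safetensors",
        path_in_repo="model.safetensors",
    ),
    # Removes an obsolete file from the repository
    CommitOperationDelete(path_in_repo="old_checkpoint.bin"),
]

api.create_commit(
    repo_id="your-username/my-model",  # placeholder repo id
    operations=operations,
    commit_message="Swap old checkpoint for safetensors weights",
)
```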
Push to Hub
Pushing files to the Hub is a straightforward process. You can use the Repository class's push_to_hub() function to add files, make a commit, and push them to a repository. Unlike the commit context manager, you'll need to pull from a repository before calling push_to_hub().
If you've already cloned a repository from the Hub, you can initialize the Repository object directly from that local directory.
To use push_to_hub(), you'll need to log in to your Hugging Face account. You can log in using the huggingface-cli login command or programmatically using the login() function in a notebook or script.
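Programmatic login is a one-liner; the sketch below assumes you have an access token from your Hugging Face settings page:

```python
from huggingface_hub import login

# Prompts for your access token, or pass it directly with login(token="hf_...")
login()
```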
The push_to_hub() workflow can also handle very large files, which are tracked and uploaded with Git LFS.
Here are the different ways to push files to the Hub:
- Without Git, over HTTP (for example with create_commit(), upload_file(), or upload_folder()).
- With Git LFS, for very large files.
- With the commit context manager.
- With the Repository class's push_to_hub() function.
If you don't have Git installed on your system, you can use the create_commit() function to push your files to the Hub. This function uses the HTTP protocol to upload files to the Hub.
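Here's a rough sketch of the Git-based workflow with the Repository class (newer versions of huggingface_hub favor the HTTP methods above); the local directory and repo id are placeholders:

```python
from huggingface_hub import Repository

# Clone (or reuse) a local copy of an existing repo on the Hub
repo = Repository(
    local_dir="my-model",                 # placeholder local directory
    clone_from="your-username/my-model",  # placeholder repo id
)

# Copy your checkpoint files into `my-model/` first, then:
repo.push_to_hub(commit_message="Add fine-tuned weights")  # git add + commit + push
```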
Remove from the CLI
To remove a model from your local cache, use the `huggingface-cli delete-cache` command, which opens an interactive prompt for selecting the cached models and revisions you want to delete.
This command is part of the same huggingface_hub CLI you used for `huggingface-cli login`.
If you want to delete the repository from the Hub itself rather than just your local copy, you can do that with the delete_repo() function.
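A minimal sketch of delete_repo() follows; the repo id is a placeholder, and deletion is permanent:

```python
from huggingface_hub import HfApi

api = HfApi()

# Permanently deletes the repository from the Hub; this cannot be undone
api.delete_repo(repo_id="your-username/my-model")  # placeholder repo id
```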
Customizing Your Upload
You can customize the way your data is uploaded to Hugging Face by creating a class that inherits from CommitScheduler and overwriting its push_to_hub method. That method is called at the interval you configured on the scheduler, in a background thread, so you don't have to worry about concurrency or errors.
You have access to the attributes of CommitScheduler, which let you tailor the upload process to your needs. They're listed below, and the sketch after the list shows how you might use them to zip all PNG files into a single archive to avoid overloading the repo on the Hub:
- HfApi client: api
- Folder parameters: folder_path and path_in_repo
- Repo parameters: repo_id, repo_type, revision
- The thread lock: lock
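Here's a sketch of such a subclass, closely following the zip-and-upload pattern from the huggingface_hub documentation; the archive name images.zip is an arbitrary choice for illustration:

```python
from pathlib import Path
from zipfile import ZipFile

from huggingface_hub import CommitScheduler

class ZipScheduler(CommitScheduler):
    """Zips new PNG files into one archive before each scheduled push."""

    def push_to_hub(self):
        # 1. List the PNG files currently in the watched folder
        png_files = list(Path(self.folder_path).glob("*.png"))
        if len(png_files) == 0:
            return None  # nothing new to upload

        # 2. Zip them into a single archive, holding the lock so the files
        #    aren't modified while they are being archived
        archive_path = Path(self.folder_path) / "images.zip"
        with self.lock:
            with ZipFile(archive_path, "w") as archive:
                for png_file in png_files:
                    archive.write(png_file, arcname=png_file.name)

        # 3. Upload the archive with the HfApi client, then clean up locally
        self.api.upload_file(
            repo_id=self.repo_id,
            repo_type=self.repo_type,
            revision=self.revision,
            path_in_repo="images.zip",
            path_or_fileobj=archive_path,
        )
        archive_path.unlink()
        for png_file in png_files:
            png_file.unlink()
```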
Model Card
To add a model card, create a README.md file in the model_cards/ directory of the 🤗 Transformers repo, inside a subfolder named after your username or organization, with another subfolder named after your model.
You can also click on the “Create a model card on GitHub” button on the model page to get directly to the right location.
If you're fine-tuning a model from the model hub, don't forget to link to its model card, so people can see how your model was built.
You can start from the provided model card template, and meta-suggestions to improve the template are welcome.
To make it easy, you can also place the model card in a README.md file inside the folder you uploaded with the CLI.
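If you'd rather do this programmatically, here's a minimal sketch using the ModelCard class from huggingface_hub; the metadata, description, and repo id are placeholders:

```python
from huggingface_hub import ModelCard

# A minimal README.md: YAML metadata at the top, Markdown description below
content = """---
license: apache-2.0
tags:
- text-classification
---

# My Fine-Tuned Model

Fine-tuned from a base checkpoint on a placeholder sentiment task.
"""

card = ModelCard(content)
card.push_to_hub("your-username/my-model")  # placeholder repo id
```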
Using the Web Interface
To upload your model to the Hugging Face Hub using the web interface, navigate to the "Files and versions" tab, select "Add File", then "Upload File", choose your file, and click "Commit changes" to upload it.
You can also inspect files and history in this tab.
To ensure your repository is categorized with the TensorBoard tag, save your TensorBoard traces under the runs/ subfolder. This is a convention suggested by Hugging Face.
Models trained with 🤗 Transformers will generate TensorBoard traces by default if tensorboard is installed.
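As a small sketch (output paths are placeholders), you can point the Trainer's logging directory at a runs/ subfolder explicitly:

```python
from transformers import TrainingArguments

# Keeping TensorBoard traces under runs/ lets the Hub apply the TensorBoard tag
training_args = TrainingArguments(
    output_dir="./my-finetuned-model",        # placeholder path
    logging_dir="./my-finetuned-model/runs",  # TensorBoard traces go here
    report_to=["tensorboard"],
)
```

With the traces in place, your uploaded repository will be categorized with the TensorBoard tag on the Hub.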