To get started with Hugging Face on Google Cloud Platform (GCP), you'll need to create a project in the GCP Console. This will be the central hub for all your GCP resources.
First, navigate to the GCP Console and click on "Select a project" to create a new project. You can name your project whatever you like, but make sure it's descriptive and easy to remember.
Once your project is created, you'll need to enable the necessary APIs to use Hugging Face on GCP. This includes the Cloud AI Platform API (now Vertex AI) and the Google Cloud Storage API.
Next, you'll need to set up a Cloud Storage bucket to store your models and data. This is where you'll upload and download your files, so make sure it's easily accessible.
You can create a new Cloud Storage bucket by clicking on "Navigation menu" and selecting "Storage". Then, click on "Create bucket" and follow the prompts to set up your bucket.
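If you prefer the command line, the same step can be sketched with the gcloud CLI; the bucket name and location below are placeholders, not values this guide prescribes:

```bash
# Create a Cloud Storage bucket for your models and data
# (bucket names must be globally unique and lowercase)
gcloud storage buckets create gs://my-hugging-face-models --location=us-central1
```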
After setting up your project and bucket, you'll need to install the Hugging Face library and authenticate with GCP. This will allow you to use Hugging Face with your GCP resources.
To install the Hugging Face library, run `pip install transformers` in your terminal. Then, you'll need to authenticate with GCP using `gcloud auth application-default login`.
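Put together, and assuming the usual service IDs aiplatform.googleapis.com and storage.googleapis.com for the APIs mentioned earlier, the terminal steps look roughly like this:

```bash
# Enable the APIs (equivalent to enabling them in the console)
gcloud services enable aiplatform.googleapis.com storage.googleapis.com

# Install the Hugging Face Transformers library
pip install transformers

# Create Application Default Credentials so client libraries can reach GCP
gcloud auth application-default login
```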
Setting Up Hugging Face on GCP
To set up Hugging Face on Google Cloud Platform (GCP), you'll need a Google Cloud account, the Google Cloud CLI installed and set up, and the necessary permissions to subscribe to offerings in the Google Cloud Marketplace and create IAM permissions and resources.
You can either select an existing GKE cluster or create a new one; if you create a new one, follow the instructions to define the necessary settings.
To deploy HUGS on Google Cloud GKE, you'll need to define the following settings: Namespace, App Instance Name, Hugs Model Id, GPU Number, GPU Type, and Reporting Service Account.
Here are the specific settings you'll need to define:
- Namespace: The namespace to deploy the HUGS container and model.
- App Instance Name: The name of the HUGS container.
- Hugs Model Id: Select the model you want to deploy from the Hugging Face Hub.
- GPU Number: The number of GPUs you have available and want to use for the deployment.
- GPU Type: The type of GPU you have available inside your GKE cluster.
- Reporting Service Account: The service account to use for reporting.
Deployment takes around 10-15 minutes, and you can check the supported model matrix for specific deployment options, such as 1x NVIDIA L4 GPU for Meta Llama 3.1 8B Instruct.
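While the deployment is running, you can watch the rollout from a terminal with standard kubectl commands; the namespace below is a placeholder for whatever you chose in the settings above:

```bash
# Watch the HUGS pods come up in the namespace you picked at deploy time
kubectl get pods --namespace hugs --watch

# Once the pods are Running, list the services to find the endpoint
kubectl get services --namespace hugs
```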
Preparing the Environment
To prepare the environment for deploying HUGS on Google Cloud, you'll need to have a Google Cloud Account and the Google Cloud CLI installed and configured. Ensure you're logged in with the necessary permissions to subscribe to offerings in the Google Cloud Marketplace and create IAM permissions and resources.
You'll also need to create a GKE cluster with GPU support. To do this, you can follow the steps outlined in the official Google Kubernetes Engine documentation or create a GPU node pool in your existing cluster. Your GKE cluster with GPU support is now ready for HUGS deployment.
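As a rough sketch of the node-pool step, assuming an existing cluster named hugs-cluster, a single NVIDIA L4 GPU per node (the configuration mentioned above), and placeholder zone and machine type:

```bash
# Add a GPU node pool to an existing GKE cluster
# (cluster name, zone, machine type, and GPU settings are placeholders)
gcloud container node-pools create gpu-pool \
  --cluster hugs-cluster \
  --zone us-central1-a \
  --machine-type g2-standard-8 \
  --accelerator type=nvidia-l4,count=1 \
  --num-nodes 1
```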
Here are the environment variables you'll need to set up for your cluster configuration:
- Namespace: The namespace to deploy the HUGS container and model.
- App Instance Name: The name of the HUGS container.
- Hugs Model Id: Select the model you want to deploy from the Hugging Face Hub.
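If you like to keep these values in your shell before deploying, a purely illustrative set of exports might look like this; the variable names and values are assumptions, not names the deployment requires:

```bash
# Placeholder values for the HUGS deployment settings
export NAMESPACE="hugs"
export APP_INSTANCE_NAME="hugs-llama"
export HUGS_MODEL_ID="meta-llama/Meta-Llama-3.1-8B-Instruct"
```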
Deploy HUGS on GKE
To deploy HUGS on GKE, you need to have a Google Cloud account, the Google Cloud CLI installed and set up, and the necessary permissions to subscribe to offerings in the Google Cloud Marketplace.
Ensure you're logged in to your Google Cloud account with the required permissions.
To create a new GKE cluster for HUGS deployment, you'll need to define several parameters, including the namespace, app instance name, HUGS model ID, GPU number, GPU type, and reporting service account.
After defining these parameters, click on Deploy and wait for the deployment to finish, which takes around 10-15 minutes.
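Once the deployment finishes, one way to sanity-check it is to port-forward the HUGS service and send a test request. HUGS serves an OpenAI-compatible API, so the /v1/chat/completions path should respond, but the service name, namespace, and port below are placeholders you should look up with kubectl get services:

```bash
# Forward the HUGS service to localhost (service name, namespace, and port are placeholders)
kubectl port-forward --namespace hugs service/hugs-llama 8080:80

# In another terminal, send a test chat request to the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "tgi", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 32}'
```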
Upload Model to Google Cloud Storage
Uploading your model to Google Cloud Storage is an optional but recommended step to ensure reproducibility and avoid downloading files twice.
The bucket name must be lowercase, and the folder inside it carries the same name as the model, so choose unique, descriptive names.
For example, you can create a bucket called "my-hugging-face-models" with a folder called "opus-mt-es-en" to store your files.
With the files stored at the path specified in the previous section, the script will download them from the bucket only if they aren't already present locally, so you avoid downloading them twice.
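A minimal sketch of copying the files into that bucket and folder with the gcloud CLI; the local path matches the example above but is otherwise a placeholder:

```bash
# Copy the local model folder into the bucket from the example above
gcloud storage cp --recursive ./opus-mt-es-en gs://my-hugging-face-models/
```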
Create Model.env File
Creating a model.env file is a crucial step in preparing your environment. You'll need to write this file with the variables the script requires to build the image.
The script needs specific variables to function properly, and you can find an example of these variables in the repository's model_example.env file.
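The authoritative variable names live in the repository's model_example.env, so treat the following as a purely illustrative sketch with assumed names and values, not the file's real contents:

```bash
# model.env -- illustrative only; copy the real variable names from model_example.env
MODEL_NAME="opus-mt-es-en"            # name used for the .mar file and the Vertex AI model
HANDLER="handler.py"                  # custom TorchServe handler for the model
BUCKET_NAME="my-hugging-face-models"  # Cloud Storage bucket holding the model files
```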
Authentication and Setup
To authenticate and set up your Hugging Face GCP project, you'll need to log in to Google Cloud; the script that ties everything together won't run without valid credentials.
Two commands are involved: the first logs you in, and the second configures Docker so it can push the model image to Google Cloud's registry.
If you created the Artifact Registry repository in a different region, update the second command to use your own region.
You can also find these setup commands in the Artifact Registry console, under the setup instructions of the repository you created.
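A minimal sketch of the two commands, assuming the Artifact Registry repository lives in europe-west1 (swap in your own region):

```bash
# First command: log in to Google Cloud
gcloud auth login

# Second command: let Docker push images to Artifact Registry in your region
gcloud auth configure-docker europe-west1-docker.pkg.dev
```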
Run the Script
With everything set up, you can finally run the script. The same steps apply to other translation models such as opus-mt-de-en or opus-mt-nl-en.
If the .mar file was already built but you changed handler.py, for example to fix a mistake, you can pass the --overwrite_mar flag to overwrite the existing .mar file.
The script runs several steps per model: downloading the model files, building the .mar file, building the Docker image, and pushing it to the registry. It then creates the Vertex AI model, which is stored in the Model Registry.
To create the .mar file, the script uses an extra_files variable that stores the relative paths of all the files, comma-separated, except pytorch_model.bin; that model file is passed in a separate argument when packaging the model.
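The archive is built with TorchServe's torch-model-archiver tool; here's a hedged sketch of such a call, with file names taken from a typical opus-mt-es-en checkout rather than the script's exact arguments:

```bash
# Package the model into a .mar archive.
# --extra-files lists every required file except pytorch_model.bin,
# which is passed separately through --serialized-file.
torch-model-archiver \
  --model-name opus-mt-es-en \
  --version 1.0 \
  --serialized-file opus-mt-es-en/pytorch_model.bin \
  --handler handler.py \
  --extra-files "opus-mt-es-en/config.json,opus-mt-es-en/tokenizer_config.json,opus-mt-es-en/vocab.json,opus-mt-es-en/source.spm,opus-mt-es-en/target.spm" \
  --export-path model-store
```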
You can add more models, comma-separated, as long as you followed the same steps for them and they use the same handler. If a model already exists in the registry, the script uploads a new version of it instead.
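For reference, the registration step the script performs corresponds roughly to this gcloud call; the region, names, and image URI are placeholders, and the script's actual invocation may differ:

```bash
# Register the custom container image as a Vertex AI model
gcloud ai models upload \
  --region=europe-west1 \
  --display-name=opus-mt-es-en \
  --container-image-uri=europe-west1-docker.pkg.dev/my-project/my-repo/opus-mt-es-en:latest
```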
Frequently Asked Questions
Is Hugging Face Google?
No, Hugging Face is not Google, but they collaborate closely on open science, cloud, and hardware initiatives to bring the latest AI advancements to users. This partnership enables companies to build their own AI with the latest open models and cloud features.