To get started with Hugging Face on Google Cloud Platform (GCP), you'll need to create a project in the GCP Console. This will be the central hub for all your GCP resources.
First, navigate to the GCP Console and click on "Select a project" to create a new project. You can name your project whatever you like, but make sure it's descriptive and easy to remember.
Once your project is created, you'll need to enable the necessary APIs to use Hugging Face on GCP. This includes the Cloud AI Platform API (now Vertex AI) and the Google Cloud Storage API.
Next, you'll need to set up a Cloud Storage bucket to store your models and data. This is where you'll upload and download your files, so make sure it's easily accessible.
You can create a new Cloud Storage bucket by clicking on "Navigation menu" and selecting "Storage". Then, click on "Create bucket" and follow the prompts to set up your bucket.
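If you prefer the command line, the same step can be sketched with the gcloud CLI; the bucket name and location below are placeholders, not values this guide prescribes:

```bash
# Create a Cloud Storage bucket for your models and data
# (bucket names must be globally unique and lowercase)
gcloud storage buckets create gs://my-hugging-face-models --location=us-central1
```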
After setting up your project and bucket, you'll need to install the Hugging Face library and authenticate with GCP. This will allow you to use Hugging Face with your GCP resources.
To install the Hugging Face library, run `pip install transformers` in your terminal. Then, you'll need to authenticate with GCP using `gcloud auth application-default login`.
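Put together, and assuming the usual service IDs aiplatform.googleapis.com and storage.googleapis.com for the APIs mentioned earlier, the terminal steps look roughly like this:

```bash
# Enable the APIs (equivalent to enabling them in the console)
gcloud services enable aiplatform.googleapis.com storage.googleapis.com

# Install the Hugging Face Transformers library
pip install transformers

# Create Application Default Credentials so client libraries can reach GCP
gcloud auth application-default login
```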
Setting Up Hugging Face on GCP
To set up Hugging Face on Google Cloud Platform (GCP), you'll need a Google Cloud account, the Google Cloud CLI installed and set up, and the necessary permissions to subscribe to offerings in the Google Cloud Marketplace and create IAM permissions and resources.
You can either select an existing GKE cluster or create a new one; if you create a new one, follow the instructions to define the necessary settings.
To deploy HUGS on Google Cloud GKE, you'll need to define the following settings: Namespace, App Instance Name, Hugs Model Id, GPU Number, GPU Type, and Reporting Service Account.
Here are the specific settings you'll need to define:
- Namespace: The namespace to deploy the HUGS container and model.
- App Instance Name: The name of the HUGS container.
- Hugs Model Id: Select the model you want to deploy from the Hugging Face Hub.
- GPU Number: The number of GPUs you have available and want to use for the deployment.
- GPU Type: The type of GPU you have available inside your GKE cluster.
- Reporting Service Account: The service account to use for reporting.
Deployment takes around 10-15 minutes, and you can check the supported model matrix for specific deployment options, such as 1x NVIDIA L4 GPU for Meta Llama 3.1 8B Instruct.
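While the deployment is running, you can watch the rollout from a terminal with standard kubectl commands; the namespace below is a placeholder for whatever you chose in the settings above:

```bash
# Watch the HUGS pods come up in the namespace you picked at deploy time
kubectl get pods --namespace hugs --watch

# Once the pods are Running, list the services to find the endpoint
kubectl get services --namespace hugs
```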
Preparing the Environment
To prepare the environment for deploying HUGS on Google Cloud, you'll need to have a Google Cloud Account and the Google Cloud CLI installed and configured. Ensure you're logged in with the necessary permissions to subscribe to offerings in the Google Cloud Marketplace and create IAM permissions and resources.
You'll also need to create a GKE cluster with GPU support. To do this, you can follow the steps outlined in the official Google Kubernetes Engine documentation or create a GPU node pool in your existing cluster. Your GKE cluster with GPU support is now ready for HUGS deployment.
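As a rough sketch of the node-pool step, assuming an existing cluster named hugs-cluster, a single NVIDIA L4 GPU per node (the configuration mentioned above), and placeholder zone and machine type:

```bash
# Add a GPU node pool to an existing GKE cluster
# (cluster name, zone, machine type, and GPU settings are placeholders)
gcloud container node-pools create gpu-pool \
  --cluster hugs-cluster \
  --zone us-central1-a \
  --machine-type g2-standard-8 \
  --accelerator type=nvidia-l4,count=1 \
  --num-nodes 1
```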
Here are the environment variables you'll need to set up for your cluster configuration:
- Namespace: The namespace to deploy the HUGS container and model.
- App Instance Name: The name of the HUGS container.
- Hugs Model Id: Select the model you want to deploy from the Hugging Face Hub.
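If you like to keep these values in your shell before deploying, a purely illustrative set of exports might look like this; the variable names and values are assumptions, not names the deployment requires:

```bash
# Placeholder values for the HUGS deployment settings
export NAMESPACE="hugs"
export APP_INSTANCE_NAME="hugs-llama"
export HUGS_MODEL_ID="meta-llama/Meta-Llama-3.1-8B-Instruct"
```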
Deploy HUGS on GKE
To deploy HUGS on GKE, you need to have a Google Cloud account, the Google Cloud CLI installed and set up, and the necessary permissions to subscribe to offerings in the Google Cloud Marketplace.
Ensure you're logged in to your Google Cloud account with the required permissions.
To create a new GKE cluster for HUGS deployment, you'll need to define several parameters, including the namespace, app instance name, HUGS model ID, GPU number, GPU type, and reporting service account.
After defining these parameters, click on Deploy and wait for the deployment to finish, which takes around 10-15 minutes.
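Once the deployment finishes, one way to sanity-check it is to port-forward the HUGS service and send a test request. HUGS serves an OpenAI-compatible API, so the /v1/chat/completions path should respond, but the service name, namespace, and port below are placeholders you should look up with kubectl get services:

```bash
# Forward the HUGS service to localhost (service name, namespace, and port are placeholders)
kubectl port-forward --namespace hugs service/hugs-llama 8080:80

# In another terminal, send a test chat request to the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "tgi", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 32}'
```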
Upload Model to Google Cloud Storage
Uploading your model to Google Cloud Storage is an optional but recommended step to ensure reproducibility and avoid downloading files twice.
The bucket name must be lowercase, and the folder inside it carries the same name as the model, so choose unique, descriptive names.
For example, you can create a bucket called "my-hugging-face-models" with a folder called "opus-mt-es-en" to store your files.
With the files stored at the path specified in the previous section, the script will download them from the bucket only if they aren't already present locally, so you avoid downloading them twice.
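A minimal sketch of copying the files into that bucket and folder with the gcloud CLI; the local path matches the example above but is otherwise a placeholder:

```bash
# Copy the local model folder into the bucket from the example above
gcloud storage cp --recursive ./opus-mt-es-en gs://my-hugging-face-models/
```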
Create Model.env File
Creating a model.env file is a crucial step in preparing your environment. You'll need to write this file with the variables the script requires to build the image.
The script needs specific variables to function properly, and you can find an example of these variables in the repository's model_example.env file.
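The authoritative variable names live in the repository's model_example.env, so treat the following as a purely illustrative sketch with assumed names and values, not the file's real contents:

```bash
# model.env -- illustrative only; copy the real variable names from model_example.env
MODEL_NAME="opus-mt-es-en"            # name used for the .mar file and the Vertex AI model
HANDLER="handler.py"                  # custom TorchServe handler for the model
BUCKET_NAME="my-hugging-face-models"  # Cloud Storage bucket holding the model files
```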
Authentication and Setup
To authenticate and set up your Hugging Face GCP project, you'll need to log in to Google Cloud; the script that ties everything together won't run without valid credentials.
Two commands are involved: the first logs you in, and the second configures Docker so it can push the model image to Google Cloud's registry.
If you created the Artifact Registry repository in a different region, update the second command to use your own region.
You can also find these setup commands in the Artifact Registry console, under the setup instructions of the repository you created.
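A minimal sketch of the two commands, assuming the Artifact Registry repository lives in europe-west1 (swap in your own region):

```bash
# First command: log in to Google Cloud
gcloud auth login

# Second command: let Docker push images to Artifact Registry in your region
gcloud auth configure-docker europe-west1-docker.pkg.dev
```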
Run the Script
With everything set up, you can finally run the script. The same steps apply to other translation models such as opus-mt-de-en or opus-mt-nl-en.
If the .mar file was already built but you changed handler.py, for example to fix a mistake, you can pass the --overwrite_mar flag to overwrite the existing .mar file.
The script runs several steps per model: downloading the model files, building the .mar file, building the Docker image, and pushing it to the registry. It then creates the Vertex AI model, which is stored in the Model Registry.
To create the .mar file, the script uses an extra_files variable that stores the relative paths of all the files, comma-separated, except pytorch_model.bin; that model file is passed in a separate argument when packaging the model.
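The archive is built with TorchServe's torch-model-archiver tool; here's a hedged sketch of such a call, with file names taken from a typical opus-mt-es-en checkout rather than the script's exact arguments:

```bash
# Package the model into a .mar archive.
# --extra-files lists every required file except pytorch_model.bin,
# which is passed separately through --serialized-file.
torch-model-archiver \
  --model-name opus-mt-es-en \
  --version 1.0 \
  --serialized-file opus-mt-es-en/pytorch_model.bin \
  --handler handler.py \
  --extra-files "opus-mt-es-en/config.json,opus-mt-es-en/tokenizer_config.json,opus-mt-es-en/vocab.json,opus-mt-es-en/source.spm,opus-mt-es-en/target.spm" \
  --export-path model-store
```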
You can add more models, comma-separated, as long as you followed the same steps for them and they use the same handler. If a model already exists in the registry, the script uploads a new version of it instead.
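For reference, the registration step the script performs corresponds roughly to this gcloud call; the region, names, and image URI are placeholders, and the script's actual invocation may differ:

```bash
# Register the custom container image as a Vertex AI model
gcloud ai models upload \
  --region=europe-west1 \
  --display-name=opus-mt-es-en \
  --container-image-uri=europe-west1-docker.pkg.dev/my-project/my-repo/opus-mt-es-en:latest
```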
Frequently Asked Questions
Is Hugging Face Google?
No, Hugging Face is not Google, but they collaborate closely on open science, cloud, and hardware initiatives to bring the latest AI advancements to users. This partnership enables companies to build their own AI with the latest open models and cloud features.