Load a Hugging Face Model from S3 with Amazon SageMaker


Posted Nov 7, 2024



To load a Hugging Face model from S3 with Amazon SageMaker, you'll need to create an S3 bucket to store your model files.

You can create the bucket with the AWS CLI, or in the AWS Management Console by navigating to the S3 dashboard and clicking "Create bucket".

The bucket name must be unique across all of Amazon S3.

Once your bucket is created, you can upload your model files to it.
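As a rough sketch of that upload step (the bucket name, model ID, and S3 prefix below are placeholders, not values from this article), you can save a model locally with Transformers and push the files up with boto3:

```python
# Sketch: save a Hugging Face model locally, then upload the files to S3 with boto3.
# The bucket name, model ID, and prefix below are placeholders for illustration.
import os
import boto3
from transformers import AutoModel, AutoTokenizer

bucket = "my-sagemaker-models"   # must be globally unique
local_dir = "model"              # local folder that will hold the model files

# Save the model and tokenizer as raw files
model = AutoModel.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model.save_pretrained(local_dir)
tokenizer.save_pretrained(local_dir)

# Create the bucket (us-east-1; other regions need a CreateBucketConfiguration)
s3 = boto3.client("s3")
s3.create_bucket(Bucket=bucket)

# Upload every file under a common S3 prefix
for name in os.listdir(local_dir):
    s3.upload_file(os.path.join(local_dir, name), bucket, f"models/distilbert/{name}")
```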

To load the model from S3 with Amazon SageMaker, you'll need to create an execution role for your notebook instance.

This role gives your notebook instance the necessary permissions to access your S3 bucket.

The execution role must have the correct IAM policies attached to it.

The AmazonSageMakerFullAccess policy is a good starting point.

You can also attach additional policies as needed.
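If you are working inside a SageMaker notebook instance, the SageMaker Python SDK can look up that execution role for you. A minimal sketch, assuming the SDK is installed:

```python
# Sketch: inside a SageMaker notebook instance, look up the execution role
# the notebook is running under (it needs S3 access to your bucket).
import sagemaker

role = sagemaker.get_execution_role()
print(role)  # e.g. arn:aws:iam::<account-id>:role/<role-name>

# Outside SageMaker, pass a role ARN explicitly instead, for example one with
# the AmazonSageMakerFullAccess policy attached.
```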


Loading Model from S3

To load a model from S3, you can use the model_data property of the estimator. This property returns the S3 path to the model.

Credit: youtube.com, Deploy a Hugging Face Transformers Model from S3 to Amazon SageMaker

After a successful training job, you can check whether SageMaker has uploaded the model to S3 by looking at the model_data property. Depending on the training settings, the model is stored either as a single compressed model.tar.gz archive or as raw files in the S3 bucket.

The bucket will contain the model weights along with other relevant files such as the tokenizer and configuration. You can use this S3 path to load the model for further use.
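As a minimal sketch, assuming a completed training job held in a variable called huggingface_estimator, you can read that S3 path directly:

```python
# Sketch: after the training job finishes, read where SageMaker uploaded the model.
# "huggingface_estimator" is assumed to be a completed SageMaker estimator.
model_s3_path = huggingface_estimator.model_data
print(model_s3_path)  # an s3:// location under the training job's output path
```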

Using Hugging Face Transformers

Hugging Face Transformers works the same way whether a model comes from the Hugging Face Hub or from files you have stored in S3, which makes it a natural fit for SageMaker workflows.

To use Hugging Face Transformers, you'll need to install the Transformers library, which can be done using pip with the command `pip install transformers`.

The Transformers library provides a simple way to load pre-trained models from the Hugging Face model hub or from a local directory, which is how you work with model files pulled down from S3.

You load a model with the `from_pretrained` method, which takes either a Hub model ID or a path to a local directory containing the model files.

For example, to load a model from the Hub, you can use the following code: `model = AutoModel.from_pretrained('model-id')`.

Credit: youtube.com, Getting Started With Hugging Face in 15 Minutes | Transformers, Pipeline, Tokenizer, Models

The `use_auth_token` argument (replaced by `token` in recent releases) authenticates your request against the Hugging Face Hub; it is only needed for private or gated models and plays no role when reading files from S3.

You can also specify additional arguments to the `from_pretrained` method to customize the loading process, such as `cache_dir` to specify a custom cache directory.

For instance, to load a model with a custom cache directory, you can use the following code: `model = AutoModel.from_pretrained('model-id', cache_dir='/path/to/cache')`.
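Because `from_pretrained` reads from the Hub or from a local directory, the usual pattern for S3 is to download the files locally first and then load them. A hedged sketch, with the bucket name and prefix as placeholders:

```python
# Sketch: download the raw model files from S3 into a local directory, then load
# them with from_pretrained. The bucket name and prefix are placeholders.
import os
import boto3
from transformers import AutoModel, AutoTokenizer

bucket, prefix, local_dir = "my-sagemaker-models", "models/distilbert", "model"
os.makedirs(local_dir, exist_ok=True)

s3 = boto3.client("s3")
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
    filename = os.path.basename(obj["Key"])
    if filename:  # skip "directory" placeholder keys
        s3.download_file(bucket, obj["Key"], os.path.join(local_dir, filename))

model = AutoModel.from_pretrained(local_dir)
tokenizer = AutoTokenizer.from_pretrained(local_dir)
```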

Loading Model into SageMaker

To load a model into SageMaker, you need to use the model_data property of the estimator. This property gives you the S3 path to the model.

The trained model artifacts live in your S3 bucket at that path.

The folder structure and files in the S3 bucket depend on the settings used during training. For example, if you used merge_weights=True and disable_output_compression=True, the model will be stored as raw files.

Credit: youtube.com, Deploy ML Models from S3 Buckets Using AWS SageMaker

The key S3 path property is the estimator's model_data, which points to the trained model artifacts in the bucket.

Use this property rather than a hand-built S3 path when loading the model into SageMaker, so you always point at the artifacts the training job actually produced.
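As an illustrative sketch (the framework versions, instance type, and role below are assumptions, not values from this article), that model_data path can be handed straight to a HuggingFaceModel for deployment:

```python
# Sketch: hand the model_data S3 path to a HuggingFaceModel and deploy it.
# Framework versions, instance type, and the role are illustrative placeholders.
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data=huggingface_estimator.model_data,  # S3 path to the trained artifacts
    role=role,                                    # execution role from earlier
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
)
```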

Fine-Tuning the Model

To fine-tune the model, you can use the recently introduced method QLoRA on Amazon SageMaker. QLoRA reduces the memory footprint of large language models during fine-tuning, without sacrificing performance.

The QLoRA technique involves quantizing the pre-trained model to 4 bits, freezing it, and attaching small, trainable adapter layers. You can use the run_qlora.py script to implement QLoRA using PEFT to train the model.
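The listing below is not the run_qlora.py script itself, just a short sketch of the general QLoRA recipe with Transformers and PEFT; the model ID and adapter settings are illustrative:

```python
# Sketch of the core QLoRA recipe (not the actual run_qlora.py script):
# load the base model in 4-bit, freeze it, and attach small trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base model to 4 bits
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "model-id",                             # placeholder Hub ID
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,                                   # adapter rank; illustrative value
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the adapter layers are trainable
```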

Fine-tuning the model with QLoRA takes about 3.9 hours on an ml.g5.4xlarge instance, which costs $2.03 per hour for on-demand usage. The total cost for training the fine-tuned model was only ~$8.

You can use the HuggingFace Estimator to create a SageMaker training job, which manages the infrastructure and takes care of starting and managing all the required EC2 instances for you. Make sure to include the requirements.txt in the source_dir if you are using a custom training script.
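Here is a hedged sketch of what that estimator setup can look like; the script name matches the one mentioned above, but the framework versions, source directory, and hyperparameters are placeholders:

```python
# Sketch: define the SageMaker training job with the HuggingFace Estimator.
# Versions, source directory, and hyperparameters are illustrative placeholders.
from sagemaker.huggingface import HuggingFace

huggingface_estimator = HuggingFace(
    entry_point="run_qlora.py",        # custom training script
    source_dir="scripts",              # also holds requirements.txt
    instance_type="ml.g5.4xlarge",
    instance_count=1,
    role=role,                         # execution role from earlier
    transformers_version="4.36",
    pytorch_version="2.1",
    py_version="py310",
    disable_output_compression=True,   # keep the model as raw files in S3
    hyperparameters={
        "model_id": "model-id",        # placeholder Hub ID
        "merge_weights": True,         # merge LoRA weights after training
    },
)
```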

Credit: youtube.com, Tutorial 2- Fine Tuning Pretrained Model On Custom Dataset Using 🤗 Transformer

To start the training job, use the .fit() method and pass the S3 path to your training data. You can use the merge_weights parameter to merge the LoRA weights into the model weights after training.
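A minimal sketch of kicking off the job, with a placeholder S3 path for the training data:

```python
# Sketch: start the training job, pointing it at the training data in S3.
training_input_path = "s3://my-sagemaker-models/datasets/train"  # placeholder
huggingface_estimator.fit({"training": training_input_path})
```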

The model is stored as raw files in the S3 bucket, and you can use the model_data property of the estimator to get the S3 path to those files, which include the model weights, tokenizer, and configuration.

Note that using the g5.2xlarge instance type is not possible if you want to merge the LoRA weights into the model weights, since the model needs to fit into memory. However, you can save the adapter weights and merge them using the merge_adapter_weights.py script after training.
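The snippet below is not the merge_adapter_weights.py script itself, only a sketch of the general approach with PEFT's merge_and_unload, using placeholder paths:

```python
# Sketch of merging saved LoRA adapter weights back into the base model
# (illustrative only; not the actual merge_adapter_weights.py script).
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("model-id", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "adapter")  # folder with the saved adapter
model = model.merge_and_unload()                    # fold adapters into base weights
model.save_pretrained("merged-model")
```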

Keith Marchal

Senior Writer

Keith Marchal is a passionate writer who has been sharing his thoughts and experiences on his personal blog for more than a decade. He is known for his engaging storytelling style and insightful commentary on a wide range of topics, including travel, food, technology, and culture. With a keen eye for detail and a deep appreciation for the power of words, Keith's writing has captivated readers all around the world.
