Unlocking Ollama Hugging Face Models for Your Project

By Keith Marchal

Posted Nov 8, 2024

Credit: pexels.com, an artist's illustration of artificial intelligence inspired by neural networks used in deep learning, created by Novoto Studio as part of the Visualising AI project.

Ollama Hugging Face models are language models hosted on Hugging Face and packaged so that Ollama can run them locally on your own hardware.

These range from general-purpose chat models such as Zephyr to code models such as StarCoder2, and the common thread is the GGUF format, which Ollama uses to load model weights efficiently.

To use a Hugging Face model with Ollama in your project, you'll need a basic understanding of Hugging Face's Transformers library and the llama.cpp conversion scripts.

The Transformers library provides a simple and efficient way to load these models in your code, while llama.cpp converts them into the GGUF format that Ollama runs.

Getting Started

To get started with Ollama and Hugging Face models, it helps to know where we're headed. In this article we'll take a fine-tuned StarCoder2-3B model, convert it to GGUF/GGML format, add it to a local Ollama installation, upload it to the Ollama hub, and download it back for local testing.

To begin, you'll need to create a Modelfile, which serves as the blueprint for your model. The Modelfile is a crucial part of the process, and it's essential to get it right from the start.

Here's a quick rundown of the topics we'll be covering in this article:

  • Converting the StarCoder2-3B model to GGUF/GGML format
  • Adding the custom model to local Ollama
  • Uploading the model to the Ollama hub
  • Downloading the model for local testing

Build with Modelfile

Credit: youtube.com, Unlock Ollama's Modelfile | How to Upgrade your Model's Brain using the Modelfile

To build with a Modelfile, start by creating one as described in the previous step. This file is crucial for the build process.
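Here's a minimal sketch of what such a Modelfile might look like, assuming you've downloaded the zephyr-7b-beta.Q5_K_M.gguf weights into your working directory (the template, system prompt, and parameter values are illustrative, so adjust them for your model):

```
# Point Ollama at the downloaded GGUF weights
FROM ./zephyr-7b-beta.Q5_K_M.gguf

# Zephyr-style chat template
TEMPLATE """<|system|>
{{ .System }}</s>
<|user|>
{{ .Prompt }}</s>
<|assistant|>
"""

# System instruction describing the model's role (illustrative)
SYSTEM """You are a helpful research assistant."""

# Context length and stop sequence (tune these for your model)
PARAMETER num_ctx 4096
PARAMETER stop "</s>"
```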

Next, run the command `ollama create zephyr-local -f Modelfile` in your terminal. This kicks off the build process.

The next step is to list the created models using the `ollama list` command. This will give you an overview of the models you've created so far.

Here's a quick rundown of the build process:

  • Create a Modelfile that points at your GGUF weights
  • Run `ollama create zephyr-local -f Modelfile` to build the model
  • Run `ollama list` to confirm the new model appears

Download from Hugging Face Hub

To download a model from Hugging Face, head over to the Hugging Face Hub and find the model you want to work with.

The article will be using Cognitive Computations' Laserxtral 4x7b model, but you can choose any model that suits your needs.

Click on the "Files and versions" tab to view the list of quantized versions of the model.

Credit: youtube.com, How to Download Models on Hugging Face 2024?

I'll be downloading the Q5_K_M version, but you should choose the one that fits your machine's specifications - if you have 16GB RAM, the Q3_K_M version is a good choice.

Click the file name of your chosen version, hit the download link, and then wait for the model weights to download - it's a big file, at about 17GB.
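If you'd rather script the download, the Hugging Face CLI can fetch a single file directly; the repository and file names below are illustrative, so substitute the ones shown on the model page:

```
# Install the Hugging Face Hub CLI, then download one GGUF file
pip install -U huggingface_hub
huggingface-cli download cognitivecomputations/laserxtral-GGUF \
    laserxtral.Q5_K_M.gguf --local-dir .
```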

Running the Model

Running the model is straightforward and takes just a few steps: you can start it with the command `ollama run zephyr-local`.

To verify that the model respected the parameters in the Modelfile, run it and inspect the output. For example, a session might look like this:

>>> What are you useful for?

As an AI language model, I can assist you in various ways such as:

1. Summarizing research papers: I can provide a concise summary or key points of a research paper based on your requirements.

2. Finding relevant information: If you're looking for specific information related to a particular topic, I can search through a large number of research papers and extract the required information for you.

3. Answering questions: You can ask me any question related to your research area, and I will provide an accurate and detailed answer based on my extensive knowledge base.

4. Providing suggestions: If you're struggling with a specific research problem or need some ideas for your next study, I can suggest potential approaches or methods that might be helpful.

5. Editing and proofreading: I can help you to improve the clarity and coherence of your own writing by checking for grammatical errors, suggesting alternative phrasing, and ensuring that your arguments are logical and well-supported.

By running the model, you can see that it's capable of assisting with a wide range of tasks, from summarizing research papers to providing suggestions for research problems.

If you want to deploy any Hugging Face LLM with Ollama on SaladCloud, you can pass the model as an environment variable during deployment. To do this, specify the model from Hugging Face, including optional quantization settings, in an environment variable named MODEL.
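As a rough sketch of what that can look like (the hf.co prefix follows Ollama's Hugging Face integration; the repository and quantization tag here are illustrative assumptions, not SaladCloud's documented defaults):

```
# Hypothetical MODEL value: a Hugging Face GGUF repository
# plus an optional quantization suffix
MODEL=hf.co/TheBloke/zephyr-7B-beta-GGUF:Q5_K_M
```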

Customization

Credit: youtube.com, Ollama: How To Create Custom Models From HuggingFace ( GGUF )

Customization is a key part of working with Hugging Face models in Ollama, allowing you to fine-tune a model to suit your specific needs. This can be achieved by adjusting the training hyperparameters, such as the learning rate or batch size.

The models can also be customized at the architecture level, for example by adapting the embedding layer or attention components to a new task. For fine-tuning itself, the Hugging Face Transformers library provides built-in tools such as the `Trainer` class.

By customizing the model, users can improve its performance on specific tasks or datasets, and even adapt it to work with new or unseen data.
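As a minimal sketch of what fine-tuning with `Trainer` can look like (the base model, data file, and hyperparameter values here are illustrative assumptions, not the article's exact setup):

```python
# Minimal causal-LM fine-tuning sketch with Hugging Face Transformers.
# Model name, data file, and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "bigcode/starcoder2-3b"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # some tokenizers define no pad token

# Load a plain-text training file and tokenize it
dataset = load_dataset("text", data_files={"train": "train.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="outputs_starcoder3b_4e",  # matches the directory used later
    per_device_train_batch_size=2,        # batch size: a tunable hyperparameter
    learning_rate=2e-5,                   # learning rate: likewise tunable
    num_train_epochs=4,                   # assumption based on the "4e" suffix
)

# The collator shifts inputs into labels for causal language modeling
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```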

Custom Quantization

Custom Quantization is a feature that allows you to specify a custom quantization scheme for your Ollama model.

By default, Ollama uses the Q4_K_M quantization scheme if it's available in the model repository. You can manually select a quantization scheme by specifying it in the MODEL environment variable.

To find available quantization options, open the model's Hugging Face page and choose Ollama from the "Use this model" dropdown. From there, choose the quantization you want.

To specify a custom quantization, the MODEL value follows a general pattern.
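Based on the `ollama run` command that the Hugging Face page generates, the value takes a shape like this (the quantization tag is just an example):

```
# Repository path plus a colon and the desired quantization tag
MODEL=hf.co/{username}/{repository}:Q5_K_M
```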

Converting to GGUF

Credit: youtube.com, How to Convert/Quantize Hugging Face Models to GGUF Format | Step-by-Step Guide

Converting to GGUF is a straightforward process that requires a few simple steps.

First, create a gguf_models directory in your working directory where the converted model will be stored.

To start, you'll use the convert_hf_to_gguf.py script that ships with llama.cpp.

This script converts the Hugging Face model in the outputs_starcoder3b_4e directory to GGUF format.

The first command line argument is the path where your Hugging Face model and tokenizer files reside.

You'll also need to specify the --outfile argument, which is the file name where the new GGUF model should be saved.
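Putting those arguments together, the invocation looks something like this (the output file name is illustrative):

```
# Create the output directory, then run the llama.cpp conversion script
mkdir -p gguf_models
python llama.cpp/convert_hf_to_gguf.py outputs_starcoder3b_4e \
    --outfile gguf_models/starcoder2-3b-instruct.gguf
```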

After executing the convert_hf_to_gguf.py script, you can find the GGUF model in the gguf_models directory.

Deployment

Deploying Ollama Hugging Face models is a straightforward process. You can deploy Hugging Face models on SaladCloud using Ollama by adding an environment variable named MODEL, set to the desired model.

To get started, specify the model you want to deploy; once you've done that, you can proceed with the deployment itself.

You can add the MODEL environment variable in one of two ways, both covered in the steps below.


Project Setup

Credit: youtube.com, Importing Open Source Models to Ollama

To get started with Ollama and Hugging Face models, you'll need to set up a project. This involves creating a new directory for your project and installing the necessary dependencies, including the Transformers library and Ollama itself.

The Transformers library provides a simple interface for loading and using pre-trained Hugging Face models. You can install it using pip by running `pip install transformers`.

With the necessary dependencies installed, you can then create a new project directory and navigate to it in your terminal or command prompt. This will be the root directory for your project, where you'll store your code and data.
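Concretely, the setup might look like this (the directory name is arbitrary, and the llama.cpp clone provides the GGUF conversion script used later):

```
# Create and enter the project directory
mkdir ollama-hf-project && cd ollama-hf-project

# Install the Transformers library
pip install transformers

# Clone llama.cpp for its GGUF conversion scripts
git clone https://github.com/ggerganov/llama.cpp
```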

Steps to Add

To add models to Ollama, you need to follow these steps.

First, you need to convert the Hugging Face/PyTorch model to GGUF format using the llama.cpp conversion scripts. This will allow you to run the model locally in the CLI.

Next, create an Ollama Modelfile and add the GGUF model to local Ollama. You'll need to define the context length, instruction, and stop parameters in the Modelfile, as in the example sketched earlier.

Credit: pexels.com, an artist's illustration of artificial intelligence depicting language models which generate text, created by Wes Cockx as part of the Visualising AI project.

Then, create a new model repository in your Ollama account and upload the model. This will make the model available for use in Ollama.

If you want to deploy the model quickly, you can use the pre-built recipe for deploying Llama3.1 with Ollama on SaladCloud. This will give you both Llama 3.1 and the model of your choice.

Here's a step-by-step guide to using the pre-built recipe:

1. Click Deploy a Container Group and choose the Ollama Llama 3.1 recipe.

2. Continue through the steps, selecting a higher-end GPU and other parameters for better performance.

3. On the final page, ensure Autostart is checked, then click Deploy.

Alternatively, you can create a custom container group using the following steps:

1. Click Deploy a Container Group and choose Custom Container Group.

2. Set a deployment name and edit the image source to enter saladtechnologies/ollama-hf:1.0.0 as the image name.

3. Edit the Environment Variables and add the MODEL variable described above.

4. Select the desired CPU, RAM, GPU, storage, and priority for the deployment.

5. Add a Container Gateway, a Startup Probe, and ensure Autostart is checked, then click Deploy.

By following these steps, you'll be able to add models to Ollama and start using them in your projects.

Project Directory Structure

Credit: youtube.com, How To Structure A Programming Project…

Let's take a look at the project directory structure. You'll want to maintain a similar structure so you can use the same commands we'll execute later.

The project directory structure is divided into three main directories: llama.cpp, gguf_models, and outputs_starcoder3b_4e.

The llama.cpp directory is the one we cloned above. It contains the llama.cpp source code along with the conversion scripts we use.

The gguf_models directory contains the converted models and the Modelfile that we'll use to create local Ollama models.

Here's a quick rundown of the main directories:

  • llama.cpp: the cloned repository with the inference code and conversion scripts
  • gguf_models: contains converted models and the Modelfile for local Ollama model creation
  • outputs_starcoder3b_4e: contains fine-tuned Hugging Face weights and configuration files
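Laid out as a tree, the layout looks like this (file names are illustrative):

```
project/
├── llama.cpp/                          # cloned llama.cpp repository
├── gguf_models/
│   ├── starcoder2-3b-instruct.gguf     # converted GGUF model
│   └── Modelfile
└── outputs_starcoder3b_4e/             # fine-tuned weights and config files
```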

This structure will help you stay organized and ensure you have all the necessary files for the project.

Frequently Asked Questions

Which is better, ollama or Hugging Face?

For local AI experimentation, Ollama offers a user-friendly CLI and API server, while Hugging Face provides a broader range of models. Ultimately, both options have their strengths, and the choice depends on your specific needs and goals.

What is the difference between ollama and ONNX?

Ollama is ideal for simple setups, while ONNX offers more performance and hardware-accelerated execution for demanding tasks. Choose Ollama for ease of use, or ONNX for power and speed.

Sources

  1. TheBloke (huggingface.co)
  2. Ollama (ollama.ai)
  3. zephyr-7b-beta.Q5_K_M.gguf (huggingface.co)
  4. zephyr-7b-beta (huggingface.co)
  5. hugging face CLI (huggingface.co)
  6. library (ollama.ai)
  7. Modelfile (github.com)
  8. library (ollama.com)
  9. Ollama (github.com)
  10. https://huggingface.co/ (huggingface.co)
  11. vLLM (github.com)
  12. llama.cpp (github.com)
  13. Model Scope (huggingface.co)
  14. HF models (huggingface.co)
  15. Laserxtral 4x7b (huggingface.co)
  16. Laserxtral 4x7b GGUF (huggingface.co)
  17. StarCoder2-3B Instruct model on Ollama (ollama.com)

Keith Marchal

Senior Writer

Keith Marchal is a passionate writer who has been sharing his thoughts and experiences on his personal blog for more than a decade. He is known for his engaging storytelling style and insightful commentary on a wide range of topics, including travel, food, technology, and culture. With a keen eye for detail and a deep appreciation for the power of words, Keith's writing has captivated readers all around the world.
