Hugging Face Cache: A Guide to Efficient Model Deployment

Posted Nov 17, 2024

The Hugging Face cache is a game-changer for efficient model deployment. It's a caching mechanism that stores the files downloaded from the Hugging Face Hub, such as model weights, configuration files, and datasets, so you can reuse them instead of re-downloading them every time.

By using the Hugging Face cache, you can significantly reduce the time it takes to load your models, making it ideal for production environments. This is especially true for large models whose weights run to many gigabytes.

The cache is particularly useful for workflows that load the same model repeatedly, such as restarted clusters or multiple workers sharing a machine. Because downloaded files are deduplicated across revisions, redundant downloads are avoided and your overall workflow speeds up.

The Hugging Face libraries also ship tooling to manage and monitor your cache files, making it easier to maintain a healthy cache and reclaim disk space.


Cache Management

Caching your Hugging Face model can save you a lot of time and money. You can cache it in the DBFS root volume or on a mount point by setting the TRANSFORMERS_CACHE environment variable before loading the pipeline.


This can decrease ingress costs and reduce the time to load the model on a new or restarted cluster. You can also achieve similar results by logging the model to MLflow with the MLflow `transformers` flavor.

The HFCacheInfo object that's returned when you scan your cache directory is summarized in the Cache Information section below.

Cache the Model

Caching the model can save you a lot of time and resources, especially if you're working with large models or frequently loading them from different clusters.

If the file was not cached, huggingface_hub.try_to_load_from_cache simply returns None. Otherwise, it returns the exact path to the cached file, or the special value _CACHED_NO_EXIST if the file's absence at the given commit hash was itself cached.
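As a minimal sketch, here's how you might branch on those three outcomes (the repo and filename are just examples):

    from huggingface_hub import try_to_load_from_cache, _CACHED_NO_EXIST

    # Look up a file in the local cache without touching the network.
    result = try_to_load_from_cache("bert-base-uncased", "config.json")

    if isinstance(result, str):
        print(f"Cached at: {result}")    # exact path to the cached file
    elif result is _CACHED_NO_EXIST:
        print("Absence is cached")       # file known not to exist at this revision
    else:
        print("Not cached")              # result is None: nothing cached yet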

The TRANSFORMERS_CACHE environment variable can be set to cache the model in the DBFS root volume or on a mount point. This can decrease ingress costs and reduce the time to load the model on a new or restarted cluster.

To achieve similar results, you can also log the model to MLflow with the MLflow transformers flavor.

Here are some options to cache the model:

  • Set the TRANSFORMERS_CACHE environment variable
  • Log the model to MLflow with the MLflow transformers flavor
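Here's a minimal sketch of both options; the DBFS path, pipeline task, and artifact path are placeholders to adapt to your environment:

    import os

    # Option 1: point the Transformers cache at persistent storage *before*
    # loading the pipeline ("/dbfs/hf_cache" is a hypothetical mount point).
    os.environ["TRANSFORMERS_CACHE"] = "/dbfs/hf_cache"

    from transformers import pipeline
    import mlflow

    summarizer = pipeline("summarization")  # weights land under /dbfs/hf_cache

    # Option 2: log the model to MLflow with the transformers flavor.
    with mlflow.start_run():
        mlflow.transformers.log_model(
            transformers_model=summarizer,
            artifact_path="summarizer",
        )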

Delete Revisions


You can delete one or more revisions cached locally using the delete_revisions function.

Input revisions can be any revision hash; if a hash is not found in the local cache, a warning is raised but no error is thrown.

Revisions can be from different cached repos since hashes are unique across repos.

The delete_revisions function returns a DeleteCacheStrategy object that must be executed to actually free the disk space. The strategy allows a dry run, so you can review what will be deleted before executing it; the DeleteCacheStrategy object itself is not meant to be modified.
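Here's a short sketch of the dry-run-then-execute flow (the revision hash is a placeholder; substitute hashes from your own cache):

    from huggingface_hub import scan_cache_dir

    # delete_revisions accepts one or more revision hashes, possibly from
    # different repos; an unknown hash only triggers a warning.
    strategy = scan_cache_dir().delete_revisions(
        "81fd1d6e7847c99f5862c9fb81387956d99ec7aa"  # placeholder hash
    )
    print(f"Will free {strategy.expected_freed_size_str}.")  # dry run: inspect first
    strategy.execute()  # nothing is deleted until this call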

Hub Scan Dir

The `huggingface_hub.scan_cache_dir` function is used to programmatically scan the cache-system. It walks the entire HF cache-system and returns an `HFCacheInfo` structure.

The scan proceeds repo by repo. If a repo is corrupted, a `CorruptedCacheException` is raised internally, but it is captured and reported back in the `HFCacheInfo` structure.

Only valid repos get a proper report; corrupted ones are surfaced as warnings.
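For instance, a minimal scan-and-report loop might look like this (the output formatting is illustrative):

    from huggingface_hub import scan_cache_dir

    cache_info = scan_cache_dir()
    print(f"Total size on disk: {cache_info.size_on_disk} bytes")

    for repo in cache_info.repos:        # valid repos get a proper report
        print(f"{repo.repo_type} {repo.repo_id}: "
              f"{repo.nb_files} file(s), {repo.size_on_disk_str}")

    for warning in cache_info.warnings:  # corrupted repos are captured here
        print(f"Corrupted cache entry: {warning}")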

Cache Information


The HFCacheInfo object returned by scan_cache_dir() is a frozen data structure holding information about the entire cache-system, meaning it is immutable once created.

This object includes three key pieces of information: the sum of all valid repo sizes in the cache-system, a set of CachedRepoInfo describing all valid cached repos found, and a list of CorruptedCacheException that occurred while scanning the cache.

You can access the sum of all valid repo sizes in the cache-system by looking at the size_on_disk attribute of the HFCacheInfo object, which is equal to the sum of all repo sizes (only blobs).

The HFCacheInfo object's attributes are summarized below:

  • size_on_disk (int): the sum of all valid repo sizes in the cache-system (blobs only).
  • repos (FrozenSet[CachedRepoInfo]): all valid cached repos found during the scan.
  • warnings (List[CorruptedCacheException]): errors raised while scanning the cache.

Info

The cache system is a powerful tool that helps you work more efficiently, and you can inspect it programmatically: the `scan_cache_dir` function scans the entire HF cache-system.

The `scan_cache_dir` function takes an optional cache directory as input, which can be a string or a Path object (by default, the standard HF cache location is scanned). If the cache directory does not exist, a `CacheNotFound` exception is raised; if the path points to a file instead of a directory, a `ValueError` is raised.


The `scan_cache_dir` function returns the frozen, immutable `HFCacheInfo` data structure described above.

In short, a `HFCacheInfo` object gives you the total size on disk, the set of cached repos, and any warnings collected during the scan.
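For example, a guarded scan of a custom cache location might look like the following sketch (the path is hypothetical):

    from huggingface_hub import scan_cache_dir
    from huggingface_hub.utils import CacheNotFound

    try:
        info = scan_cache_dir("/tmp/custom_hf_cache")  # hypothetical cache path
        print(f"{len(info.repos)} repo(s), {info.size_on_disk} bytes")
    except CacheNotFound:
        print("No cache directory found at that path.")
    except ValueError:
        print("The path exists but is a file, not a directory.")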

huggingface_hub.CachedRepoInfo

huggingface_hub.CachedRepoInfo is a data structure that holds information about a cached repository. It's a frozen dataclass, which means it can't be changed once it's created.

The repo_id is a string that identifies the repository on the Hub, such as "google/fleurs". This is the unique identifier for the repository.

The repo_type is a string that indicates the type of the cached repo, which can be "dataset", "model", or "space". This helps us understand what kind of data is stored in the repository.

The repo_path is a Path object that points to the local path of the cached repo. This is where the data is actually stored on your computer.


The size_on_disk is an integer that represents the sum of the blob file sizes in the cached repo. This is not necessarily the sum of all revisions sizes, since duplicated files are not counted twice.

Here are the details of the CachedRepoInfo data structure:

  • repo_id (str)
  • repo_type (Literal["dataset", "model", "space"])
  • repo_path (Path)
  • size_on_disk (int)
  • nb_files (int)
  • revisions (FrozenSet[CachedRevisionInfo])
  • last_accessed (float)
  • last_modified (float)
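As an illustration, the last_accessed timestamp makes it easy to spot repos you haven't touched in a while (the 30-day threshold is arbitrary):

    import time
    from huggingface_hub import scan_cache_dir

    cutoff = time.time() - 30 * 24 * 3600  # 30 days ago; pick your own threshold

    for repo in scan_cache_dir().repos:
        if repo.last_accessed < cutoff:
            print(f"Stale: {repo.repo_type}/{repo.repo_id} "
                  f"({repo.size_on_disk} bytes, {repo.nb_files} file(s))")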

Cache Options

To recap, caching your Hugging Face model can save you a lot of time and money, especially if you're working with large models or frequently restarting your cluster.

You can cache your model in DBFS or on mount points by setting the TRANSFORMERS_CACHE environment variable in your code before loading the pipeline. This can significantly decrease ingress costs and reduce the time to load the model on a new or restarted cluster.

Alternatively, you can achieve similar results by logging the model to MLflow with the MLflow `transformers` flavor, which also saves model load time.

Tune Performance

To use each GPU effectively, adjust the size of the batches that the Transformers pipeline sends to the GPU.



You can also make sure the DataFrame is well-partitioned to utilize the entire cluster.

Finally, caching the Hugging Face model can save model load time and reduce ingress costs.

To cache the model, set the TRANSFORMERS_CACHE environment variable in your code before loading the pipeline.

Alternatively, log the model to MLflow with the MLflow `transformers` flavor, which achieves similar results.

By implementing these strategies, you can significantly improve the performance of your UDF.
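Putting these pieces together, here's a rough sketch of a batched pandas UDF; the task, batch size, partition count, and column name are illustrative assumptions, not recommendations:

    import pandas as pd
    from pyspark.sql.functions import pandas_udf
    from transformers import pipeline

    # batch_size controls how many rows reach the GPU at once; tune per model/GPU.
    summarizer = pipeline("summarization", device=0, batch_size=8)

    @pandas_udf("string")
    def summarize_udf(texts: pd.Series) -> pd.Series:
        return pd.Series(out["summary_text"] for out in summarizer(texts.tolist()))

    # Assuming df is an existing Spark DataFrame with a "text" column:
    # repartition so every worker (and GPU) receives a share of the data.
    df = df.repartition(64).withColumn("summary", summarize_udf("text"))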

DeleteCacheStrategy

DeleteCacheStrategy is a key part of cache management, and it's worth understanding how it works.

The DeleteCacheStrategy object holds the strategy to delete cached revisions. It's not meant to be instantiated directly; instead, it's returned by the delete_revisions() function, and it's not meant to be modified afterwards.

Here's a summary of the DeleteCacheStrategy object's fields:

  • expected_freed_size (int): the number of bytes expected to be freed once the strategy is executed.
  • blobs (FrozenSet[Path]): blob file paths to be deleted.
  • refs (FrozenSet[Path]): reference file paths to be deleted.
  • repos (FrozenSet[Path]): entire repo paths to be deleted.
  • snapshots (FrozenSet[Path]): snapshots to be deleted (directories of symlinks).
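To make that concrete, here's a small sketch that inspects a strategy before executing it (the revision hash is a placeholder):

    from huggingface_hub import scan_cache_dir

    strategy = scan_cache_dir().delete_revisions("abcdef0123456789")  # placeholder

    print(f"Expected freed size: {strategy.expected_freed_size_str}")
    print(f"{len(strategy.blobs)} blob(s), {len(strategy.refs)} ref(s), "
          f"{len(strategy.repos)} repo(s), {len(strategy.snapshots)} snapshot(s)")

    strategy.execute()  # the deletion only happens at this point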

Frequently Asked Questions

Can I delete the Hugging Face cache?

Yes, you can delete the Hugging Face cache, either with the `Dataset.cleanup_cache_files()` method (for the `datasets` library) or by manually removing the cache directory. Deleting the cache frees disk space, but be aware that anything you delete will be re-downloaded or re-processed the next time it's needed.
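As a quick illustration of the `datasets` route (the dataset name is just an example):

    from datasets import load_dataset

    ds = load_dataset("imdb", split="train")  # example dataset

    # Removes cache files tied to this dataset (e.g. from .map() transforms)
    # and returns the number of files deleted.
    removed = ds.cleanup_cache_files()
    print(f"Removed {removed} cache file(s).")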

Landon Fanetti

Writer

