Databricks and Hugging Face have partnered to make open-source AI development more accessible. This collaboration brings together the power of Databricks' unified data engineering and analytics platform with Hugging Face's Transformers library.
Databricks provides a scalable and secure environment for data engineering and analytics, while Hugging Face's Transformers library offers a wide range of pre-trained models and a simple interface for building and deploying AI models.
With this partnership, developers can now leverage the strengths of both platforms to build and deploy AI models more efficiently.
Getting Started
Databricks is a fast, easy, and collaborative platform for data and AI teams. It's built on top of Apache Spark, which makes it a great choice for big data processing.
To get started with Databricks and Hugging Face, you'll need to create a Databricks account and log in to the Databricks workspace. This will give you access to the Databricks notebook interface, where you can write and run Python code.
Databricks is compatible with popular libraries like Hugging Face Transformers, which makes it easy to integrate with pre-trained models. You can use the Transformers library to load and use pre-trained models in your Databricks notebooks.
Hugging Face Transformers is a library of pre-trained models for natural language processing and computer vision tasks. It's a great resource for anyone looking to get started with AI and machine learning.
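To use Hugging Face Transformers in Databricks, install the library with pip, for example by running %pip install transformers in a notebook cell. Databricks Runtime for Machine Learning already includes the library, so this step is only needed on runtimes that lack it. Once it is installed, the pre-trained models are available in your Databricks notebooks.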
Using Hugging Face with Databricks
Running Hugging Face models on Databricks lets you process text at scale. You can use Pandas UDFs to distribute model computation across a Spark cluster, so that inference runs on the workers' CPUs or GPUs.
Databricks recommends encapsulating a Hugging Face pipeline in a Pandas UDF to distribute inference on Spark. This makes it easy to use GPUs when available and allows batching of items sent to the GPU for better throughput.
The Hugging Face pipelines for translation return a list of Python dict objects, each with a single key translation_text and a value containing the translated text. You can extract the translation from the results to return a Pandas series with just the translated text.
To use the UDF to translate a text column, you can call the UDF in a select statement. This is a simple and efficient way to process text at scale on Databricks.
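The pattern described above can be sketched as follows. This is a minimal illustration, assuming a cluster with pyspark, pandas, torch, and transformers installed. The names extract_translations and make_translation_udf, the t5-base model, and the default batch size are illustrative choices, not part of any library API.

```python
def extract_translations(results):
    """Pull the translated strings out of a translation pipeline's output.

    The pipeline returns a list of dicts, each with a 'translation_text' key.
    """
    return [result["translation_text"] for result in results]


def make_translation_udf(model_name="t5-base", batch_size=8):
    """Build a Pandas UDF that wraps a Hugging Face translation pipeline.

    Heavy imports live inside the function so the helper above can be used
    without Spark or Transformers installed.
    """
    import pandas as pd
    import torch
    from pyspark.sql.functions import pandas_udf
    from transformers import pipeline

    # Use the first GPU when one is visible, otherwise fall back to CPU (-1).
    device = 0 if torch.cuda.is_available() else -1
    translator = pipeline(task="translation_en_to_fr", model=model_name, device=device)

    @pandas_udf("string")
    def translate_udf(texts: pd.Series) -> pd.Series:
        # batch_size controls how many items the pipeline processes at once.
        results = translator(texts.to_list(), batch_size=batch_size)
        return pd.Series(extract_translations(results))

    return translate_udf
```

With a DataFrame df that has a text column named texts (a hypothetical name), the UDF could then be applied as df.select(df.texts, translate_udf(df.texts).alias("translation")).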
The Hugging Face datasets library can also load a Spark dataframe directly into a Hugging Face dataset with the "from_spark" function. This is simpler than writing the data to Parquet files and reloading them, and it saves time and cost.
Using Spark to load and transform data for training or fine-tuning a model, then mapping it into a Hugging Face dataset, combines cost savings and speed from Spark and optimizations like memory-mapping and smart caching from Hugging Face datasets. This can cut down processing time by more than 40% in some cases.
Model Development
You can store a pre-trained model as an MLflow model, making it easier to deploy for batch or real-time inference. This allows model versioning through the Model Registry and simplifies model loading code for your inference workloads.
The first step is to create a custom model for your pipeline, which encapsulates loading the model, initializing GPU usage, and running inference. The code closely parallels the code for creating and using a pandas_udf.
Hugging Face transformers pipelines make it easy to save the model to a local file on the driver, which is then passed into the log_model function for the MLflow pyfunc interfaces.
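As a rough sketch of these steps, the code below wraps a summarization pipeline in a custom pyfunc model and logs it. It assumes mlflow, pandas, torch, and transformers are installed and an MLflow 2.x-style log_model signature (newer MLflow versions may prefer a name argument over artifact_path). The class name, the artifact key "pipeline", the batch size, and the local path are all illustrative.

```python
def extract_summaries(results):
    """Return the 'summary_text' value from each summarization result dict."""
    return [result["summary_text"] for result in results]


def log_summarization_pipeline(summarizer, pipeline_path="pipeline"):
    """Save a Transformers pipeline locally and log it as an MLflow pyfunc model."""
    import mlflow
    import pandas as pd
    import torch
    from transformers import pipeline

    class SummarizationPipelineModel(mlflow.pyfunc.PythonModel):
        def load_context(self, context):
            # Runs once when the model is loaded: pick a device, rebuild the pipeline.
            device = 0 if torch.cuda.is_available() else -1
            self.pipeline = pipeline(
                "summarization", model=context.artifacts["pipeline"], device=device
            )

        def predict(self, context, model_input):
            # Assumes the first column of the input DataFrame holds the texts.
            texts = model_input.iloc[:, 0].to_list()
            results = self.pipeline(texts, truncation=True, batch_size=8)
            return pd.Series(extract_summaries(results))

    # Write the model and tokenizer files to a local directory on the driver.
    summarizer.save_pretrained(pipeline_path)

    with mlflow.start_run() as run:
        mlflow.pyfunc.log_model(
            artifact_path="summarization_model",
            python_model=SummarizationPipelineModel(),
            artifacts={"pipeline": pipeline_path},
        )
    return run.info.run_id
```

Keeping model loading in load_context means the pipeline is rebuilt wherever the model is served, so device selection happens on the machine that actually runs inference.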
Preparing Data for Download
To start working with your training data, you need to format it into a table that meets the expectations of the Trainer.
The table should have two columns: a text column and a column of labels. This is a standard setup for text classification tasks.
You can store your data in a Spark DataFrame. If the labels are strings, collect the distinct labels, build a mapping from each label to an integer id, and use a pandas_udf to create an integer id column.
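One way to turn string labels into integer ids is sketched below, assuming pyspark and pandas are available on the cluster. The function names build_label_maps and make_label_id_udf are hypothetical, and sorting the labels is a design choice made here so that ids stay stable across runs.

```python
def build_label_maps(labels):
    """Map each distinct string label to an integer id, and back.

    Labels are sorted so that the ids are stable across runs.
    """
    distinct = sorted(set(labels))
    label2id = {label: index for index, label in enumerate(distinct)}
    id2label = {index: label for label, index in label2id.items()}
    return label2id, id2label


def make_label_id_udf(label2id):
    """Build a Pandas UDF that replaces string labels with their integer ids."""
    import pandas as pd
    from pyspark.sql.functions import pandas_udf

    @pandas_udf("integer")
    def replace_labels_with_ids(labels: pd.Series) -> pd.Series:
        # Raises KeyError for a label that is missing from the mapping.
        return labels.apply(lambda label: label2id[label])

    return replace_labels_with_ids
```

The distinct labels could be collected with something like [row.label for row in df.select("label").distinct().collect()], where the column name label is an assumption about your table.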
The model expects tokenized input, so you'll need to use the AutoTokenizer loaded from the base model to apply the tokenizer consistently to both the training and testing data.
Specifying a DBFS cache directory will allow you to efficiently download the dataset and reuse it in the future.
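The tokenization and caching steps might look like the sketch below. It assumes the datasets and transformers libraries are installed and that the text column is named text. The helper names and the choice to leave padding to a later data collator are assumptions for illustration.

```python
def make_tokenize_function(tokenizer, text_column="text"):
    """Return a function suitable for Dataset.map(..., batched=True).

    `tokenizer` is any callable that follows the Hugging Face tokenizer
    call convention.
    """
    def tokenize_function(examples):
        # No padding here; a data collator can pad each batch at training time.
        return tokenizer(examples[text_column], padding=False, truncation=True)

    return tokenize_function


def load_and_tokenize(spark_df, base_model, cache_dir):
    """Load a Spark DataFrame as a Hugging Face dataset and tokenize it."""
    from datasets import Dataset
    from transformers import AutoTokenizer

    # cache_dir is where the materialized dataset is stored for later reuse,
    # for example a directory under /dbfs (hypothetical path).
    dataset = Dataset.from_spark(spark_df, cache_dir=cache_dir)
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    tokenized = dataset.map(make_tokenize_function(tokenizer), batched=True)
    return tokenized, tokenizer
```

Calling load_and_tokenize once for the training split and once for the test split with the same base_model applies the same tokenizer to both.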
Integrating Spark Dataframes for Model Development
Traditionally, users had to write data into parquet files and then reload them using Hugging Face datasets. This method circumvents the efficiencies and parallelism inherent to Spark, making it cumbersome and time-consuming.
Spark dataframes were previously not supported by Hugging Face datasets, despite the platform's extensive range of supported input types. This limitation forced users to rely on inefficient methods, such as writing data to disk and then reloading it.
However, recent releases of the Hugging Face datasets library let users load and transform data with Spark and pass it straight to training or fine-tuning. This is achieved through the "from_spark" function in Datasets, which converts a Spark dataframe directly into a Hugging Face dataset.
Using Spark to load and transform data can substantially reduce data processing time and costs. For example, a 16GB dataset that took 22 minutes to process using the traditional method can now be processed in 12 minutes, a reduction of about 45%.
Here are some key benefits of using Spark dataframes for model development:
- Efficient data loading and transformation
- Reduced data processing time and costs
- Improved performance and scalability
By leveraging Spark dataframes and the "from_spark" function, users can streamline their model development process and focus on more complex tasks, such as fine-tuning and optimizing their models.
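A minimal sketch of the conversion, assuming the datasets library is installed with Spark support and that spark_df is an existing Spark DataFrame. The helper names are illustrative. The small percent_reduction function is included only to check the timing figures quoted above.

```python
def percent_reduction(before_minutes, after_minutes):
    """Percentage of processing time saved when going from `before` to `after`."""
    return 100.0 * (before_minutes - after_minutes) / before_minutes


def spark_to_hf_dataset(spark_df, cache_dir=None):
    """Convert a Spark DataFrame into a Hugging Face Dataset.

    No hand-written Parquet export and reload is needed; from_spark handles
    materializing the data, optionally under cache_dir.
    """
    from datasets import Dataset

    return Dataset.from_spark(spark_df, cache_dir=cache_dir)
```

Applied to the figures above, percent_reduction(22, 12) comes out at roughly 45.5, which is consistent with the "more than 40%" savings mentioned earlier.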
Batch Size
Batch Size is a crucial factor in model development. Databricks recommends trying various batch sizes for the pipeline on your cluster to find the best performance.
A batch size of 1 may not use the resources available to the workers efficiently. Choose a batch size that is large enough to drive full GPU utilization without causing CUDA out of memory errors.
Monitor GPU performance by viewing the live cluster metrics and choosing a metric such as gpu0-util for GPU processor utilization or gpu0_mem_util for GPU memory utilization. This will help you identify the optimal batch size for your model and hardware.
If you receive CUDA out of memory errors during batch size tuning, detach and reattach the notebook to release the memory used by the model and data on the GPU.
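One way to compare batch sizes is a simple timing sweep, sketched below. The run_inference callable is a placeholder you would supply, for example a thin wrapper around a pipeline call. The function names and the choice to record failed sizes as None are assumptions made for this example.

```python
import time


def sweep_batch_sizes(run_inference, items, batch_sizes):
    """Measure throughput (items per second) for each candidate batch size.

    `run_inference(items, batch_size)` is supplied by the caller. A batch size
    that triggers an out-of-memory error is recorded as None.
    """
    throughput = {}
    for batch_size in batch_sizes:
        start = time.perf_counter()
        try:
            run_inference(items, batch_size)
        except RuntimeError as error:
            # PyTorch reports GPU memory exhaustion as a RuntimeError (or a
            # subclass) whose message contains "out of memory".
            if "out of memory" not in str(error).lower():
                raise
            throughput[batch_size] = None
            continue
        elapsed = max(time.perf_counter() - start, 1e-9)
        throughput[batch_size] = len(items) / elapsed
    return throughput


def best_batch_size(throughput):
    """Return the batch size with the highest measured throughput, or None."""
    measured = {size: rate for size, rate in throughput.items() if rate is not None}
    return max(measured, key=measured.get) if measured else None
```

Because GPU memory may stay occupied after an out of memory error, results for batch sizes tried after a failure in the same session may not be reliable. Rerunning the sweep with the failing size removed, after detaching and reattaching the notebook, is the safer approach.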
Performance Optimization
Performance Optimization is crucial when working with Databricks and Hugging Face. To use each GPU effectively, you can adjust the batch size sent to the GPU by the Transformers pipeline.
Changing the batch size can significantly impact performance. The best value depends on the model, the length of the inputs, and the GPU memory available, so it is worth measuring throughput at several batch sizes on your own cluster.
Making sure your DataFrame is well-partitioned can also help utilize the entire cluster. A good rule of thumb is to repartition your Spark DataFrame to use a multiple of the number of GPUs or cores across the workers.
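The rule of thumb above can be expressed as a small helper. This is only a sketch: the function names and the default of two partitions per GPU are arbitrary illustrative choices, and num_gpus is the total number of GPUs (or cores) across your workers, which you supply yourself.

```python
def target_partitions(num_gpus, partitions_per_gpu=2):
    """Pick a partition count that is a whole multiple of the GPU count."""
    if num_gpus < 1 or partitions_per_gpu < 1:
        raise ValueError("num_gpus and partitions_per_gpu must be positive")
    return num_gpus * partitions_per_gpu


def repartition_for_gpus(spark_df, num_gpus, partitions_per_gpu=2):
    """Repartition a Spark DataFrame so every GPU has partitions to work on."""
    return spark_df.repartition(target_partitions(num_gpus, partitions_per_gpu))
```

For a cluster with 4 GPUs in total, repartition_for_gpus(df, 4) would repartition df into 8 partitions before the UDF is applied.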
Caching the Hugging Face model can save model load time or ingress costs. This is especially useful if you're working with large models or datasets.
To monitor GPU performance, you can view live metrics for a cluster, such as "Per-GPU utilization" or "Per-GPU memory utilization (%)". This can help you identify areas for improvement and optimize your batch size accordingly.
Your goal with tuning the batch size is to set it large enough to drive full GPU utilization without causing "CUDA out of memory" errors.
Fine-Tuning and Inference
The Hugging Face Transformers Trainer makes it easy to set up and run model training on a single machine for moderately sized datasets. You can fine-tune a pre-trained model on your own data to create a custom text classifier, such as a spam classifier.
To fine-tune a model, create a single machine cluster with GPU support, prepare and download your dataset to the driver, perform model training using Trainer, and log the resulting model to MLflow. This process is straightforward and efficient, allowing you to fine-tune your models without leaving Databricks.
For larger datasets, Databricks supports distributed multi-machine multi-GPU deep learning, giving you the flexibility to scale your model training as needed.
Fine-Tuning Transformers on a Single Machine
You can fine-tune pre-trained models on a single machine using the 🤗 Transformers Trainer utility. This is a great option for moderately sized datasets that can fit on a single machine with GPU support.
The Trainer utility makes it easy to set up and perform model training, so you don't need to leave Databricks to fine-tune your models.
For larger datasets, Databricks supports distributed multi-machine multi-GPU deep learning, but this is not necessary for moderately sized datasets.
To get started, create a single machine cluster with GPU support, which is a straightforward process.
Once your cluster is set up, you can prepare and download your dataset to the driver, which on a single machine cluster is the only node and is where your notebook code runs.
After that, you can use the Trainer utility to perform model training, and then log the resulting model to MLflow for tracking and versioning.
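The training step might be assembled as in the sketch below, which assumes numpy and transformers are installed and that the datasets are already tokenized. The function names, the use of accuracy as the metric, and the default TrainingArguments are illustrative choices rather than a recommended configuration.

```python
def accuracy(predicted_ids, label_ids):
    """Fraction of predictions that match the reference labels."""
    if len(predicted_ids) != len(label_ids):
        raise ValueError("predictions and labels must have the same length")
    if not label_ids:
        return 0.0
    matches = sum(1 for p, y in zip(predicted_ids, label_ids) if p == y)
    return matches / len(label_ids)


def build_trainer(base_model, tokenizer, train_dataset, eval_dataset,
                  label2id, id2label, output_dir):
    """Assemble a Trainer for sequence classification."""
    import numpy as np
    from transformers import (
        AutoModelForSequenceClassification,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    model = AutoModelForSequenceClassification.from_pretrained(
        base_model, num_labels=len(label2id), label2id=label2id, id2label=id2label
    )

    def compute_metrics(eval_pred):
        # eval_pred unpacks into model logits and reference label ids.
        logits, labels = eval_pred
        predicted = np.argmax(logits, axis=-1)
        return {"accuracy": accuracy(predicted.tolist(), labels.tolist())}

    return Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir),
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        # Pads each batch to the length of its longest example.
        data_collator=DataCollatorWithPadding(tokenizer),
        compute_metrics=compute_metrics,
    )
```

Calling train() on the returned Trainer starts fine-tuning. Settings such as the number of epochs or the evaluation schedule can be passed to TrainingArguments, whose parameter names vary somewhat between Transformers versions.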
Transformers Inference and MLflow Logging
Hugging Face Transformers pipelines are a quick way to get started with tasks such as text summarization. Combining pipeline inference with MLflow logging gives you an end-to-end example.
You can load any logged or registered model into a Spark UDF using MLflow. The model URI is easy to look up from the Model Registry or the logged experiment run UI.
Hugging Face interfaces nicely with MLflow, automatically logging metrics during model training using the MLflowCallback. However, you must log the trained model yourself.
You can wrap training in an MLflow run, construct a Transformers pipeline from the tokenizer and the trained model, and write it to local disk. Finally, log the model to MLflow with mlflow.transformers.log_model.
Loading the model for inference is the same as loading the MLflow wrapped pre-trained model.
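The logging and loading steps can be sketched as follows, assuming mlflow (with the transformers flavor, 2.x-style arguments) and transformers are installed, and that trainer and tokenizer come from an earlier fine-tuning step. The function names, the artifact path "classification", and the input example are illustrative.

```python
def run_model_uri(run_id, artifact_path):
    """Build the MLflow URI of a model logged under a given run."""
    return f"runs:/{run_id}/{artifact_path}"


def train_and_log(trainer, tokenizer, model_output_dir, artifact_path="classification"):
    """Train inside an MLflow run, then log the result as a Transformers pipeline."""
    import mlflow
    from transformers import AutoModelForSequenceClassification, pipeline

    with mlflow.start_run() as run:
        # Training metrics are logged by the Trainer's MLflow callback.
        trainer.train()
        trainer.save_model(model_output_dir)
        pipe = pipeline(
            "text-classification",
            model=AutoModelForSequenceClassification.from_pretrained(model_output_dir),
            tokenizer=tokenizer,
        )
        # The trained model itself must be logged explicitly.
        mlflow.transformers.log_model(
            transformers_model=pipe,
            artifact_path=artifact_path,
            input_example="Hi there!",
        )
    return run_model_uri(run.info.run_id, artifact_path)


def load_as_spark_udf(spark, model_uri):
    """Load a logged or registered model as a Spark UDF that returns strings."""
    import mlflow

    return mlflow.pyfunc.spark_udf(spark, model_uri, result_type="string")
```

The URI returned by train_and_log can be passed to load_as_spark_udf, and the resulting UDF applied to a text column in a select statement, just as with the pre-trained model.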
Sources
- https://docs.databricks.com/ja/archive/machine-learning/train-model/model-inference-nlp.html
- https://www.databricks.com/blog/2023/02/06/getting-started-nlp-using-hugging-face-transformers-pipelines.html
- https://venturebeat.com/ai/databricks-and-hugging-face-integrate-apache-spark-for-faster-ai-model-building/
- https://docs.databricks.com/en/machine-learning/train-model/huggingface/fine-tune-model.html
- https://www.databricks.com/blog/contributing-spark-loader-for-hugging-face-datasets