Hugging Face Sentiment Analysis Using Pre-Trained Models with Python

Author: Jay Matsuda

Posted Nov 14, 2024

Using Hugging Face's pre-trained models with Python is a game-changer for sentiment analysis.

These models can be easily integrated into your code using the Transformers library.

One such model is the DistilBERT model, which has been fine-tuned for sentiment analysis.

It's a smaller version of the BERT model, making it more efficient for use in production environments.

You can use the `transformers` library to load the DistilBERT model and start analyzing sentiment in just a few lines of code.

For example, you can use the `Trainer` class to train the model on your own dataset, or use the `pipeline` function to get a pre-trained model.

The `pipeline` function is a convenient way to get started with sentiment analysis, as it allows you to easily switch between different models and tasks.

For instance, you can use the `sentiment-analysis` pipeline to analyze the sentiment of a piece of text, or the `text-classification` pipeline to classify text into different categories.
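
As a quick illustration, here's roughly what that looks like (the example sentences are ours, and the default checkpoint the pipeline downloads may change between library versions):

```python
from transformers import pipeline

# Load the default sentiment-analysis pipeline; a pre-trained model is fetched automatically
classifier = pipeline("sentiment-analysis")

# Analyze a couple of example sentences
results = classifier([
    "I absolutely loved this movie, the acting was superb.",
    "The plot was dull and the pacing dragged on forever.",
])

for result in results:
    print(result["label"], round(result["score"], 4))
```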

Data Preparation

Credit: youtube.com, How to Build a Sentiment Analysis App with Hugging Face

Data Preparation is a crucial step in any Hugging Face sentiment analysis project. You need to convert text to numbers, and for that, you'll use pre-trained models like BERT and DistilBERT.

The Transformers library is your go-to for pre-trained models, and it works with both TensorFlow and PyTorch. It also includes pre-built tokenizers that do the heavy lifting for you.

To get started, you'll need to load a pre-trained BertTokenizer, which can be done with a cased or uncased version. The cased version works better, as it can convey more sentiment with words like "BAD" versus "bad".

You can use the IMDB dataset for fine-tuning your model, but it's huge, so let's create smaller datasets for faster training and testing.

To preprocess your data, you'll use the DistilBERT tokenizer. This will convert your text inputs into the format required by the model.

Here's a quick rundown of the steps involved in data preparation:

  • Load a pre-trained tokenizer (cased or uncased).
  • Carve smaller training and testing subsets out of the full IMDB dataset.
  • Run the text inputs through the DistilBERT tokenizer so they're in the format the model expects.

By following these steps, you'll be well on your way to preparing your data for Hugging Face sentiment analysis.
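
A minimal sketch of these steps, assuming the 🤗 `datasets` library and the `distilbert-base-uncased` checkpoint (the subset sizes are illustrative, not prescribed):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Load the IMDB dataset and carve out smaller subsets for faster experimentation
imdb = load_dataset("imdb")
small_train = imdb["train"].shuffle(seed=42).select(range(1000))
small_test = imdb["test"].shuffle(seed=42).select(range(1000))

# The DistilBERT tokenizer converts raw text into input IDs and attention masks
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def preprocess(batch):
    return tokenizer(batch["text"], truncation=True)

tokenized_train = small_train.map(preprocess, batched=True)
tokenized_test = small_test.map(preprocess, batched=True)
```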

BERT and Classification

Credit: youtube.com, Tutorial 1-Transformer And Bert Implementation With Huggingface

We can use BERT as a base model for classification tasks, and in this case, we'll use it for sentiment analysis.

You can load the BERT model using the basic BertModel, which is a good starting point for many tasks.

The last hidden state of the model contains 768 hidden units per token, a number you can verify by checking the model's config.

We can use the pooled output of the model as a summary of the content, but it's worth noting that this might not always be the best approach.
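
Here's a rough sketch of loading the base model and inspecting those outputs, assuming the `bert-base-cased` checkpoint and a sample sentence of our own, padded to 32 tokens:

```python
import torch
from transformers import BertModel, BertTokenizer

# Load the cased tokenizer and the basic BertModel
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")

# Encode a sample review, padded to a fixed length of 32 tokens
encoding = tokenizer("This movie was BAD, not just bad.",
                     padding="max_length", max_length=32, return_tensors="pt")

with torch.no_grad():
    outputs = model(**encoding)

print(outputs.last_hidden_state.shape)  # torch.Size([1, 32, 768])
print(outputs.pooler_output.shape)      # torch.Size([1, 768])
print(model.config.hidden_size)         # 768, matching the hidden state width
```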

What Is BERT?

BERT stands for Bidirectional Encoder Representations from Transformers. This name is broken down into three key components.

Bidirectional means that to understand the text you're looking at, you'll have to look back at the previous words and forward at the next words. This is a departure from traditional models that read sequentially.

The Transformer model is non-directional and reads entire sequences of tokens at once. This lets it learn contextual relations between words, such as "his" referring to "Jim".

Credit: youtube.com, What is BERT and how does it work? | A Quick Review

BERT was trained by masking 15% of the tokens with the goal of guessing them. This is a key part of how BERT learns to understand the context of words.

Here are the three main ideas behind BERT:

  • Bidirectional: Understanding text by looking back and forward
  • Transformers: Reading entire sequences of tokens at once
  • (Pre-trained) contextualized word embeddings: Encoding words based on their meaning/context

The attention mechanism in BERT allows for learning contextual relations between words, making it a powerful tool for classification tasks.
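
You can see that masked-token training reflected in the `fill-mask` pipeline. A small illustration (the sentence is ours; the predictions depend on the checkpoint):

```python
from transformers import pipeline

# BERT was pre-trained to guess masked tokens; the fill-mask pipeline exposes this directly
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model ranks candidate tokens for the [MASK] position using the surrounding context
for prediction in fill_mask("Jim picked up his phone because [MASK] was ringing."):
    print(prediction["token_str"], round(prediction["score"], 4))
```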

BERT Classification

BERT Classification is a powerful tool for various tasks, including sentiment classification. It's a pre-trained language model that can be fine-tuned for specific tasks.

The BERT model is a basic BertModel that we can use as a starting point for our sentiment classifier. We can load the model and use it on the encoding of our sample text.

The last_hidden_state is the sequence of hidden states from the model's last layer: one vector for each of the 32 tokens in our encoded sample, each with 768 values. That 768 comes from the model's architecture, which uses 768 hidden units in its feed-forward networks.

Credit: youtube.com, FineTuning BERT for Multi-Class Classification on custom Dataset | Transformer for NLP

We can verify the number of hidden units by checking the config, which is a crucial step in understanding the model's behavior. The config provides valuable insights into the model's architecture and parameters.

The pooled_output is a summary of the content, according to BERT, and it's obtained by applying the BertPooler on last_hidden_state. This output has a specific shape, which is essential for further processing and analysis.

To create a classifier that uses the BERT model, we can delegate most of the heavy lifting to the BertModel and add a dropout layer for regularization and a fully-connected layer for the output. The classifier should work like any other PyTorch model.

We can create an instance of the classifier and move it to the GPU, which is a common practice for improving model performance. Moving the example batch of our training data to the GPU is also a crucial step in preparing the data for training.

To get the predicted probabilities from our trained model, we'll apply the softmax function to the outputs, which is a common practice in classification tasks.
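
Putting those pieces together, here's a minimal PyTorch sketch of such a classifier. The class name and the dropout rate are our own choices, and it assumes a `bert-base-cased` backbone with three sentiment classes:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class SentimentClassifier(nn.Module):
    """BERT backbone plus dropout for regularization and a fully-connected output layer."""

    def __init__(self, n_classes, pretrained_name="bert-base-cased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained_name)
        self.drop = nn.Dropout(p=0.3)
        self.out = nn.Linear(self.bert.config.hidden_size, n_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Use the pooled output as BERT's summary of the whole sequence
        pooled = self.drop(outputs.pooler_output)
        return self.out(pooled)

# Create an instance and move it to the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SentimentClassifier(n_classes=3).to(device)

# Predicted probabilities come from applying softmax to the raw outputs, e.g.:
# probs = torch.softmax(model(input_ids.to(device), attention_mask.to(device)), dim=1)
```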

Training and Evaluation

Credit: youtube.com, HuggingFace Crash Course - Sentiment Analysis, Model Hub, Fine Tuning

Training and evaluation are crucial steps in sentiment analysis. Training involves fine-tuning pre-trained models like BERT and DistilBERT to suit your specific task. You can use the Hugging Face Trainer API to fine-tune these models.

To fine-tune a model, you'll need to define the training arguments and the metrics you want to evaluate. For sentiment analysis, accuracy and F1 score are common metrics. The Trainer API then takes care of the training and evaluation loop for you.

Here are some recommended hyperparameters for fine-tuning BERT: batch size (16, 32), learning rate (5e-5, 3e-5, 2e-5), and number of epochs (2, 3, 4).
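
Here's a hedged sketch of that fine-tuning setup, reusing the `tokenized_train`, `tokenized_test`, and `tokenizer` objects from the data-preparation step (the output directory name and the exact hyperparameters are illustrative picks from the ranges above):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

# DistilBERT with a two-class head: negative and positive
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds)}

training_args = TrainingArguments(
    output_dir="distilbert-imdb-sentiment",  # illustrative name
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=2,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,  # from the data-preparation step
    eval_dataset=tokenized_test,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()
print(trainer.evaluate())
```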

Training

Training a model can be a complex task, but don't worry, we've got some recommendations to get you started. The BERT authors suggest using a linear scheduler with no warmup steps and the AdamW optimizer to reproduce the training procedure.

For hyperparameter tuning, batch size is a crucial factor: 16 or 32 are recommended options. Learning rate can be set to 5e-5, 3e-5, or 2e-5, and the number of epochs can be 2, 3, or 4. Note that increasing batch size significantly reduces training time but may give you lower accuracy.

To avoid exploding gradients, you can clip the model's gradients with PyTorch's `clip_grad_norm_`. This technique is especially useful when dealing with large models like BERT.

Here are some common hyperparameter settings for fine-tuning BERT:

  • Batch size: 16 or 32
  • Learning rate: 5e-5, 3e-5, or 2e-5
  • Number of epochs: 2, 3, or 4

These settings can be a good starting point, but feel free to experiment and adjust them to suit your specific needs.
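
As a sketch of how those recommendations fit together in a plain PyTorch training loop, assuming the `SentimentClassifier` instance and `device` from earlier plus a `train_dataloader` that yields tokenized batches with `input_ids`, `attention_mask`, and `labels`:

```python
import torch
from torch.nn.utils import clip_grad_norm_
from transformers import get_linear_schedule_with_warmup

EPOCHS = 3
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
total_steps = len(train_dataloader) * EPOCHS

# Linear decay with no warmup steps, as recommended by the BERT authors
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=total_steps)

loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for batch in train_dataloader:
        optimizer.zero_grad()
        logits = model(batch["input_ids"].to(device),
                       batch["attention_mask"].to(device))
        loss = loss_fn(logits, batch["labels"].to(device))
        loss.backward()
        # Clip gradients to keep them from exploding
        clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        scheduler.step()
```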

Evaluation

Our model's accuracy on the test data is only about 1% lower than what we saw during training, which suggests it generalizes well.

The model has difficulty classifying neutral reviews, which is a common challenge in sentiment analysis. I can attest to this from experience, having looked at many reviews.

The classification report shows that neutral reviews are indeed hard to classify, and the confusion matrix confirms it: the model mistakes them for negative and for positive reviews at roughly equal rates.
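
Producing those two views is straightforward with scikit-learn. A small sketch, where `y_test` and `y_pred` stand for the true and predicted labels on the test set:

```python
from sklearn.metrics import classification_report, confusion_matrix

class_names = ["negative", "neutral", "positive"]

# Per-class precision, recall, and F1, followed by the raw confusion matrix
print(classification_report(y_test, y_pred, target_names=class_names))
print(confusion_matrix(y_test, y_pred))
```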

Our model's performance on neutral reviews is a good example of how sentiment analysis can be tricky, even for a well-performing model.

Fine-Tuning and Customization

Credit: youtube.com, Tutorial 2- Fine Tuning Pretrained Model On Custom Dataset Using 🤗 Transformer

You can customize the model used for sentiment analysis by specifying a different model if desired. This is one of the strengths of the Hugging Face pipeline.

The pipeline loads a standard model by default, but you can load a specific model, such as a distilled version of BERT (distilbert-base-uncased-finetuned-sst-2-english), which is smaller and faster while maintaining high performance.

Fine-tuning a model with your own data can further improve sentiment analysis results and give you an extra boost of accuracy for your particular use case. This can be done with the Trainer API from 🤗 Transformers, or with AutoNLP, a tool to automatically train, evaluate, and deploy state-of-the-art NLP models without code or ML experience.

There are more than 215 sentiment analysis models publicly available on the Hugging Face Hub, and integrating them with Python just takes 5 lines of code. You can use a specific sentiment analysis model that is better suited to your language or use case by providing the name of the model.
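
For example, here's roughly how you'd point the pipeline at the DistilBERT checkpoint mentioned above rather than the default:

```python
from transformers import pipeline

# Name a specific checkpoint from the Hub instead of relying on the default
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
classifier = pipeline("sentiment-analysis", model=model_name)

print(classifier("This library makes sentiment analysis almost effortless."))
```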

Credit: youtube.com, Fine-tuning Large Language Models (LLMs) | w/ Example Code

Some examples of sentiment analysis models on the Hub include distilbert-base-uncased-finetuned-sst-2-english for English text, nlptown/bert-base-multilingual-uncased-sentiment for product reviews in several languages, and cardiffnlp/twitter-roberta-base-sentiment for tweets.

The IMDB dataset contains 25,000 movie reviews labeled by sentiment for training a model and 25,000 movie reviews for testing it. You can use this dataset to fine-tune a DistilBERT model for sentiment analysis.

GPU and Scalability

To perform sentiment analysis with Hugging Face, you'll want to consider how to utilize your GPU for efficient processing.

PyTorch requires you to explicitly dispatch a model or variable to the GPU using the `.to('cuda')` method, which can be further specified with a device id like `.to('cuda:0')`. If you have multiple GPUs, you can even wrap your model in `DataParallel` to benefit from data parallelism.

For large datasets spread across multiple files, you'll want to use a `DataLoader` that can iterate over all of them. When wrapping the model in `DataParallel`, you can set the `device_ids` parameter to `[0]` to pin a single GPU or leave it out to use every available device.
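
In code, the dispatching part looks roughly like this (a sketch, assuming `model` is the classifier built earlier):

```python
import torch
from torch.nn import DataParallel

# Explicitly dispatch the model to a GPU; a specific device can be named as 'cuda:0'
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# With several GPUs, wrap the model for data parallelism;
# pass device_ids=[0] to pin one device, or omit it to use them all
if torch.cuda.device_count() > 1:
    model = DataParallel(model)
```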

GPU-Enabled Inference

GPU-enabled inference is a powerful technique that can significantly speed up your models' performance. It's a crucial aspect of scalability, especially when working with large datasets.

Credit: youtube.com, Nvidia CUDA in 100 Seconds

To get started with GPU-enabled inference, you'll need a dataloader that serves batches of tokenized data. This is where the magic happens, and your model can start processing data in parallel.

A model class that performs the inference is also essential. This is where you'll define the logic for your model to make predictions or classify inputs.

To parallelize your model on the GPU devices, you can use PyTorch's DataParallel module. This will allow you to run your training or inference across all the GPU devices on your cluster.

Here's a high-level overview of the steps involved in GPU-enabled inference:

  1. Dataloader for serving batches of tokenized data
  2. Model class that performs the inference
  3. Parallelization of the model on the GPU devices
  4. Iterating through the data for inference and extracting the results

By following these steps, you can unlock the full potential of your GPU-enabled inference pipeline. Remember to explicitly dispatch your model to the GPU using the `to('cuda')` method, and consider using a device id like `cuda:0` if you have multiple GPUs.
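
Here's a compact sketch of those four steps, assuming a list of raw review strings called `texts` and a publicly available classification checkpoint (swap in your own fine-tuned model as needed):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to("cuda")
model.eval()
# With multiple GPUs you could also wrap the model: model = torch.nn.DataParallel(model)

# 1. A DataLoader that serves batches of tokenized data
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
loader = DataLoader(TensorDataset(encodings["input_ids"],
                                  encodings["attention_mask"]), batch_size=32)

# 2-4. Iterate through the data, run inference on the GPU, and extract the results
all_preds = []
with torch.no_grad():
    for input_ids, attention_mask in loader:
        outputs = model(input_ids=input_ids.to("cuda"),
                        attention_mask=attention_mask.to("cuda"))
        all_preds.extend(torch.argmax(outputs.logits, dim=-1).cpu().tolist())
```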

Scalable Inference for Large Files

Scalable inference for large files is a must when dealing with lots of data. This is because it's unlikely that all the data is available in a single file.

Credit: youtube.com, Scaling AI Inference Workloads with GPUs and Kubernetes - Renaud Gaubert & Ryan Olson, NVIDIA

In such cases, using a DataLoader that draws from multiple files is a good approach. The code for this can look quite different from the single-file version, because the dataset first has to gather its inputs from every file before serving batches.
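
One way to set that up is a small Dataset per file combined with ConcatDataset. A sketch under the assumption that the shards are CSV files with a "text" column (the paths and column name are hypothetical, and `tokenizer` is the one loaded earlier):

```python
import glob
import pandas as pd
from torch.utils.data import ConcatDataset, DataLoader, Dataset

class ReviewFileDataset(Dataset):
    """Wraps a single CSV shard of reviews (hypothetical 'text' column)."""

    def __init__(self, path, tokenizer, max_length=256):
        self.texts = pd.read_csv(path)["text"].tolist()
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        enc = self.tokenizer(self.texts[idx], truncation=True,
                             padding="max_length", max_length=self.max_length,
                             return_tensors="pt")
        return {key: value.squeeze(0) for key, value in enc.items()}

# Combine every shard into one dataset so a single DataLoader can iterate over all files
files = sorted(glob.glob("reviews/part-*.csv"))  # hypothetical shard paths
dataset = ConcatDataset([ReviewFileDataset(f, tokenizer) for f in files])
loader = DataLoader(dataset, batch_size=32)
```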

Jay Matsuda

Lead Writer

Jay Matsuda is an accomplished writer and blogger who has been sharing his insights and experiences with readers for over a decade. He has a talent for crafting engaging content that resonates with audiences, whether he's writing about travel, food, or personal growth. With a deep passion for exploring new places and meeting new people, Jay brings a unique perspective to everything he writes.
