Hugging Face Vulnerability Raises Concerns for API Security


The recent Hugging Face vulnerability has left many in the tech industry on edge. A critical vulnerability was discovered in the popular Transformers library, which is used by millions of developers worldwide.

The vulnerability, known as CVE-2021-41190, was discovered in the library's `tokenizers` module. This module is used to handle tokenization, a crucial step in natural language processing.

Developers who use the Transformers library are advised to update to version 4.12.0 or later to patch the vulnerability.
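
As a quick sanity check, a project can verify the installed library version at startup. The sketch below assumes the 4.12.0 threshold cited above and uses the third-party `packaging` helper for version comparison:

```python
# Minimal sketch: confirm the installed Transformers release is at least the
# patched version cited above (assumed here to be 4.12.0).
import transformers
from packaging import version  # third-party helper, commonly installed alongside pip

MIN_PATCHED = version.parse("4.12.0")
installed = version.parse(transformers.__version__)

if installed < MIN_PATCHED:
    raise RuntimeError(
        f"transformers {installed} is older than {MIN_PATCHED}; "
        "upgrade with `pip install --upgrade transformers` before loading models."
    )
```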

Vulnerabilities and Risks

Hugging Face's vulnerability has been a major concern for the AI community, and it's essential to understand the risks involved.

Researchers at Lasso Security found that they could access the Bloom, Meta-Llama, and Pythia large language model repositories using unsecured API access tokens discovered on GitHub and the Hugging Face platform. This would have allowed an adversary to silently poison training data in these widely used LLMs, steal models and data sets, and potentially execute other malicious activities.
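
A simple way to reduce the chance of a token leaking through a public repository is to keep it out of source code entirely. The snippet below is an illustrative sketch, not Lasso's methodology: it reads a Hugging Face token from an environment variable and fails loudly if the variable is missing.

```python
# Illustrative sketch: read a Hugging Face access token from the environment
# instead of hardcoding it in code that may end up in a public repository.
import os
from huggingface_hub import HfApi

token = os.environ.get("HF_TOKEN")  # set via CI secrets or a local shell profile
if token is None:
    raise RuntimeError("HF_TOKEN is not set; refusing to fall back to a hardcoded token.")

api = HfApi(token=token)
print(api.whoami()["name"])  # verifies the token works without ever printing it
```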



The growing number of publicly available, and potentially malicious, AI/ML models poses a significant supply chain risk, particularly through attacks that target AI/ML engineers and pipeline machines.

In February, cybersecurity firm JFrog found approximately 100 malicious AI/ML models that could be used to execute code on a victim's machine. One of the models opened a reverse shell that gave a remote threat actor access to the device running the code.

Hugging Face's Spaces platform was breached, allowing hackers to access authentication secrets for its members. This incident highlights the importance of tightening security measures, especially as the platform grows in popularity.

Researchers at Wiz discovered a vulnerability that allowed them to upload custom models and leverage container escapes to gain cross-tenant access to other customers' models. This vulnerability demonstrates the need for robust security measures to prevent such attacks.

To mitigate these risks, AI developers should use new tools available to them, such as Huntr, a bug-bounty platform tailored specifically for AI vulnerabilities. This collective effort is crucial in fortifying Hugging Face repositories and safeguarding the privacy and integrity of AI/ML engineers and organizations relying on these resources.

Here are some key vulnerabilities and risks associated with Hugging Face:

  • Unsecured API access tokens
  • Publicly available and potentially malicious AI/ML models
  • Malicious code execution through pickle-based models
  • TensorFlow Keras models susceptible to executing arbitrary code
  • Vulnerabilities in Hugging Face's Spaces platform
  • Container escapes allowing cross-tenant access

Securing Your APIs


APIs are the backbone of many applications, and as the use of AI models grows, they're becoming increasingly vulnerable to attacks. Researchers at Lasso Security previously discovered unsecured API access tokens on GitHub and the Hugging Face platform, allowing an adversary to silently poison training data in widely used LLMs.

API security is a growing concern, with API attacks increasing in frequency. Organizations integrating with generative AI technologies may face the same risks and consequences, highlighting the need for secure API implementations and protecting third-party transactions with good security hygiene.

To secure your APIs, start by treating all API inputs as dangerous. Don't assume end users won't try to manipulate the API on their own. Enforce a strong CORS policy with custom, unguessable authorization headers to mitigate the risk of cross-site request forgery and other cross-origin attacks.

Here are six measures to secure your API implementations, as recommended by Tushar Kulkarni:

  • Don't use GUIDs/UUIDs that can be guessed by a threat actor in an intuitive way.
  • Never rely on a client to filter sensitive data.
  • Enforce a limit on how often a client can call the API endpoint.
  • Lock down endpoints and validate a user's role and privileges before performing an action.
  • Avoid binding client-side data into code variables and later into objects in databases.
  • Enforce a strong CORS policy with custom, unguessable authorization headers.

By following these best practices, you can significantly reduce the risk of API attacks and ensure the security and integrity of your AI models and applications.
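
To make a few of these measures concrete, here is a minimal Flask sketch covering rate limiting, server-side role validation, and a strict CORS policy. Names such as ALLOWED_ORIGIN and USER_ROLES are assumptions for the example, not part of any particular framework or of the measures' original wording.

```python
# Illustrative sketch of three of the measures above: rate limiting,
# role/privilege validation, and a strict CORS policy.
import time
from collections import defaultdict

from flask import Flask, abort, jsonify, request

app = Flask(__name__)

ALLOWED_ORIGIN = "https://app.example.com"   # never "*" for an authenticated API
RATE_LIMIT = 30                              # max requests per client per minute
USER_ROLES = {"token-abc123": "admin"}       # stand-in for a real identity provider
_request_log = defaultdict(list)             # in-memory; use Redis or similar in production


def _client_id() -> str:
    # Treat all inputs as untrusted; fall back to the remote address.
    return request.headers.get("X-Api-Key", request.remote_addr or "unknown")


@app.before_request
def enforce_rate_limit():
    now = time.time()
    recent = [t for t in _request_log[_client_id()] if now - t < 60]
    if len(recent) >= RATE_LIMIT:
        abort(429)                           # too many requests
    recent.append(now)
    _request_log[_client_id()] = recent


@app.after_request
def enforce_cors(response):
    # Only the known front-end origin may call this API from a browser.
    response.headers["Access-Control-Allow-Origin"] = ALLOWED_ORIGIN
    response.headers["Vary"] = "Origin"
    return response


@app.route("/models/<model_id>", methods=["DELETE"])
def delete_model(model_id: str):
    # Validate the caller's role server-side before performing the action;
    # never rely on the client to hide or filter privileged functionality.
    role = USER_ROLES.get(request.headers.get("X-Api-Key", ""))
    if role != "admin":
        abort(403)
    return jsonify({"deleted": model_id})
```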

Understanding the Issue


Hugging Face models can be vulnerable to malicious code execution due to their use of the pickle format, which can contain arbitrary code that is executed when the file is loaded.

The pickle format is a common format for serializing Python objects, but it can also pose a security risk if not handled properly.

Attackers can inject malicious code into PyTorch models through the pickle protocol's `__reduce__` method, which lets an object specify code to run when it is unpickled, potentially leading to malicious behavior when the model is loaded.

Hugging Face models, including those trained with the Transformers library, use the torch.load() function to deserialize the model from a file, which can execute arbitrary code.
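
To see the mechanism, the deliberately harmless sketch below defines a `__reduce__` method whose payload runs the moment the object is unpickled; `torch.load()` on an untrusted pickle file follows the same path.

```python
# Harmless demonstration of the __reduce__ mechanism described above:
# unpickling this object calls os.system with a benign command. A malicious
# model file would embed a hostile payload in exactly the same way.
import os
import pickle


class PoisonedPayload:
    def __reduce__(self):
        # Whatever this returns is executed at load time, not at dump time.
        return (os.system, ("echo code executed during unpickling",))


blob = pickle.dumps(PoisonedPayload())
pickle.loads(blob)  # prints the message; torch.load() on such a file behaves the same
```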

While Hugging Face has security protections in place, such as malware scanning and pickle scanning, it doesn't outright block pickle models from being downloaded, marking them as "unsafe" instead.

This means that someone can still download and execute potentially harmful models, even if they are marked as "unsafe".
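
Until a model's provenance is verified, a cautious loading pattern is to prefer the safetensors format and, for pickle-based checkpoints, restrict what the unpickler may reconstruct. The sketch below assumes a recent PyTorch release that supports the `weights_only` flag and the separately installed `safetensors` package; the file name is hypothetical.

```python
# Cautious loading sketch: prefer the safetensors format, and when a pickle-based
# checkpoint must be read, ask torch.load to restrict unpickling to plain tensors.
import torch
from safetensors.torch import load_file


def load_untrusted_checkpoint(path: str):
    if path.endswith(".safetensors"):
        # safetensors stores raw tensor data only; no executable code can be embedded.
        return load_file(path)
    # weights_only=True rejects arbitrary objects (and therefore __reduce__ payloads);
    # loading a poisoned file raises an error instead of executing its code.
    return torch.load(path, map_location="cpu", weights_only=True)


state_dict = load_untrusted_checkpoint("model.safetensors")  # hypothetical path
```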

AI Providers Need to Foster Trust


As large language models grow in use, they will become embedded into applications using APIs. Organizations are already using generative AI from a variety of vendors and various channels.

API attacks are increasing, and organizations integrating with generative AI technologies may face the same risks and consequences. The AI industry will need to work to maintain trust by building secure API implementations and protecting third-party transactions with good security hygiene.

Karl Mattson, CISO of API security firm Noname Security, emphasizes the importance of building trust in APIs and beyond. He notes that organizations are already using generative AI in various forms, including integrating it into in-house application development.

To mitigate the risks associated with AI/ML models, AI developers should use new tools available to them, such as Huntr, a bug-bounty platform tailored specifically for AI vulnerabilities.

Keith Marchal

Senior Writer
