The recent Hugging Face vulnerability has left many in the tech industry on edge. A critical vulnerability was discovered in the popular Transformers library, which is used by millions of developers worldwide.
The vulnerability, known as CVE-2021-41190, was discovered in the library's `tokenizers` module. This module is used to handle tokenization, a crucial step in natural language processing.
Developers who use the Transformers library are advised to update to version 4.12.0 or later to patch the vulnerability.
Vulnerabilities and Risks
Hugging Face's vulnerability has been a major concern for the AI community, and it's essential to understand the risks involved.
Researchers at Lasso Security found that they could access the Bloom, Meta-Llama, and Pythia large language model repositories using unsecured API access tokens discovered on GitHub and the Hugging Face platform. An adversary with the same access could have silently poisoned training data in these widely used LLMs, stolen models and data sets, and carried out other malicious activities.
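A check for leaked tokens of this kind can be sketched in a few lines. The example below is a minimal sketch rather than a complete secret scanner: it assumes Hugging Face user access tokens begin with `hf_` followed by a long alphanumeric string, and the minimum length of 30 characters and the function name `find_candidate_tokens` are illustrative assumptions.

```python
import re

# Assumption: tokens look like "hf_" plus a long alphanumeric string.
# The minimum length of 30 is an illustrative guess, not a documented rule.
HF_TOKEN_PATTERN = re.compile(r"\bhf_[A-Za-z0-9]{30,}\b")


def find_candidate_tokens(text: str) -> list[tuple[int, str]]:
    """Return (line_number, redacted_token) pairs for token-like strings."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in HF_TOKEN_PATTERN.finditer(line):
            # Redact the match so the scanner's own output does not leak it.
            findings.append((line_number, match.group()[:7] + "..."))
    return findings


sample = 'TOKEN = "hf_' + "a" * 34 + '"\nprint("hello")'
print(find_candidate_tokens(sample))
```

Running a check like this in a pre-commit hook or CI job is one way to catch a token before it reaches a public repository. Any token that has already been pushed should be revoked rather than simply deleted from the file.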
The growing number of publicly available, potentially malicious AI/ML models poses a significant risk to the supply chain, particularly from attacks that target AI/ML engineers and the pipeline machines that load these models.
In February 2024, cybersecurity firm JFrog found approximately 100 malicious AI/ML models that could execute code on a victim's machine. One of the models opened a reverse shell that allowed a remote threat actor to access a device running the code.
Hugging Face's Spaces platform was breached, allowing hackers to access authentication secrets for its members. This incident highlights the importance of tightening security measures, especially as the platform grows in popularity.
Researchers at Wiz discovered a vulnerability that allowed them to upload custom models and leverage container escapes to gain cross-tenant access to other customers' models. This vulnerability demonstrates the need for robust security measures to prevent such attacks.
To mitigate these risks, AI developers should use new tools available to them, such as Huntr, a bug-bounty platform tailored specifically for AI vulnerabilities. This collective effort is crucial in fortifying Hugging Face repositories and safeguarding the privacy and integrity of AI/ML engineers and organizations relying on these resources.
Here are some key vulnerabilities and risks associated with Hugging Face:
- Unsecured API access tokens
- Publicly available and potentially malicious AI/ML models
- Malicious code execution through pickle-based models
- TensorFlow Keras models susceptible to executing arbitrary code
- Vulnerabilities in Hugging Face's Spaces platform
- Container escapes allowing cross-tenant access
Securing Your APIs
APIs are the backbone of many applications, and as the use of AI models grows, those APIs are becoming an increasingly attractive target for attackers. Researchers at Lasso Security previously discovered unsecured API access tokens on GitHub and the Hugging Face platform, which would have allowed an adversary to silently poison training data in widely used LLMs.
API security is a growing concern, with API attacks increasing in frequency. Organizations integrating with generative AI technologies may face the same risks and consequences, highlighting the need for secure API implementations and protecting third-party transactions with good security hygiene.
To secure your APIs, start by treating all API inputs as dangerous. Don't assume end users won't try to manipulate the API on their own. Enforce a strong CORS policy with custom, unguessable authorization headers to mitigate the risk of cross-site request forgery and other cross-origin attacks.
Here are six measures to secure your API implementations, as recommended by Tushar Kulkarni:
- Don't use GUIDs/UUIDs that a threat actor can guess or enumerate.
- Never rely on a client to filter sensitive data.
- Enforce a limit on how often a client can call the API endpoint.
- Lock down endpoints and validate a user's role and privileges before performing an action.
- Avoid binding client-supplied data directly into code variables and later into objects in databases (mass assignment).
- Enforce a strong CORS policy with custom, unguessable authorization headers.
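The rate-limiting measure in the list above can be illustrated with a small sliding-window limiter. This is a minimal in-memory sketch under stated assumptions: the class name `SlidingWindowRateLimiter`, its parameters, and the per-client key are hypothetical, and a production deployment would more likely keep the counters in a shared store behind an API gateway.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most max_calls per client within window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls = defaultdict(deque)  # client_id -> timestamps of recent calls

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        calls = self._calls[client_id]
        # Discard timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False  # caller should answer with HTTP 429 or similar
        calls.append(now)
        return True


fake_time = [0.0]
limiter = SlidingWindowRateLimiter(max_calls=2, window_seconds=10.0,
                                   clock=lambda: fake_time[0])
print([limiter.allow("client-a") for _ in range(3)])
```

Keying the limiter on an authenticated client identifier instead of an IP address makes it harder to evade by rotating addresses, although that choice depends on how the API authenticates callers.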
By following these best practices, you can significantly reduce the risk of API attacks and ensure the security and integrity of your AI models and applications.
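Two of the measures above, validating a user's role before acting and not binding client data straight into stored objects, can be sketched together. Everything named here (`User`, `ROLE_PERMISSIONS`, `ALLOWED_PROFILE_FIELDS`, `apply_profile_update`) is hypothetical and only illustrates the pattern. It is not taken from any particular framework.

```python
from dataclasses import dataclass

# Hypothetical role table and field allowlist, for illustration only.
ROLE_PERMISSIONS = {
    "admin": {"read", "update", "delete"},
    "member": {"read", "update"},
    "guest": {"read"},
}
ALLOWED_PROFILE_FIELDS = {"display_name", "email"}


@dataclass(frozen=True)
class User:
    user_id: str
    role: str


def authorize(user: User, action: str) -> None:
    """Raise PermissionError unless the user's role grants the action."""
    if action not in ROLE_PERMISSIONS.get(user.role, set()):
        raise PermissionError(f"role {user.role!r} may not {action}")


def apply_profile_update(user: User, profile: dict, payload: dict) -> dict:
    """Return an updated copy of profile, accepting only allowlisted fields."""
    authorize(user, "update")
    unexpected = set(payload) - ALLOWED_PROFILE_FIELDS
    if unexpected:
        # Reject instead of silently binding fields such as "role" or "is_admin".
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    return {**profile, **payload}


member = User(user_id="u1", role="member")
profile = {"display_name": "Ada", "email": "ada@example.com", "role": "member"}
print(apply_profile_update(member, profile, {"display_name": "Ada L."}))
```

Rejecting unexpected fields outright, rather than dropping them quietly, also surfaces probing attempts in logs.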
Understanding the Issue
Hugging Face models can be vulnerable to malicious code execution due to their use of the pickle format, which can contain arbitrary code that is executed when the file is loaded.
The pickle format is a common format for serializing Python objects, but it can also pose a security risk if not handled properly.
Attackers can inject malicious code into PyTorch models by defining a __reduce__ method on a pickled object. The pickle protocol calls whatever callable that method returns, so the injected code runs when the model is loaded.
Many Hugging Face models, including those trained with the Transformers library, are deserialized from a file with the torch.load() function, which relies on pickle and can therefore execute arbitrary code.
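The mechanism is easy to see in a harmless sketch that uses only the standard library. The `Payload` class below is a hypothetical stand-in for a tampered object inside a model file, and the expression it evaluates is deliberately trivial. A real attack would substitute something like a shell command.

```python
import pickle


class Payload:
    """Hypothetical stand-in for a tampered object inside a model file."""

    def __reduce__(self):
        # Instead of the object's state, pickle records "call eval with this
        # argument", and the unpickler makes that call at load time.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())  # serializing does not evaluate the expression
result = pickle.loads(blob)     # deserializing calls eval("6 * 7")
print(type(result).__name__, result)  # → int 42
```

The loaded value is not a `Payload` instance at all. It is whatever the callable returned, which is why simply loading a file is enough to trigger the code.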
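One hedged way to limit this exposure is to restrict which globals an unpickler may import, following the approach described in the Python `pickle` documentation. The sketch below uses only the standard library. The allowlist contents and the `restricted_loads` helper are illustrative assumptions, and real model checkpoints would need a different allowlist. For PyTorch specifically, `torch.load()` accepts a `weights_only=True` argument that serves a similar purpose.

```python
import builtins
import io
import pickle

# Illustrative allowlist; a real one depends on what the files legitimately contain.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks to import a global.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses globals outside the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    def __reduce__(self):
        return (eval, ("6 * 7",))  # harmless stand-in for malicious code


print(restricted_loads(pickle.dumps([1, 2, range(3)])))
try:
    restricted_loads(pickle.dumps(Payload()))
except pickle.UnpicklingError as error:
    print("blocked:", error)
```

An allowlist fails closed: anything not explicitly permitted is rejected, including callables nobody thought to block.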
While Hugging Face has security protections in place, such as malware scanning and pickle scanning, the platform doesn't outright block pickle models from being downloaded. It marks them as "unsafe" instead.
This means that someone can still download and execute potentially harmful models, even if they are marked as "unsafe".
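Before loading a downloaded file, it is possible to inspect a pickle stream statically, without executing it. The sketch below uses the standard library's `pickletools.genops` to list the globals a pickle would import. It is a simplified illustration of the general idea behind pickle scanning, not a description of how Hugging Face's scanner works. The function name `list_pickle_imports` is hypothetical, and resolving `STACK_GLOBAL` from the two most recent strings is a heuristic that crafted pickles can evade.

```python
import pickle
import pickletools


def list_pickle_imports(data: bytes) -> list[str]:
    """List module.name globals referenced by a pickle, without loading it."""
    imports = []
    recent_strings = []
    for opcode, arg, _position in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols carry "module name" directly in the opcode argument.
            module, name = arg.split(" ", 1)
            imports.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Newer protocols push module and name as strings first (heuristic).
            if len(recent_strings) >= 2:
                imports.append(f"{recent_strings[-2]}.{recent_strings[-1]}")
            else:
                imports.append("<unresolved>")
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return imports


class Payload:
    def __reduce__(self):
        return (eval, ("6 * 7",))  # harmless stand-in for malicious code


suspicious = list_pickle_imports(pickle.dumps(Payload()))
benign = list_pickle_imports(pickle.dumps({"weights": [0.1, 0.2]}))
print(suspicious, benign)
```

A file whose import list contains entries such as `builtins.eval` or anything from `os` or `subprocess` deserves close inspection before it is loaded anywhere.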
AI Providers Need to Foster Trust
As large language models grow in use, they will become embedded into applications through APIs. Organizations are already using generative AI from a variety of vendors and through various channels.
API attacks are increasing, and organizations integrating with generative AI technologies may face the same risks and consequences. The AI industry will need to work to maintain trust by building secure API implementations and protecting third-party transactions with good security hygiene.
Karl Mattson, CISO of API security firm Noname Security, emphasizes the importance of building trust in APIs and beyond. He notes that organizations are already using generative AI in various forms, including integrating it into in-house application development.
Sources
- https://techhq.com/2024/03/hugging-face-safetensors-vulnerable-to-supply-chain-attacks/
- https://www.bleepingcomputer.com/news/security/ai-platform-hugging-face-says-hackers-stole-auth-tokens-from-spaces/
- https://www.csoonline.com/article/2137564/after-snowflake-hugging-face-reports-security-breach.html
- https://www.darkreading.com/application-security/hugging-face-ai-platform-100-malicious-code-execution-models
- https://www.reversinglabs.com/blog/5-lessons-learned-from-the-huggingface-api-breach