Llama Firewall Bypassed Exposes AI Security Flaws
The Llama Firewall bypassed by prompt injection attacks reveals serious flaws in Meta’s AI protection systems. Trendyol’s security team discovered these vulnerabilities during internal testing.
They found that Meta’s open-source Llama Firewall failed to stop multilingual prompt injections. A Turkish phrase that tells the model to ignore earlier prompts passed through with no alert. Even leetspeak variations like “1gn0r3” triggered no warning.
This shows that the firewall heavily relies on English keywords and exact matches. It also struggles with language variations, allowing simple manipulations to go undetected.
Hidden Attacks and Insecure Code Generation
The CODE_SHIELD module also showed weaknesses. When asked to write a Flask API with user input, it created code vulnerable to SQL injection.
Despite this, the system flagged the output as safe. This misjudgment could lead teams to trust faulty AI-generated code in production without human review.
Trendyol’s team also demonstrated that Llama Firewall bypassed Unicode-embedded commands. They used invisible characters like zero-width spaces to hide instructions inside harmless-looking prompts.
These invisible threats are hard to detect with basic security tools. Even experienced developers might unknowingly share malicious payloads in copied prompts.
Half the Attacks Passed Undetected
Trendyol tested 100 unique prompt injection payloads. About half of them passed through the firewall undetected. This shows the system lacks robust safeguards for modern threat tactics.
The team warned that attackers could exploit these flaws to bypass safety filters, inject biased content, or create insecure code.
Although Meta and Google acknowledged the reports, both companies closed them without issuing rewards or patches. Trendyol has since improved its internal AI risk models and shared the findings with the wider security community.
The company urges others to conduct red team testing before launching LLM tools. As LLM adoption grows, the fact that Llama Firewall was bypassed is a serious wake-up call.
Security professionals must now develop smarter, layered defense systems to protect against evolving threats in AI environments.