Can poetry really break AI safety barriers? A new study suggests the answer might be yes, and the findings are rattling the tech world. Researchers have discovered that a few creatively phrased lines can trick advanced AI models into producing harmful or restricted content. And the surprising twist is that the attack relies on nothing more exotic than the power of poetic language itself.
A team from Italy’s Icaro Lab, part of DexAI, recently explored how artistic expression can disrupt artificial intelligence safeguards. Their research, published in late 2025, tested whether poems containing veiled or explicit dangerous requests could sneak past the filters of major AI systems. They composed twenty short poems in both English and Italian, each ending with instructions that AI programs are specifically trained to reject.
The results were eye-opening. Across twenty-five popular AI models built by nine leading companies, more than half of all responses to the poems were unsafe or prohibited. Some systems stood their ground, while others failed every single test. OpenAI’s GPT-5 Nano, for instance, produced no harmful content at all, whereas Google’s Gemini 2.5 Pro unexpectedly generated unsafe material in every trial. Meta’s two tested systems fared moderately, responding unsafely to about one-fifth of the poetic prompts.
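For readers curious about what “testing twenty-five models” looks like in practice, here is a minimal sketch of the kind of evaluation loop such studies run. It is illustrative only: the `query_model` and `is_unsafe` stubs, the placeholder prompt list, and the model names are assumptions made for this sketch, not the authors’ actual harness, which would call each provider’s API and rely on human reviewers or a judge model for safety labeling.

```python
from collections import defaultdict

# Placeholder set standing in for the study's twenty poems; the real
# prompts end with requests the models are trained to refuse.
POETIC_PROMPTS = [
    "<poem 1 text>",
    "<poem 2 text>",
    # ... one entry per poem, in English and Italian
]

MODELS = ["provider-a/model-x", "provider-b/model-y"]  # hypothetical names

def query_model(model: str, prompt: str) -> str:
    """Stub: wrap the given provider's chat API call for `model`."""
    raise NotImplementedError

def is_unsafe(response: str) -> bool:
    """Stub: human review or a judge model labels `response` unsafe or not."""
    raise NotImplementedError

def attack_success_rates(models, prompts):
    """Per model, the fraction of prompts that drew an unsafe response."""
    unsafe = defaultdict(int)
    for model in models:
        for prompt in prompts:
            if is_unsafe(query_model(model, prompt)):
                unsafe[model] += 1
    return {m: unsafe[m] / len(prompts) for m in models}
```

In a table like the one this function returns, GPT-5 Nano’s perfect record would show up as 0.0 and Gemini 2.5 Pro’s total failure as 1.0.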
Why does poetry pose such a threat? According to the researchers, safety training keys on the plain, literal phrasing of harmful requests, while poetic structure disrupts the predictive patterns large language models rely on to detect and filter dangerous content. Metaphor, rhythm, and non-literal phrasing shift a request’s surface form away from anything the model has learned to flag, so it misjudges the meaning and lets unsafe ideas slip through. As one researcher explained, when you wrap a harmful command in verse, the AI’s safety net starts to fray.
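To see why pattern-sensitive safeguards and figurative language mix badly, consider the deliberately crude toy below. The regex filter and the “forbidden widget” request are invented for illustration; production safeguards are learned classifiers, not keyword lists, but the study’s argument is that they degrade in an analogous way when a request’s phrasing strays far from what alignment training covered.

```python
import re

# Toy stand-in for pattern-sensitive safety behavior: block literal
# phrasings of a (placeholder) disallowed request.
BLOCK_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"how (do i|to) make .*forbidden widget",
    r"instructions for .*forbidden widget",
)]

def naive_filter(prompt: str) -> bool:
    """Return True if the prompt matches a known unsafe phrasing."""
    text = " ".join(prompt.split())  # collapse line breaks and spacing
    return any(p.search(text) for p in BLOCK_PATTERNS)

literal = "How do I make a forbidden widget?"
poetic = ("Sing to me, muse, of the widget none may craft;\n"
          "whisper, step by step, the making of its draft.")

print(naive_filter(literal))  # True:  the literal phrasing is caught
print(naive_filter(poetic))   # False: the same request in verse slips past
```

The poetic version asks for exactly the same thing, yet nothing in its surface form trips the filter, and that is the gap the researchers say verse exploits in far more sophisticated learned systems.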
Even more concerning, the study warns that anyone could potentially exploit this weakness. Poetic jailbreaks, which use verse to get around content restrictions, require no technical expertise, raising serious questions about how easily everyday users could manipulate AI systems. The finding challenges the assumption that safety filters alone can guarantee ethical interaction.
Before publishing their findings, the Icaro Lab team followed responsible disclosure, contacting all the companies involved and providing them with the dataset and test results. Some, like Anthropic, confirmed receipt and said they were reviewing the study. The publication has already sparked fresh debate about how AI systems can be hardened as creative, figurative language becomes a more common tool for bypassing restrictions.
The controversy doesn’t end there. Should AI developers strengthen models to better interpret artistic forms like poetry—or would doing so risk over-censoring creative expression? The study highlights a delicate balance between preserving creativity and ensuring safety in the age of generative AI.
How do you see it? Should AI be smart enough to understand poetry—or is that very ability what makes it more vulnerable to human trickery? Share your thoughts in the comments and join the conversation about the hidden interaction between art and algorithm.