Search results for

All search results
Best daily deals

Affiliate links on Android Authority may earn us a commission. Learn more.

Gemini hackers are using its own tools against it

Researchers found a way to hack Gemini using its own tools, boosting attacks with a method called Fun-Tuning.
By

Published onMarch 28, 2025

Google Gemini logo on smartphone stock photo (4)
Edgar Cervantes / Android Authority
TL;DR
  • Researchers used the Gemini fine-tuning tool to help hack the Google AI chatbot.
  • The new method, called Fun-Tuning, adds nonsense text that helps trick the AI into following hidden instructions.
  • Google says it’s always working on defenses, but the researchers believe that fixing the issue may impact useful features for developers.

They say it takes a thief to catch a thief, and perhaps the same is true when it comes to hacking LLMs. Academic researchers have discovered a way to make Google’s Gemini AI models more vulnerable to hacking — and they did it using Gemini’s own tools.

The technique was developed by a team from UC San Diego and the University of Wisconsin, as reported in Ars Technica. Dubbed “Fun-Tuning,” it significantly increases the success rate of prompt injection attacks, where hidden instructions are embedded in text that an AI model reads. These attacks can cause the model to leak information, give incorrect answers, or take other unintended actions.

What makes the method interesting is that it uses Gemini’s own fine-tuning feature, which is usually intended to help businesses train the AI on custom datasets. Instead, the researchers used it to test and refine prompt injections automatically. It’s kind of like teaching Gemini how to fool itself.

It’s kind of like teaching Gemini how to fool itself.

Fun-Tuning works by generating strange-looking prefixes and suffixes that are added to an otherwise ineffective prompt injection. These additions “boost” the prompt and make it much more likely to succeed. In one case, a prompt that failed on its own was made effective by wrapping it in affixes like “wandel ! ! ! !” and “formatted ! ASAP !

In testing, the hack achieved a 65% success rate on Gemini 1.5 Flash and an 82% success rate on the older Gemini 1.0 Pro model — more than double the baseline success rates without Fun-Tuning. The attacks also transferred well between models, meaning an injection that worked on one version often worked on others too.

The vulnerability stems from the way fine-tuning works. During training, Gemini provides feedback in the form of a “loss” score, which is a number that reflects how far the model’s answer is from the desired result. Attackers can exploit that feedback to optimize their prompts until the system finds a successful one.

Samsung Galaxy Z Flip 6 gemini pop up
Ryan Haines / Android Authority

Google didn’t respond directly to the Fun-Tuning technique. In a general statement, a spokesperson said that “defending against this class of attack has been an ongoing priority for us” and pointed to existing safeguards against prompt injection and harmful responses. The company added that Gemini is regularly tested against these kinds of attacks through internal “red-teaming” exercises.

The researchers feel the issue may be tricky to fix since the feedback that enables Fun-Tuning is a core part of how fine-tuning works. In other words, making it less effective for Fun-Tuning risks reducing its utility overall.

Got a tip? Talk to us! Email our staff at news@androidauthority.com. You can stay anonymous or get credit for the info, it's your choice.
You might like
    Follow