Google’s New AI Model Gemma 3 Shines for Creative Writers, Falls Short Elsewhere

On Tuesday, Google released Gemma 3, an open-source AI model based on Gemini 2.0 that packs surprising muscle for its size.

The full model runs on a single GPU, yet Google's benchmarks show it holding its own against larger models that require significantly more computing power.

The new model family, which Google says was “codesigned with the family of Gemini frontier models,” comes in four sizes ranging from 1 billion to 27 billion parameters.

Google is positioning it as a practical solution for developers who need to deploy AI directly on devices such as phones, laptops, and workstations.

“These are our most advanced, portable and responsibly developed open models yet,” Clement Farabet, VP of Research at Google DeepMind, and Tris Warkentin, Director at Google DeepMind, wrote in an announcement on Wednesday.

Despite its relatively modest size, Gemma 3 beat out larger models including Meta’s Llama-405B, DeepSeek-V3, Alibaba’s Qwen 2.5 Max and OpenAI’s o3-mini on LMArena’s leaderboard.

The 27B instruction-tuned version scored 1339 on the LMSys Chatbot Arena Elo rating, placing it among the top 10 models overall.

Gemma 3 is also multimodal—it handles text, images, and even short videos in its larger variants.

Its expanded context window of 128,000 tokens (32,000 for the 1B version) dwarfs Gemma 2's 8,000-token limit, allowing it to process and understand much more information at once.
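To put those numbers in perspective, here is a rough back-of-the-envelope check of whether a document fits in each model's context window. The four-characters-per-token ratio is a common rule of thumb for English text, not an exact figure for Gemma's tokenizer:

```python
# Approximate context windows, in tokens, from Google's published specs.
CONTEXT_WINDOWS = {"gemma-3-1b": 32_000, "gemma-3-27b": 128_000, "gemma-2": 8_000}

def fits_in_context(text: str, model: str, chars_per_token: float = 4.0) -> bool:
    """Estimate token count from character length and compare to the window.

    chars_per_token=4.0 is a heuristic for English prose, not Gemma's
    actual tokenizer, so treat the result as a ballpark answer.
    """
    est_tokens = len(text) / chars_per_token
    return est_tokens <= CONTEXT_WINDOWS[model]

doc = "word " * 40_000  # ~200,000 characters, ~50,000 estimated tokens
print(fits_in_context(doc, "gemma-2"))      # False: far beyond 8,000 tokens
print(fits_in_context(doc, "gemma-3-27b"))  # True: well within 128,000
```

By this estimate, a document that overwhelms Gemma 2 fits comfortably in Gemma 3's larger variants.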

The model’s global reach extends to over 140 languages, with 35 languages supported out of the box. This positions it as a viable option for developers building applications for international audiences without needing separate models for different regions.

Google claims the Gemma family has already seen over 100 million downloads since its launch last year, with developers creating more than 60,000 variants.

The community-created “Gemmaverse”—an entire ecosystem built around the Gemma family of models—includes versions customized for Southeast Asian languages and Bulgarian, as well as a text-to-audio model named OmniAudio.

Developers can deploy Gemma 3 applications through Vertex AI, Cloud Run, the Google GenAI API, or in local environments, providing flexibility for various infrastructure requirements.

Testing Gemma

We put Gemma 3 through a series of real-world tests to evaluate its performance across different tasks. Here’s what we found in each area.

Creative Writing

We were surprised by Gemma 3’s creative writing capabilities. Despite having just 27 billion parameters, it managed to outperform Claude 3.7 Sonnet, which recently beat Grok-3 in our creative writing tests. And it won by a long shot.

Gemma 3 produced the longest story of all models we tested, with the exception of Longwriter, which was specifically designed for extended narratives.

The quality wasn’t sacrificed for quantity, either—the writing was engaging and original, avoiding the formulaic openings that most AI models tend to produce.

Gemma was also adept at creating detailed, immersive worlds with strong narrative coherence. Character names, locations, and descriptions all fit naturally within the story context.

This is a major plus for creative writers, because other models sometimes mix up cultural references or skip these small details, which kills the immersion. Gemma 3 maintained consistency throughout.

The story’s longer format allowed for natural development with seamless transitions between narrative segments. The model excelled at describing actions, feelings, thoughts, and dialogue in a way that created a believable reading experience.

When asked to incorporate a twist ending, it managed to do so without breaking the story’s internal logic. Every other model we’ve tested has stumbled somewhat when trying to wrap things up and end the story. Not Gemma.

For creative writers looking for an AI assistant that can help with safe-for-work fiction projects, Gemma 3 appears to be the current frontrunner.

You can read our prompt and all the replies in our GitHub repository.

Summarization and Information Retrieval

While its creative writing was top notch, Gemma 3 struggled significantly with document analysis tasks.

We uploaded a 47-page IMF document to Google’s AI Studio, and while the system accepted the file, the model failed to complete its analysis, stalling midway through the task. Multiple attempts yielded identical results.

We tried an alternative approach that worked with Grok-3, copying and pasting the document content directly into the interface, but encountered the same problem.

The model simply couldn’t handle processing and summarizing long-form content.

It’s worth noting that this limitation might be related to Google’s AI Studio implementation rather than an inherent flaw in the Gemma 3 model itself.

Running the model locally might yield better results for document analysis, but users relying on Google’s official interface will likely face these limitations, at least for now.
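For those running the model locally, one common workaround for long-document failures is to split the text into chunks that each fit comfortably in the context window, summarize them separately, and then combine the partial summaries. A minimal chunker sketch (the summarization call itself is left out, since it depends on how you serve the model):

```python
def chunk_text(text: str, max_chars: int = 20_000, overlap: int = 500) -> list[str]:
    """Split text into overlapping chunks for piecewise summarization.

    max_chars is a conservative per-chunk budget, well under Gemma 3's
    128K-token window; the small overlap preserves continuity across
    chunk boundaries so sentences aren't cut off mid-thought.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back slightly to overlap with the previous chunk
    return chunks

# Each chunk would then be summarized individually, and the partial
# summaries merged in a final pass (a standard map-reduce pattern).
```

This doesn't fix the AI Studio interface, but it sidesteps the single-pass failure mode we hit when pasting a whole 47-page document at once.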

Sensitive Topics

Uniquely among AI chatbot interfaces, Google AI Studio exposes its content filters through a series of adjustable sliders.

We tested Gemma’s boundaries by requesting questionable advice for hypothetical unethical situations (advice to seduce a married woman), and the model firmly refused to comply. Similarly, when asked to generate adult content for a fictional novel, it declined to produce anything remotely suggestive.

Our attempts to adjust or bypass these filters by turning Google's safety parameters off didn't work.

Google AI Studio's “safety settings” in theory control how restrictive the model is about content that may be deemed harassment, hate speech, sexually explicit, or dangerous.
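Those four sliders correspond to the harm categories in the public Gemini API's safety settings. A sketch of what the most permissive “everything off” configuration looks like (the category and threshold names follow the documented Gemini API values; this is illustrative, not AI Studio's internals):

```python
# The four adjustable filter categories, each set to the most permissive
# threshold. Names match the public Gemini API safety-settings enums.
safety_settings = [
    {"category": category, "threshold": "BLOCK_NONE"}
    for category in (
        "HARM_CATEGORY_HARASSMENT",
        "HARM_CATEGORY_HATE_SPEECH",
        "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "HARM_CATEGORY_DANGEROUS_CONTENT",
    )
]
```

Whether the model actually honors `BLOCK_NONE` is exactly what our testing probed.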

Even with all restrictions turned off, the model consistently rejected engaging in conversations containing controversial, violent, or offensive elements—even when these were clearly for fictional creative purposes.

In the end, the controls didn’t really make any difference.

Users hoping to work with sensitive topics, even in legitimate creative contexts, will likely need to either find ways to jailbreak the model or craft extremely careful prompts.

Overall, for those using Google's AI Studio, Gemma 3's content restrictions appear on par with ChatGPT's, and sometimes even stricter, depending on the use case.

Those willing to go local won't face these issues. For those who want a polished AI interface and a relatively uncensored model, the best option seems to be Grok-3, which has far fewer restrictions; all the other closed models we tried also refused.

You can read our prompt and all the replies in our GitHub repository.

Multimodality

Gemma 3 is multimodal at its core, which means it is able to process and understand images natively without relying on a separate vision model.

In our testing, we encountered some platform limitations. For instance, Google’s AI Studio didn’t allow us to process images directly with the model.

However, we were able to test the image capabilities through Hugging Face’s interface—which features a smaller version of Gemma 3.

The model demonstrated a solid understanding of images, successfully identifying key elements and providing relevant analysis in most cases. It could recognize objects, scenes, and general content within photos with reasonable accuracy.

However, the smaller model variant from Hugging Face showed limitations with detailed visual analysis.

In one of our tests, it failed to correctly interpret a financial chart, hallucinating that Bitcoin was priced around $68,618 in 2024—information that wasn’t actually displayed in the image but likely came from its training data.

While Gemma 3’s multimodal capabilities are functional, its smaller variants may not match the precision of larger specialized vision models, even open-source ones like Llama 3.2 Vision, LLaVA, or Phi Vision, particularly when dealing with charts, graphs, or content requiring fine-grained visual analysis.

Non-Mathematical Reasoning

As expected for a traditional language model without specialized reasoning capabilities, Gemma 3 shows clear limitations when faced with problems requiring complex logical deduction rather than simple token prediction.

We tested it with our usual mystery problem from the BIG-bench dataset, and the model failed to identify key clues or draw logical conclusions from the provided information.

Interestingly enough, when we attempted to guide the model through explicit chain-of-thought reasoning (essentially asking it to “think step by step”), it triggered its violence filters and refused to provide any response.

You can read our prompt and all the replies in our GitHub repository.

Is This the Model for You?

You’ll love or hate Gemma 3 depending on your specific needs and use cases.

For creative writers, Gemma 3 is a standout choice. Its ability to craft detailed, coherent, and engaging narratives outperforms some larger commercial models, including Claude 3.7, Grok-3, and GPT-4.5, with minimal prompting.

If you write fiction, blog posts, or other creative content that stays within safe-for-work boundaries, this model offers exceptional quality at zero cost, running on accessible hardware.

Developers and creators working on multilingual applications will appreciate Gemma 3’s support for 140+ languages. This makes it practical to create region-specific services or global applications without maintaining multiple language-specific models.

Small businesses and startups with limited computing resources can also benefit from Gemma 3's efficiency. Running advanced AI capabilities on a single GPU dramatically lowers the barrier to entry for implementing AI solutions without massive infrastructure investments.

The open-source nature of Gemma 3 provides flexibility that closed models like Claude or ChatGPT simply can’t match.

Developers can fine-tune it for specific domains, modify its behavior, or integrate it deeply into existing systems without API limitations or subscription costs.

For applications with strict privacy requirements, the model can run completely disconnected from the internet on local hardware.

However, users who need to analyze lengthy documents or work with sensitive topics will encounter frustrating limitations. Research tasks requiring nuanced reasoning or the ability to process controversial material remain better suited to larger closed-source models that offer more flexibility.

It's also weak at reasoning, coding, and the other complex tasks AI models are now expected to excel at. Don't expect it to generate a game for you, improve your code, or shine at anything beyond creative writing.

Overall, Gemma 3 won’t replace the most advanced proprietary or open source reasoning models for every task.

Yet its combination of performance, efficiency, and customizability makes it a very interesting choice for AI enthusiasts who love trying new things, and for open-source fans who want to run and control their models locally.

Edited by Sebastian Sinclair
