Google LLC is enhancing its Gemini AI assistant with new image generation and customization features. Originally introduced as Bard last year, Gemini is a chatbot powered by a series of large language models. It can generate text, craft code, solve math problems, and more. The assistant is available in both a free version and a subscription-based tier for consumers, along with two paid versions tailored for businesses.
The update includes the introduction of a new image generation model called Imagen 3. Unveiled at Google’s I/O event in May, this model will soon be available to both consumer and business users of Gemini. Imagen 3 improves upon its predecessor by generating more photorealistic images and handling long, complex user instructions with greater accuracy. If the generated image doesn’t fully meet the user’s expectations, they can refine it by providing follow-up prompts.
Imagen 3 operates as a latent diffusion model, which means it doesn’t work directly on full-resolution pixel data. Instead, images are converted into a compressed mathematical representation in what is known as latent space. This representation preserves only the most important information, effectively compressing the image. As a result, the model can process images using less hardware, reducing costs and improving efficiency.
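The compression idea described above can be illustrated with a toy sketch. This is not Imagen 3's actual encoder, whose details Google has not described here; real latent diffusion models use a learned neural network for this step. Simple block averaging stands in for that encoder, and the function names and the 8x8 sample image are assumptions made purely for illustration.

```python
# Toy illustration only: average pooling stands in for a learned encoder.
# Nothing here reflects Imagen 3's real architecture.

def encode_to_latent(image, factor):
    """Shrink a square grayscale image by averaging factor x factor blocks."""
    size = len(image)
    latent = []
    for row in range(0, size, factor):
        latent_row = []
        for col in range(0, size, factor):
            block = [
                image[r][c]
                for r in range(row, row + factor)
                for c in range(col, col + factor)
            ]
            latent_row.append(sum(block) / len(block))
        latent.append(latent_row)
    return latent


def value_count(grid):
    """Count how many numbers a 2D grid holds."""
    return sum(len(row) for row in grid)


# Hypothetical 8x8 "image" with pixel intensities 0..63.
image = [[row * 8 + col for col in range(8)] for row in range(8)]

latent = encode_to_latent(image, factor=2)
compression_ratio = value_count(image) / value_count(latent)

print(len(latent), "x", len(latent[0]))  # size of the compressed grid
print(compression_ratio)                 # how many times fewer values to process
```

Under these assumptions, later processing has to handle a quarter as many values as the original grid holds, which is the kind of saving that lets a latent model run on less hardware.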
In addition to introducing Imagen 3, Google is reactivating Gemini’s feature for generating images of people, which had been disabled in February due to concerns over historical inaccuracies. According to Dave Citron, Senior Director of Product Management for Gemini Experiences, the feature has undergone rigorous testing and now includes guardrails to prevent the creation of harmful content. Google has taken steps to ensure the tool won’t generate photorealistic images of identifiable individuals, depictions of minors, or excessively violent, gory, or sexual scenes.
Alongside these updates, Google is launching a new Gemini feature called Gems. This allows users to create customized versions of the chatbot tailored to specific tasks. By providing instructions on how the assistant should respond to prompts, users can save these settings as a “Gem.” For instance, a user can configure Gemini to always generate text in a particular style, eliminating the need to repeat the request with each prompt. Google is also releasing a set of premade Gems designed for tasks like troubleshooting code and offering writing tips, along with a general-purpose Gem that simplifies complex topics.
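Conceptually, a Gem behaves like a saved set of instructions that is applied to every prompt. The sketch below models that idea in plain Python. It is not Google's API, and how Gems are implemented internally is not stated; the `Gem` class, the `build_prompt` helper, and the sample instructions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gem:
    """Hypothetical stand-in for a saved Gem: a name plus reusable instructions."""
    name: str
    instructions: str


def build_prompt(gem, user_prompt):
    """Prepend the Gem's saved instructions so the user need not repeat them."""
    return f"{gem.instructions}\n\n{user_prompt}"


# Assumed example: a Gem that enforces a particular writing style.
style_gem = Gem(
    name="Plain-language writer",
    instructions="Always answer in short, plain-English sentences.",
)

full_prompt = build_prompt(style_gem, "Explain what a diffusion model is.")
print(full_prompt)
```

The point of the design is that the style request is stored once and reused, so each new prompt carries only the question itself.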
These enhancements reflect Google’s commitment to advancing its AI capabilities while ensuring user safety and content accuracy.