
Late last year, Google launched its first natively multimodal model, Gemini 1.0, in three sizes: Ultra, Pro, and Nano. Just a few months later, it released 1.5 Pro, featuring enhanced performance. Google has now introduced Gemini 1.5 Flash, a lighter-weight model than 1.5 Pro that is designed to be fast and efficient to serve at scale.
On Tuesday, the company hosted its annual I/O developer conference and rolled out its latest artificial intelligence products, from search and chat features to AI assistants. Here's a look at the developments.
Gemini 1.5 Flash
Gemini 1.5 Flash is the newest addition to the Gemini model family and the fastest Gemini model served in the API. It’s optimized for high-volume, high-frequency tasks at scale, is more cost-efficient to serve, and features a breakthrough long context window. 1.5 Flash excels at summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more.
Over the last few months, Google has also made significant improvements to 1.5 Pro, its best model for general performance across a wide range of tasks. 1.5 Pro can now follow increasingly complex and nuanced instructions, including ones that specify product-level behaviour involving role, format, and style.
Gemini Nano is expanding beyond text-only inputs to include images as well. Starting with Pixel, applications using Gemini Nano with Multimodality will be able to understand the world the way people do: not just through text, but also through sight, sound, and spoken language.
Project Astra
As part of Google DeepMind’s mission to build AI responsibly to benefit humanity, the company shared its progress toward building the future of AI assistants with Project Astra (advanced seeing and talking responsive agent).
To be truly useful, an agent needs to understand and respond to the complex and dynamic world just like people do — and take in and remember what it sees and hears to understand the context and take action. It also needs to be proactive, teachable, and personal, allowing users to talk to it naturally and without lag or delay.
Gemini for Workspace
People are always searching their emails in Gmail, and Google is working to make that search much more powerful with Gemini. For example, a parent who wants to stay informed about everything going on at their child’s school can use Gemini to keep up.
While incredible progress has been made in developing AI systems that can understand multimodal information, achieving conversational response time remains a difficult engineering challenge. Over the past few years, efforts have been made to improve how models perceive, reason, and converse to make the pace and quality of interaction feel more natural.