Google I/O 2024 – A lot happened at I/O 2024!

Google I/O 2024 – Gemini 1.5 Flash

We’ve launched Gemini 1.5 Flash, a leaner and meaner AI model designed for speed and efficiency at scale. This latest addition to our Gemini family is the fastest model available through our API, making it perfect for high-volume tasks that require rapid response times.

Gemini 1.5 Pro: We’ve enhanced Gemini 1.5 Pro with significant quality improvements across various key applications, including:

  • Translation
  • Coding
  • Reasoning
  • And more

These updates, effective immediately, will enable you to tackle a wider range of complex tasks with even greater accuracy and precision.

Gemini 1.5 Flash: Gemini 1.5 Flash is a compact and agile AI model, specifically designed for tasks that require:

  • Rapid response times
  • High-frequency execution
  • Efficient processing

This streamlined model excels in scenarios where speed and swift decision-making are paramount.

  • Natively multimodal with long context: Both 1.5 Pro and 1.5 Flash come with our 1 million token context window and allow you to interleave text, images, audio, and video as inputs. To get access to 1.5 Pro with a 2 million token context window, join the waitlist in Google AI Studio or, for Google Cloud customers, in Vertex AI.
Gemini 1.5 Pro will have a 2 million token context window in private preview.
  • New developer features: Based on your feedback, we’re introducing two new API features: video frame extraction and parallel function calling, which lets you return more than one function call at a time. And coming in June, we’ll add context caching to Gemini 1.5 Pro, so you only have to send parts of your prompt, including large files, to the model once. This should make the long context even more useful and more affordable.
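
To make parallel function calling concrete, here is a minimal sketch of the application side: one model turn requests two functions, and the application runs both before replying. The tool names (`get_weather`, `get_time`), the `model_turn` list, and its `{"name", "args"}` shape are all hypothetical stand-ins for illustration; they are not the actual Gemini API response format, and no real API call is made.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local tools the model may ask the application to run.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}

def get_time(city: str) -> dict:
    return {"city": city, "time": "09:00"}

TOOLS = {"get_weather": get_weather, "get_time": get_time}

# Stand-in for a single model turn that carries more than one function call.
# The shape below is an assumption for illustration, not the real response format.
model_turn = [
    {"name": "get_weather", "args": {"city": "Paris"}},
    {"name": "get_time", "args": {"city": "Paris"}},
]

def run_calls(calls):
    """Run every requested call concurrently; results keep the request order."""
    def run_one(call):
        result = TOOLS[call["name"]](**call["args"])
        return {"name": call["name"], "response": result}

    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the same order as the input calls.
        return list(pool.map(run_one, calls))

results = run_calls(model_turn)
```

Because both results come back from a single turn, the application could send them to the model together instead of making one round trip per function.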

Google I/O 2024 – Additions to the Gemma family

Gemma 2 comes with a 27B parameter instance and runs efficiently on GPUs or a single TPU.
  • PaliGemma: Our first vision-language open model is available today and optimized for image captioning, visual Q&A, and other image labeling tasks. PaliGemma joins our other pre-trained Gemma variants, CodeGemma and RecurrentGemma.
  • Gemma 2: We’re excited to announce the upcoming release of Gemma 2 in June, designed to deliver cutting-edge performance at accessible sizes for developers and researchers. Responding to popular demand, the new 27B parameter Gemma 2 model outperforms larger models and still runs efficiently on GPUs or a single TPU host in Vertex AI.

Project Astra

At Google I/O, Demis Hassabis unveiled an early version of Project Astra, a groundbreaking AI assistant that aims to be your ultimate helper. This cutting-edge technology combines real-time multimodal capabilities, enabling it to:

  • See and understand its surroundings
  • Identify objects and remember their location
  • Answer questions and assist with various tasks

In a stunning demo video, Project Astra effortlessly identified a speaker component, located missing glasses, reviewed code, and more — all in real time and with conversational ease. According to Hassabis, the demo is entirely genuine, with no editing or manipulation.

Imagen 3

In the past year, we’ve achieved remarkable advancements in our image generation technology. Our latest breakthrough is Imagen 3, a cutting-edge text-to-image model that produces:

  • Photorealistic images with unprecedented detail
  • Lifelike results with minimal visual imperfections
  • A significant reduction in distracting artifacts compared to our previous models

Imagen 3 represents a substantial improvement in image generation quality and fidelity, marking a major milestone in our AI research and development journey.

Imagen 3
Prompt: A close-up of a sleek wolf perched regally in front of a gray background, in a high-resolution photograph with detailed fine details, isolated on a plain stock photo with color grading in the style of a hyper-realistic style.

Imagen 3 excels at rendering text, a longstanding challenge for image generation models. That opens up uses such as:

  • Personalized birthday messages with custom fonts and designs
  • Professional title slides for presentations

Imagen 3 will be coming soon to Vertex AI.

Veo

Veo is a groundbreaking AI model that produces 1080p videos in a variety of cinematic styles, and those videos can run longer than a minute. With its advanced natural language understanding and visual semantics, Veo brings your creative vision to life, accurately capturing the tone and details of your prompt.

  • Enjoy unparalleled creative control with Veo, which comprehends cinematic terminology like “timelapse” and “aerial landscape shots.” 
  • This innovative model generates consistent and coherent footage, ensuring realistic movements of people, animals, and objects throughout the video.

Other highlights

  • Gemini Advanced now has the largest context window of any commercially available chatbot in the world.
  • We added the ability to upload files via Google Drive or directly from your device right into Gemini Advanced.
  • Gemini Advanced has a new planning feature that goes beyond a list of suggested activities and creates a custom itinerary just for you.
