Google unleashes AI offensive at I/O developer conference

Johannesburg, 15 May 2024

Google and Alphabet CEO Sundar Pichai.

Internet search giant Google made a slew of product announcements at its just-ended Google I/O 2024 developer conference in California.

The announcements come as the company makes further investments in artificial intelligence (AI) technologies, with competition in the market becoming fierce.

This week, Google’s rival, Microsoft-backed OpenAI, unveiled its latest AI model, GPT-4o, which allows users to interact through audio, images and text in real time. The platform can also respond to audio inputs in a time similar to human response time in a conversation.

At the conference, Google announced the general availability of Gemini 1.5 Pro, an AI model with a one million token context window, enabling it to process vast amounts of information, such as an hour of video or 1 500 pages of a PDF, and respond to complex queries about this source material, says the firm.

It notes Gemini 1.5 Pro will also be available in more than 35 languages starting today – providing access to the latest technical advances, including deep analysis of data files, such as spreadsheets, enhanced image understanding and an expanded context window, starting at one million tokens.

Building next-gen AI apps

Additionally, Google introduced Gemini 1.5 Flash, which it says is a more cost-efficient model with lower latencies, built based on user feedback. It also introduced Project Astra, Google’s vision for the next generation of AI assistants – a responsive agent that can understand and react to the context of conversations.

“Today, more than 1.5 million developers use Gemini models across our tools. You’re using it to debug code, get new insights and build the next generation of AI applications,” says Sundar Pichai, Google and Alphabet CEO.

The company is integrating Gemini into Search, enhancing its ability to understand and respond to complex queries. This includes features like AI Overviews, which are designed for advanced multi-step reasoning, planning and multimodal capabilities.

According to Google, this enhancement ensures people can ask intricate, multi-step questions, tailor their search outcomes, and interact using videos for an enriched query experience. This is set to launch soon, starting in the US before expanding globally.

Another feature is multi-step reasoning, which breaks down complex questions into smaller parts, synthesising the most relevant information and stitching it all together into a comprehensive AI answer.

Search with video allows users to take a quick video, ask questions about its content and get AI-powered answers in response.

Google also announced Gemini is being integrated into Android to power new features, such as “Circle to Search”, which allows users to search for anything they see on their screen. According to the company, this feature is expanding to more surfaces, such as Chrome desktop and tablets.

It adds that Gemini Nano will enhance TalkBack, Android’s screen reader, with new features that make it easier for people with visual impairments to navigate their devices and access information. This feature will come first to Pixel devices later in the year.

Gemini Nano will be used to detect scam phone calls in real time, providing users with warnings and helping them avoid falling victim to fraud, says the firm.

Google also rolled out Gemini for Workspace, which it says helps businesses and everyday users to get more out of their Google apps – from drafting e-mails in Gmail, to organising project plans in Sheets.

“Over the last year, more than a million people and tens of thousands of companies have used generative AI in Workspace when they need an extra hand or dose of inspiration,” says the company.

It also revealed that Google Photos is getting a new feature, called “Ask Photos”, which uses Gemini to answer questions about photos and videos, such as finding specific images or recalling past events.

Imagen 3, Google’s latest text-to-image model, is now available to select creators in private preview.

Generative video model

The internet search giant also introduced Veo, its video generation model, capable of creating high-quality 1080p videos that can run longer than a minute.

Google explains Veo closely follows user prompts and offers creative control, accurately rendering directions like quick zooms or slow-motion crane shots.

“Veo builds upon years of generative video model work and combines architecture, scaling laws and novel techniques to improve latency and output resolution,” it states.

Google says it is collaborating with musicians, songwriters and producers, in partnership with YouTube, to better understand the role of AI in music creation.

The partners are developing a suite of music AI tools that can create instrumental sections and transfer styles between tracks.

According to the firm, these collaborations inform the development of generative music technologies like Lyria, Google’s most advanced family of models for AI music generation. New experimental music created with these tools by Grammy winner Wyclef Jean, electronic musician Marc Rebillet, songwriter Justin Tranter and others was released on their respective YouTube channels at I/O.