Internet search giant Google has introduced its Gemini multimodal artificial intelligence (AI) model, describing it as its most capable model to date.
In a statement, the organisation says the new model is able to run on everything from data centres to mobile devices, and can process different forms of information, such as text, audio and images.
It says it has optimised Gemini 1.0 for three different sizes: Ultra, for highly complex tasks; Pro, for scaling across a wide range of tasks; and Nano, for on-device tasks.
Demis Hassabis, CEO of Google DeepMind, says Gemini is the result of large-scale collaborative efforts by teams across Google. He explains Gemini was built from the ground up to be multimodal, so that it can generalise and seamlessly understand, operate across and combine different types of information.
“This promise of a world responsibly empowered by AI continues to drive our work at Google DeepMind.
“For a long time, we’ve wanted to build a new generation of AI models, inspired by the way people understand and interact with the world. AI that feels less like a smart piece of software and more like something useful and intuitive – an expert helper or assistant,” says Hassabis.
Big tech companies – including Microsoft, xAI, Meta and Amazon – have made large investments in AI this year, as they jostle for the AI lead.
World Wide Worx CEO Arthur Goldstuck says Gemini helps Google catch up with its rivals, but is not ground-breaking in its own right.
“Testing [Google chatbot] Bard with Gemini onboard did not feel much different from what came before, so we must wait for the release of Gemini Ultra next year, and see what impact that has on what is currently expected to be called Bard Advanced.
“However, Google’s own hype points to outperforming ChatGPT and its GPT-3.5 large language model, rather than doing anything different from ChatGPT. And by the time Bard Advanced arrives, chances are ChatGPT will have moved from GPT-3.5 to GPT-4, which makes the comparison irrelevant,” adds Goldstuck.
Greg Serandos, co-founder of the African Academy of Artificial Intelligence, says the development of Gemini and its language capabilities – such as translation – could help close the digital divide and provide accessibility to everyone.
“It claims to be able to translate languages seamlessly, and we believe that will be the case since Google has been a leader in translation for many years now. This is critical from a digital divide and accessibility point of view.
“Presumably, not only will all major African languages be supported, but I would also think that voice-to-text and text-to-voice will open up the world’s information to anybody with a device and internet connectivity,” says Serandos.
As new AI tools continue to be developed, companies like Google will need to comply with regulatory frameworks to ensure their AI tools are safe.
“We’re approaching this work boldly and responsibly,” says Google CEO Sundar Pichai. “That means being ambitious in our research and pursuing the capabilities that will bring enormous benefits to people and society, while building in safeguards and working collaboratively with governments and experts to address risks as AI becomes more capable.
“And we continue to invest in the very best tools, foundation models and infrastructure and bring them to our products and to others, guided by our AI principles.”
Hassabis notes: “Gemini has the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity. We’ve conducted novel research into potential risk areas like cyber offense, persuasion and autonomy, and have applied Google Research’s best-in-class adversarial testing techniques to help identify critical safety issues in advance of Gemini’s deployment.”
Google says it has begun integrating Gemini into its products, with a “fine-tuned” version of the model now available in its Bard chatbot.
Additionally, Gemini will be integrated into Google’s Pixel smartphones, with the Pixel 8 Pro being the first handset engineered to run Gemini Nano.
Starting on 13 December, developers and enterprise customers can access Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI. Gemini Ultra is currently available to select customers.
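For developers taking that route, a first call against the Gemini API is brief. The sketch below assumes the google-generativeai Python package and an API key generated in Google AI Studio; the model name and call pattern follow the SDK as documented at launch and are illustrative rather than definitive (Vertex AI access goes through Google Cloud’s own client libraries instead).

```python
# Minimal sketch of calling Gemini Pro via the Gemini API.
# Assumes: pip install google-generativeai, plus an API key from Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder, not a real credential

# "gemini-pro" is the text-optimised model exposed to developers at launch;
# multimodal (image + text) prompts are handled by the vision variant.
model = genai.GenerativeModel("gemini-pro")

response = model.generate_content("Summarise what a multimodal AI model is in one sentence.")
print(response.text)
```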
“We’ve made great progress on Gemini so far and we’re working hard to further extend its capabilities for future versions. We’re excited by the amazing possibilities of a world responsibly empowered by AI – a future of innovation that will enhance creativity, extend knowledge, advance science and transform the way billions of people live and work around the world,” concludes Hassabis.