Google’s Gemini AI: A Serious Threat to ChatGPT

Google Gemini, a set of large language models (LLMs) incorporating AlphaGo-inspired techniques, marks Google’s strategic response to ChatGPT. With multimodal capabilities and potential access to Google’s extensive proprietary training data from various services, Google Gemini aims to challenge ChatGPT’s dominance in the generative AI space.

This move underscores Google’s commitment to AI innovation and competition in the rapidly growing generative AI market, projected to be worth $1.3 trillion by 2032.

The launch of ChatGPT last November shook Google to its foundations. The popular chatbot posed such a threat to the company’s business that Google declared a code red and began investing to catch up in generative AI.

This effort has resulted not only in the release of Google Bard but also in Google Gemini.

Everything We Know So Far About Gemini

While many expect that Google Gemini will be released in the fall of 2023, not much is known about the model’s capabilities.

Back in May, Sundar Pichai, CEO of Google and Alphabet, released a blog post with a high-level look at the LLM, explaining:

“Gemini was created from the ground up to be multimodal, highly efficient at tool and API integrations, and built to enable future innovations, like memory and planning.”

Pichai also noted that “while still early, we’re already seeing impressive multimodal capabilities not seen in prior models.

“Once fine-tuned and rigorously tested for safety, Gemini will be available at various sizes and capabilities, just like PaLM 2.”

Since then, not much has been said about the release officially, besides Google DeepMind CEO Demis Hassabis’s interview with Wired noting that Gemini will be “combining some of the strengths of AlphaGo type systems with the amazing language capabilities of the large models.”

Android Police has also reported that an anonymous source involved with the product said Gemini will be able to generate text and contextual images and will be trained on sources such as YouTube video transcripts.

Google Gemini Will Be Multimodal

Pichai stated that Gemini combines the strengths of DeepMind’s AlphaGo system, known for mastering the complex game Go, with extensive language modeling capabilities.

He said it is designed from the ground up to be multimodal, integrating text, images, and other data types. This could allow for more natural conversational abilities.

Pichai also hinted at future capabilities like memory and planning that could enable tasks requiring reasoning.

Early Gemini Results Are Promising

In a September Time interview, Hassabis reiterated that Gemini aims to combine scale and innovation.

He said incorporating planning and memory is in the early exploratory stages.

Hassabis also stated that Gemini may employ retrieval methods to output entire blocks of information, rather than generating text word by word, to improve factual consistency.
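To make the retrieval idea concrete, here is a minimal Python sketch. It is not Gemini's implementation, which has not been published. The passage store, the word-overlap scoring, and the names `PASSAGES`, `tokenize`, and `retrieve` are all assumptions made for illustration. The point is only that a retrieved passage is returned verbatim as one block instead of being generated one word at a time.

```python
import re

# Hypothetical passage store; the texts are illustrative, not Gemini data.
PASSAGES = [
    "AlphaGo defeated a Go world champion in 2016.",
    "Flamingo is an image captioning system from DeepMind.",
    "PaLM 2 is available at various sizes.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages):
    """Return, verbatim, the stored passage sharing the most words with the query."""
    query_words = tokenize(query)
    return max(passages, key=lambda p: len(query_words & tokenize(p)))


# The whole passage comes back as a single block of text.
answer = retrieve("Which system does image captioning?", PASSAGES)
print(answer)
```

Because the answer is copied from a stored source, it cannot drift from that source the way freely generated text can. That is the factual-consistency benefit Hassabis alludes to, assuming the store itself is accurate.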

He revealed that Gemini builds on DeepMind’s multimodal work, like the image captioning system Flamingo.

Overall, Hassabis said Gemini is showing “very promising early results.”

Will Gemini Take The Crown From ChatGPT?

One of the biggest conversations around the release of Gemini is whether the mystery language model has what it takes to unseat ChatGPT, which this year reached over 100 million monthly active users.

Initially, Google was using Gemini’s ability to generate text and images to differentiate it from GPT-4, but on September 25, 2023, OpenAI announced that users would be able to enter voice and image queries into ChatGPT.

Now that OpenAI is experimenting with a multimodal model approach and has connected ChatGPT to the Internet, perhaps the most threatening differentiator between the two is Google’s vast array of proprietary training data. Google Gemini can process data taken across services, including Google Search, YouTube, Google Books, and Google Scholar.

The use of this proprietary data in training the Gemini models could give Gemini a distinct edge in the sophistication of the insights and inferences that it can draw from a data set. This is particularly true if early reports that Gemini is trained on twice as many tokens as GPT-4 are correct.

In addition, the partnership between the Google DeepMind and Brain teams this year shouldn’t be underestimated, as it puts OpenAI head-to-head with a team of world-class AI researchers, including Google co-founder Sergey Brin and DeepMind senior AI scientist and machine learning expert Paul Barham.

This experienced team has a deep understanding of how to apply techniques like reinforcement learning and tree search to create AI programs that can gather feedback and improve their problem-solving over time. The DeepMind team used these techniques to teach AlphaGo to defeat a Go world champion in 2016.
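For readers unfamiliar with tree search, the Python sketch below shows the basic idea on a toy game. It is not AlphaGo's method, which paired Monte Carlo tree search with neural networks, and it leaves out reinforcement learning entirely. The game (players alternate taking one to three stones, and whoever takes the last stone wins) and the function names `can_win` and `best_move` are assumptions chosen for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def can_win(stones):
    """Return True if the player to move can force a win with `stones` left."""
    # Try every legal move; a move is winning if it leaves the opponent
    # in a position from which they cannot force a win.
    return any(
        not can_win(stones - take) for take in (1, 2, 3) if take <= stones
    )


def best_move(stones):
    """Return a winning number of stones to take, or None if every move loses."""
    for take in (1, 2, 3):
        if take <= stones and not can_win(stones - take):
            return take
    return None


print(best_move(5))  # take 1, leaving the opponent with 4 stones
print(best_move(4))  # None: every move from 4 stones loses against best play
```

The search looks ahead through every possible sequence of moves before committing to one. Go is far too large for that kind of exhaustive lookahead, which is why systems like AlphaGo rely on learned guidance to decide which branches are worth exploring.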

Competitors Are Interested In Gemini’s Performance

OpenAI CEO Sam Altman tweeted what appeared to be a response to a paywalled SemiAnalysis article reporting that Google Gemini could outperform GPT-4.

There was no official response to the follow-up question by Elon Musk on whether the numbers provided by SemiAnalysis were correct.

Selected Companies Have Early Access To Gemini

More clues about Gemini’s progress emerged this week: The Information reported that Google gave a small group of developers outside Google early access to Gemini.

This suggests Gemini may soon be ready for a beta release and integration into services like Google Cloud Vertex AI.

Meta Working On LLM To Compete With OpenAI

While the news about Gemini is promising thus far, Google isn’t the only company reportedly ready to launch a new LLM to compete with OpenAI.

According to the Wall Street Journal, Meta is also working on an AI model that would compete with the GPT model that powers ChatGPT.

Meta most recently announced the release of Llama 2, an open-source AI model, in partnership with Microsoft. The company appears dedicated to responsibly creating AI that is more accessible.

The AI Arms Race

Gemini’s multimodal abilities, use of reinforcement learning, text and image generation capabilities, and access to Google’s proprietary data are all ingredients it needs to outperform GPT-4.

The training data is the key differentiator. After all, the winner of the LLM arms race will largely be decided by who trains their models on the largest, richest data set.

That being said, with OpenAI reportedly working on a new next-generation multimodal LLM called Gobi, we can’t write off the generative AI giant just yet. The question now is, who executes multimodal AI better?

The Countdown To Google Gemini

What we know so far indicates that Gemini could represent a significant advancement in natural language processing.

The fusion of DeepMind’s latest AI research with Google’s vast computational resources makes the potential impact hard to overstate.

If Gemini lives up to expectations, it could drive a change in interactive AI, aligning with Google’s ambitions to “bring AI in responsible ways to billions of people.”

The latest news from Meta and Google comes a few days after the first AI Insight Forum, where tech CEOs privately met with a portion of the United States Senate to discuss the future of AI.