The Gemini family appears to be growing faster than the AI community can keep up with. Within two to three months, Google released Ultra, Nano, and now Gemini 1.5 Pro and killed Bard, replacing it with Gemini Pro.

The new flagship model from Google, the Gemini 1.5 Pro, is more efficient than the previous model, the Gemini Ultra.

In fact, Gemini 1.5 Pro beats Ultra in a few benchmark tests; however, more data is required for a thorough comparison.

 

With a new Mixture-of-Experts (MoE) architecture, Gemini 1.5 Pro beats out Gemini Pro, which is now known as Gemini 1.0 Pro, in 87% of benchmarks.

This comes only a few weeks after Google released Gemini Ultra, which is accessible through Google’s new paid AI plan, Google One AI Premium.

So what is the use of a model that outperforms 1.0 Pro but performs much like Ultra?

Aside from better performance in certain areas and higher computing efficiency compared to Ultra, the main highlight of Gemini 1.5 Pro is its context window: 128,000 tokens as standard, expandable to 1 million tokens. At its maximum, that exceeds Claude 2.1 at 200,000 tokens and GPT-4 Turbo at 128,000.

A 1 million token context window is roughly equivalent to 700,000 words, 11 hours of audio, or 1 hour of video.

This makes it possible to process and interpret enormous amounts of data, even whole books.

Google clarifies that Gemini 1.5 Pro is still a “mid-size” multimodal model with scalable and adaptable features.
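As a rough back-of-the-envelope aid, the figures above (1 million tokens for roughly 700,000 words) imply about 0.7 words per token. The sketch below uses that ratio to estimate whether a text of a given length would fit in a context window. The ratio is an assumption derived from the numbers quoted in this article, not an official tokenizer figure, and the function names and the 90,000-word book are hypothetical; real token counts depend on the tokenizer and the text.

```python
# Rough estimate only: ratio implied by "1 million tokens ~ 700,000 words".
WORDS_PER_MILLION_TOKENS = 700_000  # assumption taken from the figure quoted above


def estimate_tokens(word_count: int) -> int:
    """Estimate the token count for a text of `word_count` words, rounded up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-word_count * 1_000_000 // WORDS_PER_MILLION_TOKENS)


def fits_in_context(word_count: int, context_window: int) -> bool:
    """Return True if the estimated token count fits in `context_window` tokens."""
    return estimate_tokens(word_count) <= context_window


if __name__ == "__main__":
    book_words = 90_000  # hypothetical book length, for illustration only
    print("estimated tokens:", estimate_tokens(book_words))
    print("fits in 128,000 tokens:", fits_in_context(book_words, 128_000))
    print("fits in 1,000,000 tokens:", fits_in_context(book_words, 1_000_000))
```

Under this estimate, a book of that length would slightly overflow the standard 128,000 token window but fit comfortably in the extended one. The estimate ignores the tokens needed for the question and the model’s answer.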

So, is Gemini 1.5 a GPT-4 killer? It probably won’t beat GPT-4 on raw performance, but, as Google was eager to show, it should be able to outperform it on certain tasks involving large amounts of data.

Gemini’s applications and capabilities

The capabilities of Gemini 1.5 Pro are multimodal, including text, video, and audio, just like those of its predecessors.

The model can process and reason over enormous amounts of data, including lengthy documents, large codebases, or hours of video content, thanks to its extended context window.

In a Google demo, Gemini 1.5 Pro showed that it could comprehend and pick out details from the 402-page transcript of Apollo 11’s lunar mission.

Finding specific scenes in Buster Keaton’s “Sherlock Jr.” using descriptions and sketches was another challenge, which 1.5 Pro completed even though it occasionally took up to a minute.

Gemini 1.5 Pro was also tasked with translating from English into Kalamang, a language with fewer than 200 speakers in western New Guinea, and vice versa.

Given that there was no representation of Kalamang in the model’s training set, this was particularly challenging.

A bilingual wordlist (dictionary) with roughly 2,000 entries, a set of about 400 parallel sentences, and roughly 500 pages of reference grammar were among the instructional materials that Google gave the model in its input context.

These materials fit inside the extended context window of the model, totaling about 250k tokens.

Gemini 1.5 Pro translated sentences between Kalamang and English successfully using only the instructional materials that were provided. This experiment demonstrated the model’s ability to pick up new vocabulary and linguistic rules from its input context and apply them, essentially learning a new language on the fly.

Experts in human language learning evaluated the translation quality of Gemini 1.5 Pro by comparing the model’s output with that of a human language learner using identical resources.
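To see why this experiment depended on the extended window, the short sketch below checks which of the context windows quoted in this article could hold roughly 250,000 tokens of instructional material. The window sizes and the 250,000 figure come from the article; the dictionary and function names are made up for illustration, and the check ignores the extra tokens needed for the sentence to translate and the model’s output.

```python
# Context window sizes in tokens, as quoted in this article.
CONTEXT_WINDOWS = {
    "Gemini 1.5 Pro (standard)": 128_000,
    "Gemini 1.5 Pro (extended)": 1_000_000,
    "Claude 2.1": 200_000,
    "GPT-4 Turbo": 128_000,
}

# Approximate size of the Kalamang grammar, wordlist, and parallel sentences.
KALAMANG_MATERIALS_TOKENS = 250_000


def models_that_fit(prompt_tokens: int) -> list[str]:
    """Return the names of the windows large enough to hold `prompt_tokens`."""
    return [name for name, size in CONTEXT_WINDOWS.items() if size >= prompt_tokens]


if __name__ == "__main__":
    print(models_that_fit(KALAMANG_MATERIALS_TOKENS))
```

With these numbers, only the extended 1 million token window is large enough to take all of the materials in a single prompt.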

The model’s ability to analyze and resolve issues involving more than 100,000 lines of code was evaluated in a different demo.

Insights from Gemini 1.5 Pro’s research paper

Google released an accompanying research paper on Gemini 1.5, titled “Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context.” 

Google is clearly keen to promote Gemini 1.5 Pro’s extended context window, which at its 1 million token upper limit currently surpasses that of other LLMs.

Gemini 1.5 Pro sets new benchmarks in long-document QA, long-video QA, and long-context ASR, achieving near-perfect recall on long-context retrieval tasks across multiple modalities.

The study compares the performance of the Gemini 1.5 Pro to the Gemini 1.0 models in a number of key areas.

  • Win-rate improvements: Across several benchmarks, Gemini 1.5 Pro demonstrates its improvements with an 87.1% win rate against Gemini 1.0 Pro and a 54.8% win rate against Gemini 1.0 Ultra.
  • Specific area performance: In text-related tasks, the model wins 100% of benchmarks against Gemini 1.0 Pro and 77% against Gemini 1.0 Ultra. In vision tasks, the win rates against Gemini 1.0 Pro and Ultra are 77% and 46%, respectively. In audio tasks, the win rate is 60% against Gemini 1.0 Pro and 20% against Gemini 1.0 Ultra.

Compared to competitors, Gemini 1.5 Pro has a longer context window and is a good GPT-3.5 level model overall.

Will that be sufficient to lure users away from ChatGPT? In practice, the advantages could be minimal to nonexistent unless you have entire books to study.

How to use Gemini 1.5 Pro

A limited preview of Gemini 1.5 is now accessible to enterprise and developer clients.

There are still unanswered questions regarding accessibility and long-term pricing. Google has indicated that there will be different price tiers depending on the size of the context window, ranging from the standard 128,000 tokens to the full 1 million.

The precise pricing is still unknown, which has led to speculation about how much it will cost to take advantage of the extended context window.

Some have pointed out that the competition will have advanced by the time Gemini 1.5 Pro becomes available to the general public.

Google is setting itself apart with a product that only a small number of early adopters can experiment with. That leaves a slightly sour taste.

The Gemini family: accessible or esoteric?

As noted at the outset, within two to three months Google released Ultra, Nano, and now Gemini 1.5 Pro, and killed Bard, replacing it with Gemini Pro.

To accommodate the new model, Gemini Pro, which was previously just Gemini, had to be renamed Gemini 1.0 Pro.

As a result of this AI splurge, DeepMind’s landing page for the Gemini family is quite frankly convoluted and crowded. 

In many respects, OpenAI pulled off a cunning marketing ploy by placing their models under the “ChatGPT” banner from the beginning and limiting non-API users’ access to essentially the free GPT-3.5 and the premium GPT-4.

Google is going all out with generative AI through Gemini, but its increasingly confusing product lineup may cause it problems down the road.