Google Gemini: All You Need To Know About The New AI

Image Credits: Google Blog

Google has recently unveiled its latest generative AI model, Gemini, which the tech giant deems its most capable and versatile AI to date. This large language model (LLM) is multimodal, demonstrating an ability to comprehend various types of information, including text, audio, images, and video. Google plans to expand Gemini next year, offering three distinct models to cater to different user needs.

According to Google, Gemini Ultra is the first model to outperform human experts on Massive Multitask Language Understanding (MMLU), a benchmark that tests knowledge and problem-solving across 57 subjects. Given its proficiency across different domains, Gemini is positioned to significantly change how we interact with artificial intelligence.

Gemini will be available in three versions:

Gemini Ultra: The largest and most capable model designed for highly complex tasks.

Gemini Pro: Geared towards a broad range of tasks, outperforming OpenAI’s GPT-3.5 in six out of eight industry benchmarks.

Gemini Nano: The most efficient model, built for on-device tasks and targeted at Android developers who want to build Gemini-powered apps. It enables features such as summarizing recordings made with the Recorder app on the Pixel 8 Pro (currently available only in English).

Google’s move with Gemini is seen as a response to OpenAI’s GPT models, particularly GPT-3.5. In evaluations, Gemini Pro surpassed GPT-3.5 in most benchmarks, while the advanced Gemini Ultra outperformed the newer GPT-4 in seven out of eight benchmarks.

Gemini has broader implications across Google’s services. It is integrated into Bard, Google’s AI chatbot, providing advanced reasoning and understanding. Bard, backed by Gemini Pro, is currently available only in English across more than 170 countries. Integration with Gemini Ultra is planned for the future, and Gemini will be gradually incorporated into other Google applications, including Search, Google Ads, and the Chrome browser.

Some of the unique features of Google Gemini are:

Processing Complex Information: Gemini can comprehend complex written and visual data, sifting through extensive datasets to derive meaningful insights.

Multimodal Understanding: Gemini can recognize and understand images, audio, text, and more at the same time, which lets it analyze different forms of data together.

Advanced Coding Proficiency: Gemini can understand, generate, and explain high-quality code in popular programming languages, and it performs well on several coding benchmarks.

Purpose-Built Infrastructure and Speed: Gemini was trained on AI-optimized infrastructure using Google's in-house Tensor Processing Units (TPUs). It also runs faster on TPUs, and Google is introducing a new TPU system to speed up its ongoing development.

Ethics and Access Regarding Gemini

Developers can use Google Gemini through the API offered by Google AI Studio or through Google Cloud Vertex AI. Here are four key points concerning the ethics and access related to Gemini:

Ethical Considerations:

Google adheres to specific AI principles to guarantee the safety, ethics, and security of Gemini. Rigorous safety assessments are conducted to identify and address bias and toxicity, and collaboration with experts is undertaken to subject the model to thorough testing.

Research and Development:

Google actively engages in research concerning potential risk areas such as persuasion and autonomy, aiming to bolster the ethical dimensions of Gemini. Benchmarks like Real Toxicity Prompts are utilized to diagnose content safety issues during the model’s training.

Availability:

Gemini 1.0 is progressively being integrated across various products and platforms. The Pro version of Gemini is already accessible in Bard, Google’s counterpart to ChatGPT, with plans to introduce Bard Advanced in the future.

Access Points:

Developers can access Gemini through the API provided by Google AI Studio or through Google Cloud Vertex AI, which lets them integrate Gemini's capabilities into their own AI solutions.
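As a rough illustration of what calling the API can look like, here is a minimal Python sketch that builds a text request using only the standard library. The endpoint host, the `v1beta` version, the `gemini-pro` model name, the `x-goog-api-key` header, and the response layout are assumptions based on the public Gemini REST API and are not details from this article; the helper names `build_request`, `extract_text`, and `send` are hypothetical. Check Google's current documentation before relying on any of them.

```python
import json
import urllib.request

# Assumed values (not from the article): endpoint host, API version and model name.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-pro"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    url = f"{API_BASE}/models/{MODEL}:generateContent"
    # Assumed request shape: a list of contents, each holding text parts.
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,  # assumed auth header for API keys
        },
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Join the text parts of the first candidate in a parsed response."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


def send(request: urllib.request.Request) -> str:
    """Send the request over the network and return the generated text."""
    with urllib.request.urlopen(request) as response:
        return extract_text(json.load(response))
```

With an API key created in Google AI Studio, a call such as `send(build_request("Explain what a multimodal model is.", api_key))` would return the model's reply as a string. Keeping request building separate from sending makes the payload easy to inspect without touching the network.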

Google is steadfast in its commitment to establishing a prominent position in the AI landscape with Gemini and has intentions to expand its capabilities in subsequent versions.

Notably, Gemini runs on Google's Tensor Processing Units (TPUs), specialized hardware for training and running AI models. Although Gemini currently relies on TPUs, there are plans to include graphics processing units (GPUs), such as Nvidia's H100, in its training process.

As for the monetization strategy for Gemini, Google is exploring options but has not provided specific details on how it plans to generate revenue from this advanced AI model.

Jyotsna Datta
