NVIDIA Teams Up With Google on Gemma Open Language Models

<p>NVIDIA, in collaboration with Google, has launched optimizations across all NVIDIA AI platforms for <a href="https://blog.google/technology/developers/gemma-open-models/">Gemma</a> — Google's state-of-the-art new lightweight <a href="https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-2b">2 billion</a>- and <a href="https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-7b">7 billion</a>-parameter open language models that can be run anywhere, reducing costs and speeding innovative work for domain-specific use cases.</p>

<p>Teams from the two companies worked closely together to accelerate the performance of Gemma — built from the same research and technology used to create the Gemini models — with <a href="https://github.com/NVIDIA/TensorRT-LLM">NVIDIA TensorRT-LLM</a>, an open-source library for optimizing large language model inference, when running on NVIDIA GPUs in the data center, in the cloud and on PCs with <a href="https://www.nvidia.com/en-us/geforce/rtx/">NVIDIA RTX</a> GPUs.</p>

<p>This allows developers to target the installed base of more than 100 million NVIDIA RTX GPUs available in high-performance AI PCs globally.</p>

<p>Developers can also run Gemma on NVIDIA GPUs in the cloud, including on Google Cloud's A3 instances based on the H100 Tensor Core GPU and, soon, NVIDIA's <a href="https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform">H200 Tensor Core GPUs</a> — featuring 141GB of HBM3e memory at 4.8 terabytes per second — which Google will deploy this year.</p>

<p>Enterprise developers can additionally take advantage of NVIDIA's rich ecosystem of tools — including <a href="https://www.nvidia.com/en-us/data-center/products/ai-enterprise/">NVIDIA AI Enterprise</a> with the <a href="https://github.com/NVIDIA/NeMo">NeMo framework</a> and <a href="https://github.com/NVIDIA/TensorRT-LLM">TensorRT-LLM</a> — to fine-tune Gemma and deploy the optimized model in their production applications.</p>

<p>Learn more about how <a href="https://developer.nvidia.com/blog/nvidia-tensorrt-llm-revs-up-inference-for-google-gemma/">TensorRT-LLM is revving up inference for Gemma</a>, along with additional information for developers, including several model checkpoints of Gemma and the FP8-quantized version of the model, all optimized with TensorRT-LLM.</p>

<p>Experience <a href="https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-2b">Gemma 2B</a> and <a href="https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-7b">Gemma 7B</a> directly from your browser on the NVIDIA AI Playground.</p>
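<p>For developers who want to experiment with the TensorRT-LLM path described above, the snippet below is a minimal sketch. It assumes a recent TensorRT-LLM release that ships the high-level Python <code>LLM</code> API and the openly published <code>google/gemma-2b-it</code> checkpoint; the exact engine-building workflow in the linked developer blog may differ.</p>

```python
# Minimal sketch: running Gemma through TensorRT-LLM's high-level Python API.
# Assumes a recent tensorrt_llm release and the "google/gemma-2b-it" checkpoint;
# an illustration only, not the exact workflow from NVIDIA's developer blog.
from tensorrt_llm import LLM, SamplingParams

# Constructing the LLM compiles an optimized TensorRT engine for the local
# NVIDIA GPU the first time the model is loaded.
llm = LLM(model="google/gemma-2b-it")

sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = ["Summarize what TensorRT-LLM does in one sentence."]
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```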
<h2>Gemma Coming to Chat With RTX</h2>

<p>Gemma support is also coming soon to <a href="https://blogs.nvidia.com/blog/chat-with-rtx-available-now/">Chat with RTX</a>, an NVIDIA tech demo that uses <a href="https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/">retrieval-augmented generation</a> and TensorRT-LLM software to give users generative AI capabilities on their local, RTX-powered Windows PCs.</p>

<p>Chat with RTX lets users personalize a chatbot with their own data by easily connecting local files on a PC to a large language model.</p>

<p>Because the model runs locally, it delivers results fast, and user data stays on the device. Rather than relying on cloud-based LLM services, Chat with RTX lets users process sensitive data on a local PC without needing to share it with a third party or have an internet connection.</p>
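<p>As a rough illustration of the retrieval-augmented generation pattern Chat with RTX relies on (not the demo's actual implementation), the sketch below pulls the most relevant local text snippet with TF-IDF similarity and feeds it, together with the question, to a locally loaded Gemma model. The folder name, model choice and prompt format are assumptions made for the example.</p>

```python
# Illustrative retrieval-augmented generation over local files with a local model.
# This is not the Chat with RTX implementation; the model name, file locations
# and prompt layout are assumptions chosen for the sketch.
from pathlib import Path

import torch
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Index local documents (hypothetical .txt files in ./notes).
docs = [p.read_text(encoding="utf-8") for p in Path("notes").glob("*.txt")]
vectorizer = TfidfVectorizer().fit(docs)
doc_vectors = vectorizer.transform(docs)

# 2. Retrieve the snippet most similar to the user's question.
question = "What did we decide about the project roadmap?"
scores = cosine_similarity(vectorizer.transform([question]), doc_vectors)[0]
context = docs[scores.argmax()]

# 3. Generate an answer locally, so the data never leaves the device.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it", torch_dtype=torch.float16, device_map="cuda"
)
prompt = f"Use the context to answer.\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```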
