NVIDIA Teams Up With Google on Gemma Open Language Models

<p>NVIDIA, in collaboration with Google, has launched optimizations across all NVIDIA AI platforms for <a href="https://blog.google/technology/developers/gemma-open-models/">Gemma</a> — Google’s state-of-the-art new lightweight <a href="https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-2b">2 billion</a>- and <a href="https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-7b">7 billion</a>-parameter open language models that can be run anywhere, reducing costs and speeding innovative work for domain-specific use cases.</p>

<p>Teams from the companies worked closely together to accelerate the performance of Gemma — built from the same research and technology used to create the Gemini models — with <a href="https://github.com/NVIDIA/TensorRT-LLM">NVIDIA TensorRT-LLM</a>, an open-source library for optimizing large language model inference, when running on NVIDIA GPUs in the data center, in the cloud and on PCs with <a href="https://www.nvidia.com/en-us/geforce/rtx/">NVIDIA RTX</a> GPUs.</p>

<p>This allows developers to target the installed base of over 100 million NVIDIA RTX GPUs available in high-performance AI PCs globally.</p>

<p>Developers can also run Gemma on NVIDIA GPUs in the cloud, including on Google Cloud’s A3 instances based on the H100 Tensor Core GPU and, soon, NVIDIA’s <a href="https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform">H200 Tensor Core GPUs</a> — featuring 141GB of HBM3e memory at 4.8 terabytes per second — which Google will deploy this year.</p>

<p>Enterprise developers can additionally take advantage of NVIDIA’s rich ecosystem of tools — including <a href="https://www.nvidia.com/en-us/data-center/products/ai-enterprise/">NVIDIA AI Enterprise</a> with the <a href="https://github.com/NVIDIA/NeMo">NeMo framework</a> and <a href="https://github.com/NVIDIA/TensorRT-LLM">TensorRT-LLM</a> — to fine-tune Gemma and deploy the optimized model in their production applications.</p>

<p>Learn more about how <a href="https://developer.nvidia.com/blog/nvidia-tensorrt-llm-revs-up-inference-for-google-gemma/">TensorRT-LLM is revving up inference for Gemma</a>, along with additional information for developers, including several Gemma model checkpoints and an FP8-quantized version of the model, all optimized with TensorRT-LLM.</p>

<p>Experience <a href="https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-2b">Gemma 2B</a> and <a href="https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-7b">Gemma 7B</a> directly from your browser on the NVIDIA AI Playground.</p>
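<p>For developers who want a feel for that workflow, the sketch below shows roughly how a Gemma checkpoint might be served through TensorRT-LLM’s high-level Python API. It is a minimal illustration, not NVIDIA’s reference setup: it assumes a recent TensorRT-LLM release that ships the <code>LLM</code> and <code>SamplingParams</code> convenience classes, and exact class and argument names can vary between versions.</p>

```python
# Minimal sketch: serving Gemma through TensorRT-LLM's high-level Python API.
# Assumes a recent TensorRT-LLM release exposing the LLM/SamplingParams
# convenience classes; names and arguments may differ across versions.
from tensorrt_llm import LLM, SamplingParams

# Point at the Gemma 2B weights (downloaded locally or pulled from a model
# hub); TensorRT-LLM builds or loads an optimized engine for the target GPU.
llm = LLM(model="google/gemma-2b")

params = SamplingParams(max_tokens=64, temperature=0.8)
outputs = llm.generate(["Summarize what FP8 quantization does."], params)

for output in outputs:
    print(output.outputs[0].text)
```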
<h2><b>Gemma Coming to Chat With RTX</b></h2>

<p>Adding support for Gemma soon is <a href="https://blogs.nvidia.com/blog/chat-with-rtx-available-now/">Chat with RTX</a>, an NVIDIA tech demo that uses <a href="https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/">retrieval-augmented generation</a> and TensorRT-LLM software to give users generative AI capabilities on their local, RTX-powered Windows PCs.</p>

<p>Chat with RTX lets users personalize a chatbot with their own data by easily connecting local files on a PC to a large language model.</p>

<p>Since the model runs locally, it delivers results quickly, and user data stays on the device. Rather than relying on cloud-based LLM services, Chat with RTX lets users process sensitive data on a local PC without needing to share it with a third party or have an internet connection.</p>
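<p>Conceptually, the retrieval step works like any retrieval-augmented generation pipeline: index the user’s local files, find the passages most relevant to a question, and prepend them to the prompt so the model answers from that material. The sketch below is a generic, simplified version of that loop, not Chat with RTX’s actual implementation; the <code>generate</code> call at the end is a hypothetical stand-in for whatever locally running LLM produces the answer.</p>

```python
# Generic RAG loop over local text files: chunk, retrieve by TF-IDF
# similarity, and ground the prompt in the retrieved passages. This is an
# illustrative sketch, not Chat with RTX's actual pipeline.
from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def load_chunks(folder: str, size: int = 500) -> list[str]:
    """Split every .txt file under `folder` into fixed-size character chunks."""
    chunks = []
    for path in Path(folder).glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        chunks += [text[i:i + size] for i in range(0, len(text), size)]
    return chunks

def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question under TF-IDF."""
    vectorizer = TfidfVectorizer().fit(chunks + [question])
    doc_vecs = vectorizer.transform(chunks)
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_vecs)[0]
    return [chunks[i] for i in scores.argsort()[::-1][:k]]

def build_prompt(question: str, context: list[str]) -> str:
    """Ground the model's answer in the retrieved local passages."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

# Example usage:
# chunks = load_chunks("./my_documents")
# question = "What does the Q3 report say about margins?"
# prompt = build_prompt(question, retrieve(question, chunks))
# answer = generate(prompt)  # hypothetical call into a locally running LLM
```

<p>Because every step above runs on the user’s machine, nothing about the documents or the question ever leaves the device, which is the core privacy point of the local-first design.</p>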
