NVIDIA Teams Up With Google on Gemma Open Language Models

<p>NVIDIA, in collaboration with Google, has launched optimizations across all NVIDIA AI platforms for <a href="https://blog.google/technology/developers/gemma-open-models/">Gemma</a>, Google’s state-of-the-art new lightweight <a href="https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-2b">2 billion</a>- and <a href="https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-7b">7 billion</a>-parameter open language models that can be run anywhere, reducing costs and speeding innovative work for domain-specific use cases.</p>
<p>Teams from the companies worked closely together to accelerate the performance of Gemma, which is built from the same research and technology used to create the Gemini models, with <a href="https://github.com/NVIDIA/TensorRT-LLM">NVIDIA TensorRT-LLM</a>, an open-source library for optimizing large language model inference, when running on NVIDIA GPUs in the data center, in the cloud and on PCs with <a href="https://www.nvidia.com/en-us/geforce/rtx/">NVIDIA RTX</a> GPUs.</p>
<p>This allows developers to target the installed base of over 100 million NVIDIA RTX GPUs available in high-performance AI PCs globally.</p>
<p>Developers can also run Gemma on NVIDIA GPUs in the cloud, including on Google Cloud’s A3 instances based on the H100 Tensor Core GPU and, soon, NVIDIA’s <a href="https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform">H200 Tensor Core GPUs</a> (featuring 141GB of HBM3e memory at 4.8 terabytes per second), which Google will deploy this year.</p>
<p>Enterprise developers can additionally take advantage of NVIDIA’s rich ecosystem of tools, including <a href="https://www.nvidia.com/en-us/data-center/products/ai-enterprise/">NVIDIA AI Enterprise</a> with the <a href="https://github.com/NVIDIA/NeMo">NeMo framework</a> and <a href="https://github.com/NVIDIA/TensorRT-LLM">TensorRT-LLM</a>, to fine-tune Gemma and deploy the optimized model in their production applications.</p>
<p>Learn more about how <a href="https://developer.nvidia.com/blog/nvidia-tensorrt-llm-revs-up-inference-for-google-gemma/">TensorRT-LLM is revving up inference for Gemma</a>, along with additional information for developers. This includes several model checkpoints of Gemma and the FP8-quantized version of the model, all optimized with TensorRT-LLM.</p>
<p>Experience <a href="https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-2b">Gemma 2B</a> and <a href="https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-7b">Gemma 7B</a> directly from your browser on the NVIDIA AI Playground.</p>
<h2><b>Gemma Coming to Chat With RTX</b></h2>
<p>Support for Gemma is coming soon to <a href="https://blogs.nvidia.com/blog/chat-with-rtx-available-now/">Chat with RTX</a>, an NVIDIA tech demo that uses <a href="https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/">retrieval-augmented generation</a> and TensorRT-LLM software to give users generative AI capabilities on their local, RTX-powered Windows PCs.</p>
<p>Chat with RTX lets users personalize a chatbot with their own data by easily connecting local files on a PC to a large language model.</p>
<p>Since the model runs locally, it provides results fast, and user data stays on the device. Rather than relying on cloud-based LLM services, Chat with RTX lets users process sensitive data on a local PC without the need to share it with a third party or have an internet connection.</p>
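<p>To make the idea of retrieval-augmented generation over local files more concrete, here is a minimal sketch in Python. It is not how Chat with RTX is implemented: it assumes a simple word-count similarity in place of a real embedding model, keeps the "files" in an in-memory dictionary, and stops at building the prompt rather than calling a language model. The names <code>LOCAL_FILES</code>, <code>retrieve</code> and <code>build_prompt</code> are hypothetical and chosen only for illustration.</p>

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=1):
    """Return the names of up to top_k documents that best match the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(text))), name)
        for name, text in documents.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query.
    return [name for score, name in scored[:top_k] if score > 0]


def build_prompt(query, documents, top_k=1):
    """Prepend the retrieved passages to the question as context."""
    names = retrieve(query, documents, top_k)
    context = "\n".join(f"[{name}] {documents[name]}" for name in names)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical stand-ins for files on a local PC (assumption for illustration).
LOCAL_FILES = {
    "trip_notes.txt": "Flight to Lisbon departs on Friday at 9 am from gate 12.",
    "recipes.txt": "Pancakes need flour, eggs, milk and a pinch of salt.",
    "meeting.txt": "The budget review meeting moved to Tuesday afternoon.",
}

if __name__ == "__main__":
    # The resulting prompt would be passed to a locally running model.
    print(build_prompt("When does my flight to Lisbon depart?", LOCAL_FILES))
```

<p>In this sketch both the retrieval step and the prompt are assembled on the local machine, which mirrors the point above that user data need not leave the device. A production system would likely swap the word-count scoring for learned embeddings and a vector index.</p>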
