<p><strong>SUNNYVALE</strong> &#8212; <a href="https://cerebras.ai/" target="_blank" rel="nofollow noopener">Cerebras Systems</a>, a pioneer in accelerating generative AI, announced record-breaking performance for DeepSeek-R1-Distill-Llama-70B inference, achieving more than 1,500 tokens per second – 57 times faster than GPU-based solutions. This unprecedented speed enables instant reasoning capabilities for one of the industry&#8217;s most sophisticated open-weight models, running entirely on U.S.-based AI infrastructure with zero data retention.</p>
<p>&#8220;DeepSeek R1 represents a new frontier in AI reasoning capabilities, and today we&#8217;re making it accessible at the industry&#8217;s fastest speeds,&#8221; said Hagay Lupesko, SVP of AI Cloud, Cerebras. &#8220;By achieving more than 1,500 tokens per second on our Cerebras Inference platform, we&#8217;re transforming minutes-long reasoning processes into near-instantaneous responses, fundamentally changing how developers and enterprises can leverage advanced AI models.&#8221;</p>
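<p>For developers who want to try the model, Cerebras Inference exposes an OpenAI-compatible API. The snippet below is a minimal sketch, assuming the base URL <code>https://api.cerebras.ai/v1</code> and the model identifier <code>deepseek-r1-distill-llama-70b</code>; confirm both against the current Cerebras documentation, and note that <code>CEREBRAS_API_KEY</code> is a hypothetical environment-variable name.</p>
<pre><code># Minimal sketch: streaming a completion from DeepSeek-R1-Distill-Llama-70B
# on Cerebras Inference via its OpenAI-compatible endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed endpoint; check the docs
    api_key=os.environ["CEREBRAS_API_KEY"],  # hypothetical env var name
)

stream = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",   # assumed model identifier
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,                             # print tokens as they arrive
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
</code></pre>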
<p>Powered by the Cerebras Wafer Scale Engine, the platform demonstrates dramatic real-world performance improvements. A standard coding prompt that takes 22 seconds on competing platforms completes in just 1.5 seconds on Cerebras – a 15x improvement in time to result. This breakthrough enables practical deployment of sophisticated reasoning models that traditionally require extensive computation time.</p>
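<p>To make the throughput figures concrete, the back-of-the-envelope calculation below derives the times implied by the numbers in this release; the 2,000-token output length is an illustrative assumption, not a figure from Cerebras.</p>
<pre><code># Timing implied by the figures quoted in this release.
cerebras_tps = 1500            # tokens/second on Cerebras (from the release)
gpu_tps = cerebras_tps / 57    # ~26 tokens/second, implied by the 57x claim

output_tokens = 2000           # illustrative reasoning-trace length (assumption)
print(f"Cerebras: {output_tokens / cerebras_tps:.1f} s")  # ~1.3 s
print(f"GPU:      {output_tokens / gpu_tps:.1f} s")       # ~76 s

# The coding-prompt example above: 22 s elsewhere vs 1.5 s on Cerebras.
print(f"Speedup: {22 / 1.5:.1f}x")  # ~14.7x, rounded to 15x in the release
</code></pre>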
<p>DeepSeek-R1-Distill-Llama-70B combines the advanced reasoning capabilities of DeepSeek&#8217;s 671B-parameter Mixture of Experts (MoE) model with Meta&#8217;s widely supported Llama architecture. Despite its compact 70B-parameter size, the model demonstrates superior performance on complex mathematics and coding tasks compared to much larger models.</p>
<p>&#8220;Security and privacy are paramount for enterprise AI deployment,&#8221; continued Lupesko. &#8220;By processing all inference requests in U.S.-based data centers with zero data retention, we&#8217;re ensuring that organizations can leverage cutting-edge AI capabilities while maintaining strict data governance standards. Data stays in the U.S. 100% of the time and belongs solely to the customer.&#8221;</p>
