<p><strong>SUNNYVALE</strong> &#8212; <a href="https://cerebras.ai/" target="_blank" rel="nofollow noopener">Cerebras Systems</a>, a pioneer in accelerating generative AI, announced record-breaking performance for DeepSeek-R1-Distill-Llama-70B inference, achieving more than 1,500 tokens per second – 57 times faster than GPU-based solutions. This unprecedented speed enables instant reasoning capabilities for one of the industry&#8217;s most sophisticated open-weight models, running entirely on U.S.-based AI infrastructure with zero data retention.</p>
<p>&#8220;DeepSeek R1 represents a new frontier in AI reasoning capabilities, and today we&#8217;re making it accessible at the industry&#8217;s fastest speeds,&#8221; said Hagay Lupesko, SVP of AI Cloud, Cerebras. &#8220;By achieving more than 1,500 tokens per second on our Cerebras Inference platform, we&#8217;re transforming minutes-long reasoning processes into near-instantaneous responses, fundamentally changing how developers and enterprises can leverage advanced AI models.&#8221;</p>
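<p>For developers who want to try the model, Cerebras Inference exposes an OpenAI-compatible API. The snippet below is a minimal sketch, assuming the base URL <code>https://api.cerebras.ai/v1</code> and the model identifier <code>deepseek-r1-distill-llama-70b</code>; confirm both against the current Cerebras documentation, and note that <code>CEREBRAS_API_KEY</code> is a hypothetical environment-variable name.</p>
<pre><code># Minimal sketch: streaming a completion from DeepSeek-R1-Distill-Llama-70B
# on Cerebras Inference via its OpenAI-compatible endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed endpoint; check the docs
    api_key=os.environ["CEREBRAS_API_KEY"],  # hypothetical env var name
)

stream = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",   # assumed model identifier
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,                             # print tokens as they arrive
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
</code></pre>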
<p>Powered by the Cerebras Wafer Scale Engine, the platform demonstrates dramatic real-world performance improvements. A standard coding prompt that takes 22 seconds on competing platforms completes in just 1.5 seconds on Cerebras – a 15x improvement in time to result. This breakthrough enables practical deployment of sophisticated reasoning models that traditionally require extensive computation time.</p>
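<p>To make the throughput figures concrete, the back-of-the-envelope calculation below derives the times implied by the numbers in this release; the 2,000-token output length is an illustrative assumption, not a figure from Cerebras.</p>
<pre><code># Timing implied by the figures quoted in this release.
cerebras_tps = 1500            # tokens/second on Cerebras (from the release)
gpu_tps = cerebras_tps / 57    # ~26 tokens/second, implied by the 57x claim

output_tokens = 2000           # illustrative reasoning-trace length (assumption)
print(f"Cerebras: {output_tokens / cerebras_tps:.1f} s")  # ~1.3 s
print(f"GPU:      {output_tokens / gpu_tps:.1f} s")       # ~76 s

# The coding-prompt example above: 22 s elsewhere vs 1.5 s on Cerebras.
print(f"Speedup: {22 / 1.5:.1f}x")  # ~14.7x, rounded to 15x in the release
</code></pre>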
<p>DeepSeek-R1-Distill-Llama-70B combines the advanced reasoning capabilities of DeepSeek&#8217;s 671B-parameter Mixture of Experts (MoE) model with Meta&#8217;s widely supported Llama architecture. Despite its compact 70B-parameter size, the model demonstrates superior performance on complex mathematics and coding tasks compared to much larger models.</p>
<p>&#8220;Security and privacy are paramount for enterprise AI deployment,&#8221; continued Lupesko. &#8220;By processing all inference requests in U.S.-based data centers with zero data retention, we&#8217;re ensuring that organizations can leverage cutting-edge AI capabilities while maintaining strict data governance standards. Data stays in the U.S. 100% of the time and belongs solely to the customer.&#8221;</p>
