Google Search’s secret weapon is speed. AI-driven challenger Perplexity knows that all too well. With help from Cerebras’ AI chips, it has now unveiled Sonar, billed as the “best AI answer engine” on the planet. What does that entail?
Sonar by Perplexity is aimed first and foremost at developers. It generates answers at lightning speed via an API, so it needs to be incorporated into an end product before it becomes usable. In that sense, it is not a direct alternative to Google Search; Perplexity.ai, which benefits from all of Sonar’s breakthroughs, is. What makes Sonar special is that it has been built from the ground up to run on Cerebras’ capable AI hardware. We’ve previously mentioned these AI chips as an alternative to the all-conquering Nvidia. Now, that promise of extremely fast AI computation is wrapped up in a new AI service that may pop up in any product that uses search.
Read also: Is Perplexity a preview of online search’s AI-driven future?
The highest speed achieved with Sonar is 1,200 tokens per second, Perplexity states. That’s close to Cerebras’ record, which it says reached 1,500 tokens per second on the DeepSeek-R1 distillation of Llama 3.3-70B.
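To put those throughput figures in perspective, a quick back-of-the-envelope calculation (our own illustration, ignoring network overhead and time to first token) shows what they mean for someone waiting on an answer:

```python
# Back-of-the-envelope: how long a streamed answer takes at a given
# decode rate. Illustrative only; ignores network latency and
# time-to-first-token.
def stream_seconds(answer_tokens: int, tokens_per_second: float) -> float:
    return answer_tokens / tokens_per_second

# A 500-token answer at Sonar's claimed 1,200 tokens/s, versus a
# hypothetical 50 tokens/s service:
print(f"{stream_seconds(500, 1200):.2f} s")  # 0.42 s
print(f"{stream_seconds(500, 50):.2f} s")    # 10.00 s
```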
Small, nimble, lightning fast
The base Sonar model is also a modified version of Meta’s Llama 3.3-70B. Like most AI-based APIs, it uses the OpenAI-compatible format to ensure compatibility with existing code. Four variants are available: Sonar, Sonar Pro, Sonar Reasoning and Sonar Reasoning Pro. The cheapest option is designed for fast, low-cost search results, Pro for in-depth queries, and the Reasoning models for the most considered AI answers, the kind that require intermediate thinking steps. Reasoning runs on DeepSeek-R1, which has also been optimized to run natively on Cerebras hardware.
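For the curious, here’s a minimal sketch of what calling Sonar through the standard OpenAI Python client might look like. The endpoint and model names reflect Perplexity’s public documentation at the time of writing; verify them against the current docs before relying on this.

```python
# Minimal sketch: querying Sonar via the OpenAI Python client, which
# works because Perplexity exposes an OpenAI-compatible chat API.
# Endpoint and model names taken from Perplexity's docs; verify before use.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PERPLEXITY_API_KEY",
    base_url="https://api.perplexity.ai",
)

response = client.chat.completions.create(
    model="sonar",  # or "sonar-pro", "sonar-reasoning", "sonar-reasoning-pro"
    messages=[
        {"role": "system", "content": "Answer concisely and cite sources."},
        {"role": "user", "content": "Which AI chips does Perplexity's Sonar run on?"},
    ],
)

print(response.choices[0].message.content)
```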
The Pro options use scale pricing, so the more API tokens one buys, the more cost-efficient Perplexity’s offering becomes. We expect other companies will frequently combine the four APIs. After all, search queries range from simple to complex, and switching between models spares app builders the unnecessary cost of, say, always deploying Reasoning Pro for basic questions; a sketch of such routing follows below.
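To make that cost argument concrete, a hypothetical integration might route each query to the cheapest adequate variant. The heuristic below is purely our own illustration, not something Perplexity prescribes:

```python
# Hypothetical cost-aware router: pick the cheapest Sonar variant that
# plausibly fits the query. The keyword/length heuristic is illustrative;
# a real integration would use better complexity signals or a classifier.
def pick_sonar_model(query: str) -> str:
    needs_reasoning = any(
        kw in query.lower()
        for kw in ("why", "compare", "explain", "step by step", "prove")
    )
    is_deep = len(query.split()) > 30  # crude proxy for in-depth queries

    if needs_reasoning and is_deep:
        return "sonar-reasoning-pro"
    if needs_reasoning:
        return "sonar-reasoning"
    if is_deep:
        return "sonar-pro"
    return "sonar"  # simple, fast, cheap lookups

print(pick_sonar_model("What is the capital of France?"))         # sonar
print(pick_sonar_model("Explain why wafer-scale chips are fast"))  # sonar-reasoning
```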
Promises of fast search engines are anything but new. Years ago, Google already emphasized the importance of near-instant results after a user sends a query. Billions in R&D and infrastructure investment have been necessary to achieve and maintain Google Search’s current hegemony. And Google’s not sitting still either: AI Overviews have already been introduced in limited form and to mixed reviews. It’s reasonable to wonder whether GenAI has made the landscape fluid enough for a newcomer to eventually surpass Google in popularity. Perplexity still has a long way to go, but it possesses the right ingredients to attempt it.
Cerebras is to Nvidia roughly what Perplexity is to Google: a clear challenger, fueled by a new AI approach. And unlike Bing versus Google or AMD versus Nvidia, both Perplexity and Cerebras offer a path to success that differs from the norm, with new types of search integrations on one side and a different chip philosophy on the other. The chipmaker, for example, relies on wafer-scale chips, the largest semiconductors that are practically feasible, on which AI models run at high speed. The company does, however, still lack the software ecosystem that Nvidia has built, and that moat is sizable.
Also read: Why Nvidia’s rivals think they have a chance to topple it