Pruna AI, a European start-up that focuses on compression algorithms for AI models, has made its optimization framework open source.
Pruna AI has built a framework that applies various efficiency methods to AI models, such as caching, pruning, quantization and distillation. It also standardizes how compressed models are saved and loaded, how these compression methods are combined, and how models are evaluated after compression. John Rachwan, co-founder and CTO of Pruna AI, told TechCrunch that the framework helps developers streamline these processes.
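To make one of those methods concrete, here is a minimal sketch of symmetric 8-bit weight quantization, the idea of storing model weights as small integers plus a shared scale factor. The function names and numbers are illustrative only and are not Pruna's actual API.

```python
def quantize(weights, bits=8):
    """Map float weights to signed integers sharing one scale factor."""
    qmax = 2 ** (bits - 1) - 1                   # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax  # one scale per tensor
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]

weights = [0.51, -1.27, 0.03, 0.89]
q, scale = quantize(weights)
restored = dequantize(q, scale)
# Each restored weight is within half a quantization step of the original.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

Storing 8-bit integers instead of 16- or 32-bit floats shrinks the model two- to four-fold, at the cost of the small rounding error the assertion above bounds; this accuracy trade-off is exactly what an evaluation step after compression has to measure.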
The framework is also able to assess whether there is a significant loss of quality after compression, and it shows the performance improvements achieved as a result.
Larger AI laboratories are already using various compression methods. For example, OpenAI uses distillation to develop faster versions of its leading models. It is likely that GPT-4 Turbo was created in this way as a faster version of GPT-4. Similarly, the Flux.1 fast image generation model is a distilled version of the Flux.1 model from Black Forest Labs.
Teacher-student model
Distillation is a technique in which knowledge is transferred from a large AI model to a smaller one via a teacher-student setup. Developers send requests to the teacher model and record its responses, sometimes comparing them against a dataset to measure accuracy. The student model is then trained to approximate the teacher's behavior.
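The teacher-student step above can be sketched in a few lines: the student is trained to match the teacher's temperature-softened output distribution. This is a standard illustration of distillation, not Pruna's or OpenAI's specific recipe, and the numbers are made up.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature = softer targets."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """How far the student's distribution q is from the teacher's p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher_logits = [4.0, 1.0, 0.2]   # recorded teacher responses
soft_targets = softmax(teacher_logits, temperature=2.0)

student_logits = [2.0, 1.5, 1.0]   # the smaller model's current output
loss = kl_divergence(soft_targets, softmax(student_logits, temperature=2.0))
# Training would adjust the student's weights to drive this loss toward zero,
# so its outputs converge on the teacher's.
```

The temperature softening matters: it exposes how the teacher ranks the wrong answers too, which carries more signal than the single correct label.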
Rachwan noted that large companies usually develop these kinds of solutions internally. What you find in the open-source world, he said, is usually built around a single method, such as one quantization method for LLMs or one caching method for diffusion models. A tool that brings all these methods together, makes them easy to use and combines them is hard to find, he added, and that is exactly the added value Pruna now offers.
Image and video generation in particular
Although Pruna AI supports every type of model, from large language models to diffusion, speech-to-text and computer vision models, the company is currently focusing specifically on image and video generation. Pruna AI’s existing customers include Scenario and PhotoRoom. In addition to the open-source version, the company offers an enterprise edition with advanced optimization features, including an optimization agent.
One feature that Pruna will be launching soon is the compression agent, Rachwan says. This agent allows users to specify their model and a specific performance requirement, such as increasing speed without decreasing accuracy by more than 2%. The agent then does the work by finding the best combination of compression methods. This saves developers the trouble of manual optimization.
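As a hypothetical sketch of what such an agent could do under the hood, the search can be framed as trying combinations of compression methods and keeping the fastest one whose accuracy drop stays within the user's budget. The method names, speedup factors and accuracy costs below are invented for illustration; a real agent would measure these rather than look them up.

```python
from itertools import combinations

# Per method: (speedup factor, accuracy drop in percentage points).
# Assumed here to compose multiplicatively / additively for simplicity.
METHODS = {
    "quantization": (1.8, 1.2),
    "pruning":      (1.4, 0.9),
    "caching":      (1.6, 0.0),
}

def best_combination(max_accuracy_drop=2.0):
    """Return the fastest method combination within the accuracy budget."""
    best, best_speedup = (), 1.0
    for r in range(1, len(METHODS) + 1):
        for combo in combinations(METHODS, r):
            speedup, drop = 1.0, 0.0
            for method in combo:
                s, d = METHODS[method]
                speedup *= s
                drop += d
            if drop <= max_accuracy_drop and speedup > best_speedup:
                best, best_speedup = combo, speedup
    return best, best_speedup

combo, speedup = best_combination(max_accuracy_drop=2.0)
# With these toy numbers, quantization + caching wins: it yields the
# largest speedup while keeping the accuracy drop under 2 points.
```

With the toy numbers above, quantization plus pruning is rejected (a 2.1-point drop exceeds the 2% budget), while quantization plus caching passes and gives the biggest speedup, which is the kind of trade-off the agent automates.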
Pruna AI charges an hourly rate for the pro version, comparable to renting a GPU on a cloud service such as AWS, according to Rachwan. An optimized model can significantly reduce inference costs, especially when the model is a crucial part of the AI infrastructure. Rachwan shared an example in which Pruna AI used its compression framework to make a Llama model eight times smaller without much loss of quality. The company hopes customers will see the framework as an investment that pays for itself.
A few months ago, Pruna AI raised seed investment of $6.5 million. Investors in the start-up include EQT Ventures, Daphni, Motier Ventures and Kima Ventures.