
New method lowers cost of reasoning models

Reasoning models already exist, but they come with a significant drawback: because problems are broken down and handled in separate blocks, their cost rises quickly. Researchers have found a new method to impose a budget constraint on the model without sacrificing quality.

Researchers at Carnegie Mellon University in the United States have found a new technique for reducing the cost of reasoning models. However, the method must be applied during model development.

LCPO

Developers of AI models can use the technique of length controlled policy optimization (LCPO) to shorten the reasoning of these LLMs. The strength of reasoning models, however, lies in thinking longer and treating different parts of the information separately, so the research sounds counterproductive. According to the researchers, though, response quality does not drop to the level of LLMs that skip the reasoning step.

The LLM’s thinking is constrained by giving the model a maximum number of tokens in which to find the answer. A correct answer that uses too many tokens results in a penalty. The model must then come up with a new reasoning plan that fits within the given token budget.
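To make the idea concrete, below is a minimal sketch of a length-penalized reward signal in that spirit. It is not the researchers’ actual LCPO implementation; the penalty weight `ALPHA`, the function name, and the exact reward shape are assumptions chosen for illustration.

```python
# Illustrative sketch of a length-penalized reward in the spirit of LCPO.
# The exact reward formulation and hyperparameters used in the CMU study may
# differ; ALPHA and the function name are assumptions for this example.

ALPHA = 0.003  # assumed penalty weight per token over the budget


def length_controlled_reward(is_correct: bool, tokens_used: int, token_budget: int) -> float:
    """Reward a correct answer, but penalize exceeding the token budget.

    During RL fine-tuning, a signal like this pushes the model toward
    reasoning traces that fit within the budget given in the prompt.
    """
    correctness = 1.0 if is_correct else 0.0
    overshoot = max(0, tokens_used - token_budget)
    return correctness - ALPHA * overshoot


# Example: a correct answer that overshoots a 512-token budget by 200 tokens
print(length_controlled_reward(True, tokens_used=712, token_budget=512))  # 0.4
print(length_controlled_reward(True, tokens_used=480, token_budget=512))  # 1.0
```

In this sketch, a correct but overly long answer earns less reward than a correct answer that stays within budget, which is the pressure that forces the model to find a shorter reasoning plan.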

For the study, the models L1-Max and L1-Exact were created. These reasoning models contain 1.5 billion parameters. “To our knowledge, this is the first demonstration that a 1.5B model can outperform frontier models such as GPT-4o, despite using the same generation length,” the researchers write. The gain is two percent.

Tip! Claude 3.7 Sonnet offers as much reasoning as you want