The Allen Institute for AI (Ai2) has launched OLMoTrace, a tool that lets researchers and developers see which training documents a language model's output can be traced back to. The aim is to make it easier to understand how the model arrives at its output.
Ai2 claims that OLMoTrace is the first tool of its kind. Users can trace the output of a language model directly back to the original training documents. The feature is available for Ai2’s flagship open model, OLMo 2 32B, which was trained on a dataset of more than 3.2 billion documents.
“OLMoTrace marks a pivotal step forward for the future of AI development, laying the foundation for more transparent AI systems that researchers and developers can better understand,” said Jiacheng Liu, lead researcher for OLMoTrace. “By offering greater insight into how AI models generate their responses, anyone using our models can ensure that the data supporting their outputs is trustworthy and verifiable.”
Easy to use, powerful results
After generating an answer from OLMo in the Ai2 Playground, users can activate the tool by clicking the “Show OLMoTrace” button below the output. The tool then searches through all 3.2 billion documents from the model’s training process and highlights text fragments in the answer that correspond to the training material.
Each highlighted fragment is linked to a set of documents, allowing users to explore the original sources and see where and in what context the fragment appeared. The tool prioritizes the most distinctive and relevant fragments, and the matching documents are ranked by relevance score.
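To make the idea concrete, here is a minimal Python sketch of the two steps described above: finding fragments of an answer that also appear in a document collection, then ordering them so the more distinctive ones come first. It is an illustration under stated assumptions, not Ai2's implementation. It assumes that "corresponds to the training material" means an exact, case-sensitive word-for-word match, it scans a tiny in-memory corpus by brute force rather than searching billions of documents, and its scoring rule (longer fragments found in fewer documents rank higher) is a hypothetical stand-in for the tool's actual relevance score. The names Match, find_verbatim_spans and rank_matches are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Match:
    start: int      # index of the first word of the fragment in the answer
    end: int        # index one past the last word of the fragment
    text: str       # the fragment itself
    doc_ids: list   # ids of the toy corpus documents that contain the fragment


def find_verbatim_spans(answer, corpus, min_words=3):
    """Greedily collect the longest non-overlapping fragments of `answer`
    (at least `min_words` words) that appear word-for-word in `corpus`."""
    words = answer.split()
    # Normalize whitespace and pad with spaces so matches align on word boundaries.
    padded = {doc_id: " " + " ".join(text.split()) + " " for doc_id, text in corpus.items()}
    matches = []
    i = 0
    while i < len(words):
        best = None
        # Try the longest candidate starting at word i first, then shrink it.
        for j in range(len(words), i + min_words - 1, -1):
            span = " ".join(words[i:j])
            doc_ids = [d for d, text in padded.items() if f" {span} " in text]
            if doc_ids:
                best = Match(i, j, span, doc_ids)
                break
        if best:
            matches.append(best)
            i = best.end  # continue after the fragment so fragments do not overlap
        else:
            i += 1
    return matches


def rank_matches(matches):
    """Hypothetical scoring: words in the fragment divided by the number of
    documents containing it, highest first."""
    return sorted(matches, key=lambda m: (m.end - m.start) / len(m.doc_ids), reverse=True)


# Toy data standing in for training documents (assumption, not real OLMo data).
corpus = {
    "doc-a": "The quick brown fox jumps over the lazy dog.",
    "doc-b": "A quick brown fox is a common example phrase.",
    "doc-c": "Paris is the capital of France and a major city.",
}
answer = "I think the capital of France is where a quick brown fox lives"

for m in rank_matches(find_verbatim_spans(answer, corpus)):
    print(m.text, m.doc_ids)
```

The brute-force scan is only workable for a handful of documents. A system operating over billions of documents would need a prebuilt index, which this sketch does not attempt to model.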
OLMoTrace is now available in the Ai2 Playground for the OLMo 2 32B Instruct, OLMo 2 13B Instruct and OLMoE 1B 7B Instruct models. The service is hosted on Google Cloud.