Since January 2024, Wikimedia has seen a 50 percent increase in the bandwidth Wikipedia consumes. This surge is driven not by human users suddenly reading more articles or watching more videos, but by AI crawlers that automatically scrape content to train AI models. This creates new challenges for the foundation.
The sudden increase in traffic from AI bots can slow down access to Wikipedia pages and files, especially during events that attract a lot of attention. For example, pages loaded slowly when Jimmy Carter died in December 2024 and many people wanted to watch the video of his 1980 presidential debate with Ronald Reagan.
Difference between human traffic and AI crawlers
Wikimedia is well equipped to handle peaks in traffic from human visitors, but the volume of traffic generated by AI scraper bots is unprecedented and, Wikimedia says, poses growing risks and additional costs. The cause is a fundamental difference in usage patterns.
Human visitors tend to look up specific, often overlapping topics: when something is trending, many people view the same content. Wikimedia therefore caches frequently requested content in the data center nearest to the reader, which speeds up loading. Articles that have not been viewed for a long time, however, must be served from the central database without the benefit of that cache, which requires more resources and therefore costs more.
Unlike humans, AI crawlers read pages in bulk, including obscure content that has to be retrieved from the central database. According to Wikimedia, 65 percent of this resource-intensive traffic comes from bots.
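The cache dynamics described above can be illustrated with a small simulation. The sketch below is hypothetical and not Wikimedia's actual caching infrastructure: it compares a human-like, trend-skewed request stream with a crawler-like bulk scan against the same small LRU cache, showing why the latter keeps falling through to the expensive backend.

```python
import random
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache: evicts the least recently used entry when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)  # mark as recently used
            self.hits += 1
        else:
            self.misses += 1  # cold content: must be fetched from the central database
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)
            self.store[key] = True

def hit_rate(requests, capacity=100):
    cache = LRUCache(capacity)
    for page in requests:
        cache.get(page)
    return cache.hits / len(requests)

random.seed(42)
pages = list(range(10_000))

# Human-like traffic: heavily skewed toward a few trending pages (Zipf-like weights).
human = random.choices(pages, weights=[1 / (rank + 1) for rank in pages], k=50_000)

# Crawler-like traffic: bulk reads spread uniformly over the whole corpus.
crawler = [random.choice(pages) for _ in range(50_000)]

print(f"human-like hit rate:   {hit_rate(human):.0%}")
print(f"crawler-like hit rate: {hit_rate(crawler):.0%}")
```

Even with a cache holding only 1 percent of the pages, the skewed human-like stream hits the cache far more often than the uniform crawler scan, so nearly every crawler request costs a backend fetch.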
Constant disruptions
This is already causing continual disruption for Wikimedia's Site Reliability team, which has to keep blocking crawlers before they significantly slow down access for real users. The real problem, Wikimedia argues, is that this growth comes without any attribution of sources and without a corresponding influx of new human users who want to participate in the Wikipedia community.
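One common way site operators throttle aggressive clients is per-client rate limiting. The sketch below is a generic token-bucket limiter keyed by user agent; it is a hypothetical illustration, not Wikimedia's actual tooling, and the bot name and limits are invented for the example.

```python
import time
from collections import defaultdict

class TokenBucket:
    """Allow `rate` requests per second, with bursts up to `burst`."""
    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.updated = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per client identifier (user agent here; real setups often key by IP).
buckets = defaultdict(lambda: TokenBucket(rate=1.0, burst=5))

def handle_request(user_agent):
    """Return HTTP 200 if the client is within budget, 429 if throttled."""
    return 200 if buckets[user_agent].allow() else 429

# A burst of 20 rapid requests from one self-identified crawler:
responses = [handle_request("ExampleBot/1.0") for _ in range(20)]
print(responses.count(200), "allowed,", responses.count(429), "throttled")
```

A human reader browsing at normal speed never exhausts the bucket, while a bulk scraper is quickly answered with HTTP 429 (Too Many Requests); the catch, as the article notes, is that this is a constant cat-and-mouse game with crawlers that rotate identities.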
A foundation that depends on donations to keep running must attract new users and keep them involved. "Our content is free, but our infrastructure is not," the foundation says.
In search of a sustainable future
Wikimedia is now looking for sustainable ways for developers and users to access the content. This is necessary because Wikimedia does not expect AI-related traffic to decrease.
The situation highlights a broader issue in the AI industry: how to deal with large-scale scraping of publicly available content to train commercial AI models. While Wikimedia's mission revolves around distributing free knowledge, the way AI companies currently use that content without sufficient attribution increasingly conflicts with the platform's sustainability. The best solution would be for AI organizations to start paying for access to Wikipedia content. The question is whether they are willing to do so.