Since January 2024, Wikimedia has seen a 50 percent increase in the bandwidth Wikipedia consumes. This surge is driven not by human users suddenly reading more articles or watching more videos, but by AI crawlers that automatically scrape content to train AI models. This creates new challenges for the foundation.
The sudden increase in traffic from AI bots can slow down access to Wikipedia pages and files, especially during events that attract a lot of attention. For example, pages loaded slowly when Jimmy Carter died in December 2024 and many people wanted to watch the video of his 1980 presidential debate with Ronald Reagan.
Difference between human traffic and AI crawlers
Wikimedia is well equipped to handle peaks in traffic from human visitors, but the volume of traffic generated by AI scraper bots is unprecedented and, Wikimedia says, poses growing risks and additional costs. The cause is a fundamental difference in usage patterns.
Human visitors tend to look up specific, often overlapping topics: when something is trending, many people view the same content. Wikimedia therefore caches frequently requested content in the data center nearest to the reader, which speeds up loading. Articles that have not been viewed for a long time, however, must be served from the central database without the benefit of that cache, which requires more resources and therefore costs more.
Unlike humans, AI crawlers read pages in bulk, including obscure content that has to be retrieved from the central database. According to Wikimedia, 65 percent of this resource-intensive traffic comes from bots.
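The cache dynamics described above can be illustrated with a small simulation. The sketch below is hypothetical and not Wikimedia's actual caching infrastructure: it compares a human-like, trend-skewed request stream with a crawler-like bulk scan against the same small LRU cache, showing why the latter keeps falling through to the expensive backend.

```python
import random
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache: evicts the least recently used entry when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)  # mark as recently used
            self.hits += 1
        else:
            self.misses += 1  # cold content: must be fetched from the central database
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)
            self.store[key] = True

def hit_rate(requests, capacity=100):
    cache = LRUCache(capacity)
    for page in requests:
        cache.get(page)
    return cache.hits / len(requests)

random.seed(42)
pages = list(range(10_000))

# Human-like traffic: heavily skewed toward a few trending pages (Zipf-like weights).
human = random.choices(pages, weights=[1 / (rank + 1) for rank in pages], k=50_000)

# Crawler-like traffic: bulk reads spread uniformly over the whole corpus.
crawler = [random.choice(pages) for _ in range(50_000)]

print(f"human-like hit rate:   {hit_rate(human):.0%}")
print(f"crawler-like hit rate: {hit_rate(crawler):.0%}")
```

Even with a cache holding only 1 percent of the pages, the skewed human-like stream hits the cache far more often than the uniform crawler scan, so nearly every crawler request costs a backend fetch.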
Constant disruptions
This is already causing continual disruption for Wikimedia's Site Reliability team, which has to keep blocking crawlers before they significantly slow down access for real users. The real problem, Wikimedia argues, is that this growth comes without any attribution of sources and without a corresponding influx of new human users who want to participate in the Wikipedia community.
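One common way site operators throttle aggressive clients is per-client rate limiting. The sketch below is a generic token-bucket limiter keyed by user agent; it is a hypothetical illustration, not Wikimedia's actual tooling, and the bot name and limits are invented for the example.

```python
import time
from collections import defaultdict

class TokenBucket:
    """Allow `rate` requests per second, with bursts up to `burst`."""
    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.updated = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per client identifier (user agent here; real setups often key by IP).
buckets = defaultdict(lambda: TokenBucket(rate=1.0, burst=5))

def handle_request(user_agent):
    """Return HTTP 200 if the client is within budget, 429 if throttled."""
    return 200 if buckets[user_agent].allow() else 429

# A burst of 20 rapid requests from one self-identified crawler:
responses = [handle_request("ExampleBot/1.0") for _ in range(20)]
print(responses.count(200), "allowed,", responses.count(429), "throttled")
```

A human reader browsing at normal speed never exhausts the bucket, while a bulk scraper is quickly answered with HTTP 429 (Too Many Requests); the catch, as the article notes, is that this is a constant cat-and-mouse game with crawlers that rotate identities.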
A foundation that depends on donations to keep running must attract new users and keep them involved. "Our content is free, but our infrastructure is not," the foundation says.
In search of a sustainable future
Wikimedia is now looking for sustainable ways for developers and users to access the content. This is necessary because Wikimedia does not expect AI-related traffic to decrease.
The situation highlights a broader issue in the AI industry: how to deal with large-scale scraping of publicly available content to train commercial AI models. While Wikimedia's mission revolves around distributing free knowledge, the way AI companies currently use that content without sufficient attribution increasingly conflicts with the platform's sustainability. The best solution would be for AI organizations to start paying for access to Wikipedia content. The question is whether they are willing to do so.