AI training over data privacy: Microsoft and Zoom get caught

The Software Freedom Conservancy (SFC) is calling on users to abandon the video platform Zoom. The reason is a set of quietly introduced changes to the company’s user agreement concerning the use of data to train AI. It appears to be the start of a broader trend, as Microsoft has been caught doing the same.

SFC is asking all open-source developers to boycott Zoom and is helping users switch to alternatives. The video platform fell into disrepute last week over changes to its terms of service, which suggested Zoom was trying to feed more user data into its AI training pipeline.

Confusing wording in the agreement made it appear that the video platform placed little value on users’ consent to train AI with their data: even without consent, data would still end up in the training set.

Clarifications to get things straight

The video platform eventually responded in an official blog post. There, it claimed the section was written only to allow it to provide additional services without asking the user for extra permission each time, for example obtaining an audio recording of a meeting to turn into a summary. It adds: “For AI, we do not use audio, video or chat content to train our models without client consent.”

The relevant section 10.4 has since been reworded. What remains is the clause dealing with consent-based AI training, which states that, with consent, Zoom acquires “a perpetual, worldwide, non-exclusive, royalty-free, sublicensable and transferable license and all other rights required or necessary for the permitted use.”

Terms are often muddled and long

However, the harm was already done. The terms of service sowed confusion, yet such documents are often too long to check: reading every updated terms of service from every vendor takes time that business users usually do not have. This story proved that once again, as the initial updates date back to March; it took until last week for the company to receive negative feedback on them.

Zoom, by the way, is not the only tech player tinkering with its terms of service. Microsoft also recently published a new AI Services paragraph stating that it may, for example, store call data for later use. The new terms take effect Sept. 30. Only Bing Enterprise Chat is excluded from the use of user data.

As a user, your options for keeping your data out of the training model are therefore limited. In Zoom’s case, users do not even have full control themselves: the consent of a single IT administrator is sufficient to hand over data on employees from the entire company.

Europe brings a small, bright spot

In any case, it looks like tech companies are not that serious about data privacy. Training the models behind generative AI seems to matter more than respecting users’ privacy. Users who do not want to share their data are left with only one option: avoiding the tools altogether.

Yet there is a positive European element to this story: privacy laws protect European users. Google’s change to its privacy policy regarding the training of AI models, for instance, was not carried over to the European version. Its chatbot Bard was also only allowed to launch in Europe accompanied by a privacy hub, where users can indicate that the chatbot may not train on their data.

Although the tech company remains in control here, your data is included in the training set by default. So even though the agreement covers your data, you must actively let Google know that you want to keep it private. In an ideal world, of course, every company with an AI tool would ask users themselves for permission to include their data.

AI developers pluck from the Internet

But we do not live in an ideal world. Large language models have long been trained on data freely available on the Internet. ChatGPT, for example, did not come out of nowhere: the developers behind the model scraped information straight off the Internet, even when that information was copyrighted.

As the first products based on these models emerge, tech companies see new opportunities to train AI: no longer just on what is publicly available, but on all the input users feed into the tools. New terms of service are needed for that, whereas the earlier models were trained on public data without any permission being asked. For some time now, AI training has taken precedence over data privacy.