Should we all get paid for helping train AI?

Compensation or opt-outs should be considered for the creators of data used to train large language models such as those behind ChatGPT, an intellectual property expert says.

Bond University law Professor William Van Caenegem said generative AI models were essentially being trained by mining the intellectual property of anyone who uses the internet, with no compensation offered and no right to disallow the use of their information for such purposes.

“We are all supplying data to the system and then when we use it, we train it. It will be pushed back onto users as we are asked to start paying for more functions, without any recognition or compensation for contributing to the system,” he said.

“This is one of those problems where everyone is contributing to the system a little bit so it’s easy to say, ‘well for you that’s hardly anything’, but when you put it all together it’s an enormous impact.”

He said the regulatory structures that traditionally apply to data and information hold that information is free and cannot be ‘owned’ by anyone, so no one can be compensated for its use. Copyright law is more limited: it protects only the way an author presents information in a published work, which is seen as deserving recognition.

“This balance has been important to ensure the free flow of information and knowledge,” he said.

“But to me it’s a question of whether it is all right that what has now become a megacompany like OpenAI with millions of users and backed by giants such as Microsoft will potentially be a profit-driver by using all that data without me ever having a say about it or having any kind of compensation for it at all.

“Should it become a social good in that it becomes freely available for everyone to mine, and is it fair to use people’s outputs without asking?

“That’s where, I think, there is a good argument for subjecting it to some control, some regulation, some form of triage and compensation or at least giving people a say.

“At the most basic level you could just give people an opportunity to signal that they object to their output being used for that purpose.”

But it’s unclear what form such regulations would take, or even which area of the law they might fall under. The European Parliament has adopted legislation that restricts AI applications based on their potential risks to consumers, and the Australian government is considering new laws and mandatory standards. Regulation remains in its infancy, however, and its precise scope is unknown.

“It’s not even clear what sort of existing regulatory regimes might be engaged by the use of people’s outputs: privacy, digital safety, copyright, media law? It’s so new and generative AI scrapes and digests information in so many different forms. There’s a big question of who owns what data it feeds on and the data it produces,” he said.

“It’s not just privacy and safety regulation. We do need that, but there is a case for people to have more control over what they generate that is scraped by these products and used, and for some recognition of value.

“We’ve already seen big platforms required to pay compensation for use of media content that strictly doesn’t infringe copyright, through the EU’s Directive on Copyright in the Digital Single Market, so there’s an argument that generative AI companies could be forced to provide compensation for everyone’s material, and for everyone’s help in improving their product.”

He said the centuries-old principles of copyright law, based on the 1710 Statute of Anne, have withstood the advent of every new technology, from the printing press to social media, but may no longer offer adequate protection.

“It’s a common refrain in intellectual property when there is a new technology - ‘the law isn’t able to keep up’ - but most of the time that’s not true,” he said.

“The law around intellectual property is actually a relatively flexible system. It’s quite conceptual and doesn’t focus on a particular technology and regulate that, it has broad principles that it applies.

“So I’m usually a bit of a sceptic about the idea that there’s a need to rewrite laws. The law does take time, but I think it’s pretty flexible, and often technology-based criticism misunderstands how the law can work.

“But with OpenAI and what we are seeing … it does lead you to question whether IP is really doing enough.”
