Marie-Avril Roux Steinkühler

Opt-out myths and realities



The 2019 Directive on copyright and related rights in the digital single market (Directive 2019/790) allows companies to carry out text and data mining without having to obtain specific licenses. The quid pro quo is an opt-out available to rights holders, which enables them to prevent their works from being used to train artificial intelligence.


But at no point does the directive suggest what form this opt-out mechanism should take. It confers a right without specifying how to exercise it in practice.


Let's be pragmatic: the effectiveness of this mechanism is close to nil.

Materially, robots.txt files are the best defence against scrapers, crawlers and other Anglicized joys. As the extension indicates, these are small text files that authorize (or not) the exploration or indexing of a website. They are easily accessible and editable. By way of illustration, the two lines below allow you to "prohibit" ChatGPT's crawler from browsing your site:


User-agent: GPTBot


Disallow: /


However, this robots.txt merely expresses the creator's refusal to have his or her content used as training data. It is not a technical measure that closes off the site; it simply provides information.
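In practice, blocking a single crawler is rarely enough, and site operators list every AI crawler they wish to exclude. A sketch of such a file (the user-agent strings below are those documented by their respective operators at the time of writing; the list evolves constantly and must be kept up to date):

```
# Refuse AI training crawlers (advisory only - compliance is voluntary)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /
```

Note that `Google-Extended` is not a crawler as such but a control token Google reads from robots.txt to decide whether content may be used for AI training.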


Thus, a company operating an LLM (Large Language Model) that wishes to ignore this lack of consent can simply override the opt-out and scrape the site's entire content.
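The purely advisory nature of robots.txt is easy to demonstrate. A minimal Python sketch, using the standard library's `urllib.robotparser` and a hypothetical URL: a polite crawler asks the parser for permission before fetching, but nothing in the protocol stops an impolite one from fetching anyway.

```python
from urllib.robotparser import RobotFileParser

# The rules a publisher might serve at /robots.txt
# (parsed here from a list of lines for illustration).
rules = RobotFileParser()
rules.parse([
    "User-agent: GPTBot",
    "Disallow: /",
])

# A compliant crawler checks before fetching...
print(rules.can_fetch("GPTBot", "https://example.com/article"))        # False
# ...but the check is self-imposed: any other (or non-compliant)
# user agent is simply unaffected by the publisher's refusal.
print(rules.can_fetch("SomeOtherBot", "https://example.com/article"))  # True
```

The `False` is merely the crawler reading the owner's wishes back to itself; the HTTP server will still happily return the page to whoever requests it.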

This problem is exacerbated by illegal copies. Let's take a concrete example: a major book publisher implements these robots.txt files across its various portals and believes itself protected. These sites will indeed clearly express their refusal of text and data mining. But what about Toto, who illegally acquires a protected work and then posts it on his personal blog without authorization? Artificial intelligences will access the content there and integrate it into their training data, even though in principle they are forbidden to do so.


Today, there is no satisfactory way to implement the opt-out right conferred by the European directive. This is all the more alarming given the unsupervised learning that underpins almost all AI. To simplify greatly: once collected, the data disappears into a black box, where it becomes almost indistinguishable from data that may have been acquired with the rights holders' authorization. So even when it is possible to determine precisely which data has been used, it is technically impossible to remove it without disrupting the functioning of the AI in question.


Nevertheless, it is advisable to opt out wherever possible: some LLM operators do respect it, and a declared opt-out provides both evidence and a legal basis for possible legal action.


Waiting for a return to opt-in?


Credits: Photo by Kehn Hermano


