


Generative AI and web scraping: the indications from the Italian Data Protection Authority


published on 17 June 2024 | reading time approx. 3 minutes


The Italian Data Protection Authority (Garante Italiano per la Protezione dei Dati Personali) has recently issued an information note on web scraping carried out for the purpose of training Generative Artificial Intelligence models.

Web scraping involves the extensive and indiscriminate collection of data, including personal data, using various techniques such as web crawling[1]. This activity is paired with the storage and retention of the data collected by web robots (bots) for subsequent targeted analysis, processing, and use.

In recent years, this technique has gained prominence due to the evolution and optimization of Generative AI systems, which are trained on data scraped from the web. As for the personal data involved in this activity, many companies rely on legitimate interest as the legal basis for processing it.

While the Italian Data Protection Authority (Garante) continues to investigate this matter, particularly with regard to OpenAI and the legal bases for model training, some platforms are already relying on legitimate interest. Therefore, with its measure of 20 May 2024, the Authority has issued an information note providing guidance for data controllers who make personal data publicly available, thereby exposing it to potential web scraping by third parties.

The first suggestion from the Authority is the creation of reserved areas, accessible only after registration, so that the data is not available to the public. This measure should not, however, infringe the minimization principle (art. 5 GDPR): data controllers (platforms, websites, companies, etc.) should not process more personal data than is necessary. For example, requiring registration before an online purchase can be completed has been considered unlawful by some Authorities[2].

The second possible measure is to supplement the Terms and Conditions of websites or online platforms with specific clauses prohibiting web scraping. This measure could act as an ex-post remedy, allowing data controllers to claim a breach of contract if the prohibition clause is infringed.

The third recommendation is to monitor the HTTP requests received by a website or online platform, which makes it possible to identify anomalies in inbound and outbound data flows.
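By way of illustration only, one simple form of such monitoring is to count the requests each client sends within a sliding time window and flag clients that exceed a threshold. The sketch below is an assumption of how this could look, not something prescribed by the Authority's note: the class name `RequestRateMonitor`, the 60-second window and the limit of 100 requests are all hypothetical and would need tuning to the traffic of the site concerned.

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Flags clients whose request count in a sliding window exceeds a limit.

    Hypothetical helper: the window length and the limit are illustrative
    values, not figures taken from the Authority's note.
    """

    def __init__(self, window_seconds=60.0, max_requests=100):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        # One queue of request timestamps per client identifier.
        self._hits = defaultdict(deque)

    def record(self, client_id, timestamp):
        """Record one request and return True if the client looks anomalous."""
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Discard timestamps that have fallen out of the window.
        while hits and timestamp - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

In practice `record` would be called from the web server's request handler or from a log-processing job, and a `True` result could trigger rate limiting or a CAPTCHA challenge rather than an outright block.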

A fourth measure is to act against bots, using:
  • CAPTCHA verifications;
  • the recurring modification of the HTML markup;
  • the incorporation of data within multimedia elements (such as images);
  • actions on the robots.txt file.
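As a rough sketch of the last item, a robots.txt file can tell specific crawlers to stay away from a site, and its effect can be checked with Python's standard `urllib.robotparser` module. The rules below are an assumption for illustration: `GPTBot` is used as an example crawler token, and the `/members/` path and the `example.com` URLs are hypothetical. Note that robots.txt is advisory and only works if the crawler chooses to honour it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block one named crawler entirely and keep every
# other crawler out of a (hypothetical) members area.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /members/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeOtherBot"):
        for url in ("https://example.com/articles/1",
                    "https://example.com/members/profile"):
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

A check of this kind is useful mainly for verifying that the rules published on a site say what the data controller intends before they go live.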

The measures recommended by the Authority are not mandatory, and their adoption should be assessed on a case-by-case basis, also considering the available technologies and each company's budget and resources. Moreover, such measures should not infringe the minimization principle: the data controller should always properly assess whether the processing of personal data is necessary for the purpose pursued.


[1] Meaning the use of programs that systematically scan the web in order to collect the data contained in web pages and index it, so that search engines can function correctly.
[2] See also the Finnish provision in this regard.


authors

Valeria Specchio
Attorney (Avvocato)
Senior Associate
+39 02 6328 841

Nadia Martini
Attorney (Avvocato)
Partner
+39 02 6328 841
