Data are like photographs – without context, they make no sense

How to Prepare European Research Data for Artificial Intelligence? Professor Ignacio Blanquer, member of the European Open Science Cloud (EOSC) Board and coordinator of the Spanish e-Science network, talks about why Europe needs a federated and secure data environment based on FAIR principles to keep up with global competition. He explains what preparing data for AI involves and the crucial roles of data quality, traceability, and respect for rights. He also shares why the world of photography inspires him when searching for new ideas.

3 Jul 2025 Lucie Skřičková


You shape Europe's research data infrastructure as an EOSC Board of Directors member. How does EOSC support the “data ready for AI” concept, and how crucial are the FAIR principles and reproducibility in this context?

The latest version of the EOSC Strategic Research and Innovation Agenda focuses on “AI for FAIR data” and “FAIR data for AI.” If we want AI systems to produce results that are auditable and aligned with the principles of the AI Act, we need to ensure they’re trained on data that is traceable, high-quality, and reliable. It’s also important to understand whether the models have respected the intellectual property rights and terms of use of the data on which they were built.


Your work bridges high-performance computing, medical imaging, and cloud infrastructure. What do you see as the most significant challenges and opportunities in preparing research data for artificial intelligence applications?

Using sensitive data in AI is a pressing and widely discussed topic. The development of the European Health Data Space (EHDS) regulatory framework will help establish a legal basis for using health data, even without explicit consent. While this will facilitate access to large volumes of health data, significant challenges will remain in sharing and integrating such data across international borders.

Research Infrastructures that have already collected data suitable for secure and confidential international sharing can play a key role in addressing these challenges. Their contribution will be essential for Europe to remain competitive with countries operating under less restrictive regulations, particularly in developing foundational models that could bring about a paradigm shift. This context will be crucial to establishing convenient, secure, efficient, and robust federated infrastructures.


In your projects, you work with sensitive data from medical imaging. Are there established standards and protocols to ensure such data can be safely and effectively used in AI applications?

There are established standards governing how data is obtained, coded, formatted, and transmitted, perhaps even too many. However, there are far fewer standards concerning the harmonisation of unstructured data, such as medical imaging. Medical imaging data is generally well coded, with standard protocols describing acquisition procedures and widely adopted best practices for metadata. Yet subtle variations, such as differences in device models or manufacturers, patient conditions, or the operator, may be imperceptible to human experts while still being highly significant for AI modelling. This becomes particularly critical in federated architectures, where data cannot be extensively analysed or centralised. Therefore, increased effort in evaluating and quantifying the quality of unstructured data, and in clearly defining provenance and preprocessing workflows, will be essential to ensure reproducibility and reusability.



As the Spanish e-Science Network coordinator, you observe developments at national and European levels. What role can national e-infrastructures play in building European data spaces, and how do they contribute to the broader research ecosystem?

European data spaces are mainly driven by industry, except those more closely linked to research. Significant work must still be done to establish a consolidated framework at both the national and international levels. A paradigmatic example is the European Health Data Space (EHDS), which has outlined a legal framework with strong involvement from Member States and a clear focus on international cooperation. This is the direction forward, ensuring that countries develop interoperable structures capable of contributing to data spaces while defining common standards and services to facilitate cross-border collaboration. National infrastructures should not aim to replace existing international alliances and infrastructures, but rather to complement them, providing alignment, supporting multidisciplinary science, and enabling the inclusion of less-represented research communities.


What would an ideal collaboration look like between research communities, infrastructure providers, and policymakers to ensure that European research data is truly AI-ready?

Provenance and quality evaluation are essential. Seamless integration between Research Infrastructures, e-Infrastructures, and scholarly resources will enhance scientific reproducibility and enable complex scenario analysis that can better support policymaking. At the same time, policymakers must balance preserving individual privacy with serving the public good.

Developing AI models within Secure Processing Environments (SPEs), as required under the EHDS, should not compromise their practical applicability. According to the EHDS, only anonymised data may be extracted from SPEs, yet AI models can still incorporate sensitive information. Despite this, applying such models could bring enormous societal benefits, which must be considered.



Is there a field outside science and technology that inspires your thinking about data, collaboration, or innovation?

I’m a photography enthusiast, and photography touches on all three. First, the output of photography is data, often accompanied by metadata and stored in standard formats, which allows other photographers to reproduce the conditions, techniques, and results. Second, there are many platforms for sharing photographic data. My favourite is GuruShots, where you can participate in challenges and compete with other users. It’s a great way to learn from others' ideas, composition, and execution. Finally, innovation is essential to perfect lighting, exposure, and subject.

Ignacio Blanquer is a professor of computer systems at the Polytechnic University of Valencia and has been a member of the EOSC Association Board since 2020. He leads the Grid and High-Performance Computing group at the Institute for Molecular Imaging (I3M) and coordinates Spain’s national e-Science network. As an expert to the Spanish Ministry of Science and a delegate to e-IRG, he plays a crucial role in shaping digital research infrastructures. He has led or contributed to numerous EU cloud computing, medical imaging, and open science projects.

