Differential privacy allows us to assess data quality without ever accessing the data itself.

Interview with Radovan Tomášik, winner of the Poster Session competition at Long Live Research Data – EOSC CZ National Conference 2025.

12 Dec 2025 Lucie Skřičková


The winning poster at the EOSC CZ 2025 National Conference was created by a trio of authors from BBMRI-ERIC, the Masaryk Memorial Cancer Institute, and Masaryk University: Radovan Tomášik, Ivan Mahút, and Simona Menšíková. We spoke with Radovan Tomášik about how the idea for the award-winning project came about, why data quality is always a matter of context, how *differential privacy works, and why research data should have a “long life.”

You work in the field of medical informatics and simultaneously pursue a PhD focused on data quality in a federated environment. What drew you to this field, and what do you enjoy most about working with data in medicine?

I actually came to medical informatics by coincidence during my bachelor’s studies when I started working as a developer in the newly formed IT team led by Zdenka Dudová for the biobank at the Masaryk Memorial Cancer Institute in Brno. At first, it was just a student programming job, but I soon realized that medical informatics isn’t “just IT.” I saw how well-designed software can significantly affect the quality of data on which research depends — and ultimately the quality of science itself.

Although I work with data, I’ve always been more fascinated by software architecture – how to design systems that are understandable, sustainable, and can adapt even to situations their original creators never imagined. And if you ever want to make a system truly complex, make it federated. But that’s precisely what makes federated systems fascinating: they are technically demanding, yet they solve very real problems.

What I enjoy most is the combination of intellectual challenge and practical impact. During my PhD at the Faculty of Informatics, Masaryk University, I’m surrounded by great people who constantly push me forward. I often feel like the least intelligent person in the room – and that’s the best kind of motivation. Instead of mindless “coding” in a corporate setting, I get to solve problems that have real meaning and tangible impact on research and healthcare. And that’s deeply fulfilling.


“I saw how well-designed software can significantly affect the quality of data on which research depends — and ultimately the quality of science itself.”

Your winning poster is titled Privacy-Preserving Data Quality Assessment for Federated Health Data Networks. How would you explain the main idea to someone unfamiliar with this topic?

History teaches us that centralizing data — and power — is tempting, but risky. Having everything in one place seems convenient, but it also creates a single point of failure — technical, organizational, and even social. In healthcare, this is particularly problematic due to privacy concerns.

A federated approach offers an alternative: data stay where they originate — for instance, in hospitals — and can then be summarized or selectively shared for research. It’s a more realistic and safer model because it maintains both responsibility and control.

The key challenge is how to assess data quality without having access to all the data at once. And even more fundamentally, what does “data quality” mean? Quality isn’t absolute; it’s about fitness for purpose. What’s “good enough” for one study may be entirely unsuitable for another.

My research focuses on evaluating data quality without direct access to the data themselves. Hospitals don’t share the actual data but rather securely processed characteristics that allow us to assess their quality without compromising patient privacy.

It’s a bit like judging a book by its blurb — you don’t see the complete text, but you have enough information to decide whether it’s worth reading. Likewise, a researcher can determine whether a dataset is “good enough” for their purposes without ever having physical access to it.
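To make the "blurb" analogy concrete, here is a minimal sketch of the idea in Python: each site computes a small quality profile locally and only that profile ever leaves the institution. The `QualityProfile` structure and the completeness metric are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass
class QualityProfile:
    """Summary a site can share instead of raw records (illustrative)."""
    n_records: int
    completeness: dict  # field name -> fraction of non-missing values


def profile(records: list[dict], fields: list[str]) -> QualityProfile:
    """Compute a local quality profile; the raw records never leave the site."""
    n = len(records)
    comp = {
        f: sum(1 for r in records if r.get(f) is not None) / n
        for f in fields
    }
    return QualityProfile(n_records=n, completeness=comp)


# Each hospital runs this locally; only the profile travels to the researcher.
site_data = [
    {"diagnosis": "C50", "age": 61},
    {"diagnosis": None, "age": 47},
]
p = profile(site_data, ["diagnosis", "age"])
# p.completeness -> {"diagnosis": 0.5, "age": 1.0}
```

A researcher comparing such profiles from several sites can decide whether the combined data are fit for a given study without ever seeing a single patient record.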


You worked on the poster with Ivan Mahút and Simona Menšíková. How did your collaboration between Brno and Graz work in practice?

All three of us are part of the Czech node of BBMRI-ERIC, based at the Masaryk Memorial Cancer Institute in Brno. I’ve also been working at the BBMRI-ERIC headquarters in Graz, so our team is naturally divided between the two cities.

In Brno, we work under the auspices of the Association. Prof. Roman Hrstka brings the perspective of a biomedical researcher and practical experience with real-world hospital data. From Graz, we have strategic insight and informatics leadership from Assoc. Prof. Petr Holub, CIO of BBMRI-ERIC and my PhD supervisor.

This setup works beautifully because it connects two worlds that often operate separately – the day-to-day reality of working with data and the strategic framework of research infrastructures. Without this type of collaboration, our research would either lose touch with real practice or lack broader relevance.


“It’s a bit like judging a book by its blurb — you don’t see the complete text, but you have enough information to decide whether it’s worth reading. Likewise, a researcher can determine whether a dataset is ‘good enough’ for their purposes without ever having physical access to it.”

You work with highly sensitive healthcare data. What are the biggest challenges in assessing their quality, and what role does the principle of differential privacy play here?

The biggest challenge isn’t technical but practical. Every hospital uses slightly different formats, terms, and data structures, and even the term “sample” can have three different meanings. When we can’t agree on terminology, it’s tough to compare data quality.

In a decentralized environment, we also can’t simply “look” at all the data to judge their quality. That’s why we use a different model: data stay local, and only anonymized characteristics are shared, providing information about data quality but not revealing anything about patients.

And this is where differential privacy plays a crucial role. It’s a technique that adds controlled “noise” to shared values. The data remain useful for quality assessment, but can’t be exploited to identify individuals.

Put simply, differential privacy allows us to ask questions about data quality without ever seeing the actual data. It protects privacy even in extreme scenarios, not just under “normal” conditions.
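The classic way to add such controlled noise is the Laplace mechanism: a count query changes by at most one when a single patient is added or removed, so Laplace noise with scale 1/ε gives ε-differential privacy. The sketch below illustrates that general mechanism, not the specific implementation used in the project; the function name and the missing-value scenario are assumptions for illustration.

```python
import math
import random


def private_count(true_count: int, epsilon: float) -> float:
    """Release a count with Laplace noise calibrated to sensitivity 1.

    Noise is drawn via the inverse CDF of Laplace(0, 1/epsilon):
    adding or removing one record changes a count by at most 1,
    so this release satisfies epsilon-differential privacy.
    """
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise


# A hospital reports roughly how many records lack a diagnosis code;
# the exact count never leaves the institution.
noisy = private_count(true_count=42, epsilon=1.0)
```

The noisy value is still accurate enough to flag a data-quality problem (dozens of incomplete records), yet no individual patient's presence or absence can be inferred from it.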


Your approach enables data quality assessment across institutions without sharing the actual data. What potential do you see for this method within national or European infrastructures, such as BBMRI-ERIC or EOSC?

The potential is enormous. In infrastructures working with sensitive health data, such as BBMRI-ERIC or EOSC, centralization isn’t feasible. A federated model is a natural choice, but it brings its own set of challenges. Our method helps address those.

It enables secure, decentralized, privacy-respecting assessment of data quality across institutions. Instead of sharing the data themselves, only their protected characteristics are exchanged.

In distributed systems like BBMRI-ERIC, this can significantly enhance both trust and usability of the entire ecosystem. It gives researchers confidence that they are working with high-quality data, even if they never directly see it.


“Put simply, differential privacy allows us to ask questions about data quality without ever seeing the actual data. It protects privacy even in extreme scenarios, not just under ‘normal’ conditions.”

Where do you see your research heading in the coming years?

I’m more of a practitioner than a theorist, so my goal is to turn research results into tools that work in real-world conditions. The first pilot deployment is already running within the Federated Search Platform of BBMRI-ERIC, and I hope to extend our approach to other data types, such as digital pathology or sequencing data. This would enable us to assess the quality of a broader range of biomedical information, bringing the system even closer to everyday research and clinical practice. 


The EOSC CZ 2025 National Conference carried the subtitle Long Live Research Data. What does this slogan mean to you personally?

To me, it’s a wish for research data not to be a one-off product that gets forgotten after publication, but to stay alive, to be reused years or even decades later in new projects and new contexts. And that’s precisely what open science enables. Transparency, sharing, and common standards give data the chance to live on, to “come back to life” in new analyses, new questions, and the hands of new researchers. Perhaps it’s a bit idealistic, but I believe that this kind of openness and faith in the value of sharing is what drives science forward.


*Differential privacy is a modern data protection technique that allows the analysis of aggregate information without the possibility of identifying individuals. It works by adding controlled statistical “noise” that safeguards privacy while preserving the scientific value of the data.

“I’m more of a practitioner than a theorist, so my goal is to turn research results into tools that work in real-world conditions.”


Ing. Radovan Tomášik


is a data engineer at the international research infrastructure BBMRI-ERIC and the Masaryk Memorial Cancer Institute in Brno. He is also pursuing a PhD at the Faculty of Informatics, Masaryk University, with a focus on data quality in federated environments and methods for the secure sharing of medical information. In his work, he connects informatics, medicine, and data science — developing tools that enhance the credibility and usability of data in research. He is deeply interested in privacy protection and open science principles, combining technical precision with a practical understanding of the real-world needs of biomedical research.

