How is an environment for sharing and long-term preservation of research data built?

An interview with Daniel Mikšík about the role of the National Data Infrastructure, the importance of methodologists and repository managers, and the potential benefits of recognising datasets as an independent research output.

13 Feb 2026 Lucie Skřičková


Daniel Mikšík’s professional path is not, at first glance, a typical one. He studied English and the Czech language and literature, yet for many years, he has worked at the intersection of technology, research, and scientific support services. This combination of a humanities background and hands-on IT practice enables him to view research data primarily as a tool that serves people.

Your career path is somewhat unusual – from English and Czech studies to leading an IT centre. How has your humanities education shaped the way you think about technology in science and research today?

That is a difficult question, because I cannot really disentangle the specific influence of a humanities education from my life experience so far. Still, I do follow the work of colleagues from my home Faculty of Arts at Masaryk University, for instance on the Digital Humanities website. Recently, an article there caught my attention – one dealing with the very concept of research data. Interpreting and refining the basic terms we use to describe a continuous reality seems to me a lasting and vital contribution of the humanities when done well. The key message of the text, and of the literature it referred to, was even more fundamental: the central role in scientific practice still belongs to concrete individuals – people with deep knowledge and non-trivial skills. From these skills arises their ability to decide what information they need to collect and how it must be processed to be interpreted credibly and meaningfully. It is these individual, finite, fallible, honest, and passionate people who transform “masses of data” into knowledge. And it is precisely these people whom the services and tools we develop should support in their work.


We often hear that technologies should primarily serve people. How does this principle translate into research data management and national infrastructure – and where do you see the concrete benefit for researchers themselves?

Whenever I hear that technologies should “primarily serve people”, I become alert. This formulation is often used in contexts where, in reality, they serve mainly very specific groups of people. That is why I would rather identify “concrete benefit” retrospectively – from below, through real use. Still, I can describe what we are already doing today in building the national data infrastructure so that researchers can genuinely experience it in their everyday work as something that helps them – and not the other way around. We build on well-established, high-quality technologies for storing and providing access to research data (Invenio, DSpace, Data Stewardship Wizard, and others), and we connect them with existing infrastructure services, especially storage and computing environments within e-INFRA CZ. Experienced developers ensure deployment, methodologists handle communication with the research community, and user experience is systematically addressed by a specialised UX team. We continuously collect feedback and incorporate it into further development.

At the core stands a general principle of designing and delivering IT services: we identified the need for high-quality research data management, along with the known approaches to fulfilling it – repositories, FAIR principles, and support for machine processing of data and metadata. On this basis, we are now creating and making available concrete tools: the national repository platform, data transfer from instruments, integration with computing environments, and further services.

Then, in what one might call the “most real reality”, a general tool encounters the specific need of a particular researcher or team in their actual context. Through communication with them, we learn what works, what is missing, and where tools need to be further developed to make sense in practice. These findings are then incorporated into the following stages of development.


“Whenever I hear that technologies should ‘primarily serve people’, I become alert. This formulation is often used in contexts where, in reality, they serve mainly very specific groups of people.”

Practice shows that data alone is not enough. How important is the role of methodologists, repository managers, and local institutional support for making data truly usable?

In the Rules for Establishing Repositories within the National Repository Platform (NRP), a repository is defined as “the technical, personnel, and process-based provision of a long-term storage facility for the deposit and publication of citable digital objects.” Like most definitions, it is somewhat stiff, but its opening captures well that a repository is not only technology.

Only through long-term, qualified care – ensuring that a repository contains relevant, well-described data, transparently accessible and reusable – does a technical solution become a functioning whole. Over time, within a particular scientific community, it can become a trustworthy source of data. The members of the expert community who form the repository team (managers, data curators, and technical support) are an inseparable part of this whole.


Since this year, datasets can be reported as an independent research output of type T. What does this step mean for the position of research data in science?

I think it is still difficult to assess. I see it primarily as a reflection of an ongoing change in scientific practice: data should be cared for, and the level of that care will therefore become part of the evaluation of research institutions.

How evaluators will actually work with reported digital data collections – and how datasets and their parameters will truly enter institutional assessment – still needs to take shape. It will also depend on what signals this process sends back to institutions and to the Czech research community, and how these signals will ultimately influence decisions made by institutions, individual researchers, and research teams.


Could this change the way researchers think about data, already during research rather than only at the final stage of reporting outputs?

How researchers approach their data is influenced by many factors. Reporting digital data collections will be one of them, but I would personally not overestimate its role. In my view, much greater importance lies in the customs and standards of individual scientific communities: what data makes sense to store and share, how it needs to be described, and also in institutional support in the form of strategies, internal regulations, and the availability of support personnel. Closely connected to this is education – the extent to which data management becomes part of university curricula or lifelong learning. All of these areas are systematically supported and developed by the national implementation of the EOSC initiative.


The infrastructure also includes the National Metadata Directory (NMA). Why is it important even for researchers who do not usually take an interest in infrastructure?

If the ambition succeeds – to gather metadata about datasets affiliated with Czech institutions almost in real time – then a service with unique content will emerge, along with other essential properties, such as a public API that allows the collected metadata to be extracted and further processed at minimal cost.

Thanks to a unified metadata standard, it will be possible to search for datasets across thematic repositories, identify overlaps, and further interlink metadata. In the first phase, this service will be primarily relevant to institutions (overviews and analyses). With the growing number of repositories built on the National Repository Platform (NRP), its benefits will, I assume, become more evident to researchers as well.


“In my view, much greater importance lies in the customs and standards of individual scientific communities: what data makes sense to store and share, how it needs to be described, and also in institutional support in the form of strategies, internal regulations, and the availability of support personnel.”


Mgr. Daniel Mikšík


studied English and Czech language and literature at Masaryk University’s Faculty of Arts. From 2004, he worked at the faculty’s Centre for Information Technologies, contributing, among other things, to the use of parallel language corpora, the development of e-learning, and the building of the technical and data infrastructure for Digitalia MUNI ARTS. Between 2010 and 2024, he served as Head of the Faculty of Arts’ IT Centre. He currently works as an IT architect at the Institute of Computer Science of Masaryk University, within the CERIT-SC research centre, which focuses on data-intensive research and support for scientific communities. Within the National Repository Platform (NRP) project, he leads the activity focused on pilot repositories (KA3), whose role is to test and validate the solutions being developed for the management, sharing, and long-term preservation of research data in accordance with FAIR principles.

