What are the other concerns? Keeping my data organized sounds like putting things in a drawer and knowing where they are, but I guess it's more complicated with data, right?
The drawer analogy actually fits quite well; we just need to add binders to the drawer. The problem is that the drawers have to be labelled and the right things have to go into them. A scientist is not a librarian or an archivist who knows exactly where everything is, because their main job is to be creative. That's my case too. Putting things in order is not my strong point, but I am looking for new solutions, and I focus mostly on automation.
Can you describe your automation solution in more detail?
I mean automating both data collection and the collection of information about the data itself. I'm not saying everything works perfectly; we're a long way from that. But most people know the experience of filling in their name, affiliation, email address and so on over and over again, even in EOSC Association documents. We keep entering the same details, and by now the system should know who we are. What I'm working on is improving this so that, for example, the system recognises that it's me using my phone and fills in my affiliation straight away, instead of offering me a whole list of institutions in the Czech Republic. Electronic lab notebooks make this even easier, because I can have ready-made templates and work structures containing all the information about a given experiment. An experiment often differs from the previous one in only two or three parameters. Because I've already written the protocol, it's no problem to make a copy in the electronic notebook and change just those small things. The automation system distinguishes between the experiments and gives each one its own identifier. This brings us back to the archivists, but the trick is that the scientist doesn't need to know these processes are running in the background, and doesn't have to assign identifiers to their experiments themselves.
How do these data management solutions reach other scientists and researchers?
First, I lead one of the EOSC CZ working groups. I'm a biochemist, but I lead the working group on materials science and technology. This came about because the Heyrovský Institute has a very strong focus on materials and technology, and because I am involved in data management plans and in Open Science and scientific data management more broadly. Face-to-face meetings are one way to reach people. From there, the information spreads mainly virally.
Can you elaborate?
Two years ago, at the Heyrovský Institute, we introduced mandatory data management plans for every project we develop. This is proving to be a good way to raise awareness of the need for good data management. We created our own model for data management plans which, unlike the EU's approach, focuses exclusively on data. The most interesting question in these plans is about data reuse, and 99 percent of the time the answer is wrong. People do not realise that even data from decade-old research can still be relevant, and that no research starts from a completely blank slate. That older data should also remain available if we want to do reproducible science. This question in the data management plans draws attention to the issue and educates researchers about it.
Do you have Data Stewards on your team?
At the Heyrovský Institute, I founded the "Heyrovský Open Science Team", which began to take shape two years ago with Eva Pluharova and Stefan Swift. The team is gradually growing, and you could call it a Data Stewardship team, because we discuss all issues related to data management in depth. We also have technically oriented colleagues, such as Michal Tarana and Jakub Chalupský, who are excellent developers and experts in the IT environment. Importantly, together we have a comprehensive overview of the whole system, covering the policy, scientific and developer perspectives.