Your paper, titled Reproducible Experiments with the Learned Metric Index Framework, was published in the peer-reviewed journal Information Systems. Could you share the inspiration and motivation that led you and your research team to explore this topic?
The editor of Information Systems recommended that we build upon one of our earlier published papers, titled Learned Metric Index - Proposition of learned indexing for unstructured data. That paper focused on applying AI-driven data organization to complex data search scenarios, particularly in image and video databases. The suggestion was to break down the results of that work and present them step by step, essentially creating a tutorial or a shared tool that fellow IT researchers can use in their own projects.
What did you focus on when it comes to searching with AI?
In our research on AI-driven search, our primary emphasis was on images. However, we didn't delve into the pixel-level details but rather concentrated on descriptors. These descriptors are specific pieces of information that can be loosely likened to an individual's DNA or unique identifier. They contain crucial information about the individual data objects, necessitating a more meticulous search approach than is needed in structured data. Unstructured data sets lack a meaningful inherent order, so searching them is significantly more challenging than searching structured data that can be sorted. In our paper Reproducible Experiments with the Learned Metric Index Framework, we meticulously dissected our approach in line with the principles of Open Science.
What was particularly challenging for you in the preparation of the text?
In our prior scientific paper, we aimed to create a proof of concept, essentially an initial solution. Here, the focus was different. Our task was to describe the entire process and to make sure that other researchers could easily adopt it. This involved a comprehensive review of the code from our initial paper to enhance its clarity and usability.
Furthermore, we sought inspiration from existing texts that addressed a similar challenge and provided comprehensive documentation. Surprisingly, we found a limited number of such texts, and even when we did, they didn't align perfectly with our requirements. Consequently, we had to navigate our way through the code described by other authors in the texts we managed to find. It's important to note that our primary research theme still revolved around the intricate process of searching unstructured image databases.
What is the importance of the text that was published?
This text holds significant value for fellow researchers due to its immediate utility. It offers a level of specificity that distinguishes it from the limited pool of texts addressing a similar topic. Researchers can promptly integrate the knowledge presented in this text into their research endeavors.
As I often like to emphasize, there are two dimensions. First, researchers can effortlessly engage with the so-called reproducibility protocol, enabling them to access and utilize the data and artifacts we've created. They can essentially replicate the precise environment where our experiments were conducted, employing additional configuration files and code to replicate our series of experiments. Our code also serves as a valuable tool for the analysis, visualization, and validation of the results featured in the paper.
The second dimension involves the broader potential for researchers to draw inspiration from the provided artifacts. These can be valuable when crafting their own protocols, experiment designs, configuration files, or research environments. In essence, this text acts as a useful reference that empowers researchers to build upon and adapt these resources to suit their unique research objectives.
And what did it bring you personally?
Personally, this endeavor has had a profound impact. It has instilled a sense of organization and clarity in our work, from formulating hypotheses to conducting experiments and finalizing papers.
My primary personal goal was to ensure that we could revisit our work easily in the future, with all components readily available – data, code, experiments, analyses, and visualizations. Embracing open and transparent practices has been motivating.
I derive great satisfaction from advancing our team's research and the opportunity to assist others in their endeavors. This sense of contribution and knowledge sharing is particularly rewarding.
What do you think about scientists resisting sharing their research data?
I understand that some scientists may have qualms about sharing their research data. It's essential to recognize that when someone uses our data or information, they are expected to acknowledge and cite our work. However, not everyone shares the attitude that my close colleagues and I have. There remains a degree of apprehension within the scientific community about the prospect of opening one's research.
I firmly believe that embracing the principles of open science is advantageous for all researchers. The open sharing of data and findings not only promotes transparency but also fosters collaboration and innovation within the scientific community. It allows for constructive feedback, which can be invaluable in improving our work and driving scientific progress. As scientists, our collective goal is to advance understanding and knowledge, and I believe that open science is a valuable path toward achieving this goal.
Your enthusiasm for science is contagious. You're currently finishing your Ph.D. What's the focus of your dissertation?
I focus on the development of efficient similarity-based search methods in complex datasets, which encompass a diverse range of objects, including text, images, videos, and complex biological data like protein structures. These datasets can be massive, often containing hundreds of millions to billions of individual objects.
The core concept behind my research is to reduce the number of direct comparisons and identify underlying patterns within the data. By harnessing the power of machine learning, we can significantly improve the efficiency of finding similar objects. This approach allows for highly accurate and rapid searches, making the process of extracting meaningful information from extensive datasets more efficient and effective.
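As a rough illustration of this idea (not the Learned Metric Index implementation itself), the following Python sketch partitions a toy dataset into buckets and compares a query only against the objects in the most promising buckets. A simple k-means partitioning stands in here for a learned model, and all names (`build_index`, `search`, `n_buckets`) and the random 2D data are hypothetical, chosen only for this example.

```python
import math
import random


def nearest_centroid(p, centroids):
    """Return the position of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def fit_centroids(points, k, iters=10, seed=0):
    """Plain k-means; an assumed stand-in for a learned partitioning model."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_centroid(p, centroids)].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a group ends up empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids


def build_index(points, k, iters=10, seed=0):
    """Assign every point (by its position in `points`) to one bucket."""
    centroids = fit_centroids(points, k, iters, seed)
    buckets = [[] for _ in range(k)]
    for i, p in enumerate(points):
        buckets[nearest_centroid(p, centroids)].append(i)
    return centroids, buckets


def search(query, points, centroids, buckets, n_buckets=1):
    """Approximate nearest neighbour: scan only the closest non-empty buckets.

    Returns (position of the best candidate, number of direct comparisons).
    """
    order = sorted(range(len(centroids)),
                   key=lambda c: math.dist(query, centroids[c]))
    candidates, visited = [], 0
    for c in order:
        if not buckets[c]:
            continue
        candidates.extend(buckets[c])
        visited += 1
        if visited == n_buckets:
            break
    best = min(candidates, key=lambda i: math.dist(query, points[i]))
    return best, len(candidates)


# Toy usage: 1,000 random 2D "descriptors" split into 8 buckets.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(1000)]
centroids, buckets = build_index(points, k=8)
best, compared = search((0.5, 0.5), points, centroids, buckets, n_buckets=2)
print(f"closest candidate: {points[best]}, compared {compared} of {len(points)}")
```

Scanning fewer buckets means fewer direct comparisons but a higher chance of missing the true nearest neighbour; scanning all buckets reduces to an exhaustive search.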
Artificial Intelligence, or AI, is a term often used nowadays. But you became interested in AI in high school. Where did your interest come from?
It's a lighthearted story with no dramatic twists. I was drawn to the world of computers and spent a lot of time on the internet, engaging in discussions with fellow IT enthusiasts about the latest and "cool" IT topics. It was during these online interactions that I first delved into the realm of machine learning. Initially, I had a peripheral interest in the subject, but everything changed when I stumbled upon my first online book on neural networks in 2015.
The book was titled Neural Networks and Deep Learning by Michael Nielsen, and it immediately sparked my enthusiasm. I realized that machine learning was an area within IT that captivated my interest and held immense potential for exploration.
During your studies, you also gained experience abroad. You had internships in Aalborg, Denmark, and Kiel, Germany. What did you do in particular? And how are Czech scientists perceived abroad?
During my stays at Aalborg University and Kiel University, I immersed myself in the exploration of similarity search, learning from esteemed experts in the field. The internship in Germany in particular, guided by Professor Kröger, was instrumental in refining a specific research idea, which we continued to discuss in regular online meetings every two weeks after the stay.
I found that the perception of Czech students as being in any regard inferior is largely a stereotype that doesn't hold true in international academic settings. In fact, my active participation in student communities created an impression of eagerness to learn and contribute, fostering a supportive and inclusive atmosphere.
Finally, let's return to the topic of open science. How would you convince young researchers that Open Science has its place in contemporary science?
Open science ideas make sense for the scientific community, regardless of the field. They are a means of improving the whole research process. I always try to work carefully, but mistakes happen. If the Open Science processes are set up well, you can find errors quickly and efficiently. So, Open Science really does help researchers.
And one last sentence to close?
Surround yourself with people with common values. Everything can then be easier and smoother.
Thank you for the interview for the EOSC web initiative in the Czech Republic. We wish you every success in the future and endless scientific inspiration. And, of course, many enthusiastic colleagues for Open Science.
RNDr. Terézia Slanináková
Terézia Slanináková is a researcher in the Intelligent Systems for Complex Data research group at the Faculty of Informatics of Masaryk University, where she is writing her Ph.D. thesis under the supervision of Vlastislav Dohnal and Matej Antol. She is also currently working on developing the EnviLab platform for geospatial data analysis in the Czech Republic in the Data Science research group led by Tom Rebok from the Institute of Computer Science at Masaryk University. Terézia actively applies Open Science principles in her research and motivates other young researchers to bring Open Science into their research projects.