Towards a FAIR database for molecular simulations

Credit: IRB Barcelona

The need to implement FAIR principles in biomolecular simulations. The European Project MDDB (Molecular Dynamics Data Bank), coordinated by IRB Barcelona, aims to build an open and standardized database to store molecular dynamics simulations.

Computational simulations have become a key tool for studying the behaviour of biomolecules over time. Thanks to supercomputers, molecular dynamics (MD) makes it possible to observe these processes with high precision, providing valuable insights for both basic research and the design of biomolecules—from enzymes to drugs.

Unlike structural biology or genomics—where data storage and sharing follow established common standards—molecular simulation data remain fragmented and are often left forgotten on personal computers. This hinders the reproducibility of calculations and prevents their future reuse. The result is a formidable obstacle to integrating such data into the workflows of structural biology and biophysics, while also slowing the development of artificial intelligence (AI) methods, whose training relies heavily on access to vast amounts of dynamic data.

In an article published in the journal Nature Methods, more than one hundred researchers raise awareness of this situation and advocate for a change of model: applying the FAIR principles (which guarantee that data are findable, accessible, interoperable, and reusable) to simulation results. The article has been signed by renowned international experts, including two Nobel laureates and leading figures from the best research centres in the world. The proposed goal is to build an open and sustainable ecosystem that amplifies the impact of these data and avoids unnecessary duplication.

“Reusing instead of repeating”

“For years, the community assumed that rerunning a simulation was easier and cheaper than archiving it. But that’s no longer the case,” says Dr. Modesto Orozco, coordinator of the European project MDDB, head of the Molecular Modelling and Bioinformatics lab at IRB Barcelona, Full Professor at the University of Barcelona and founder of the biotech Nostrum Biodiscovery.

“The knowledge we can gain from reusing data is immense: it will allow us to identify new targets, train AI algorithms, and design new experiments,” adds Dr. Hospital. Both researchers lead the European Project MDDB, which is funded by the European Commission's Horizon Europe Programme and specifically aims to establish a centralized and accessible database for molecular simulations.

Lessons from other fields

The proposal draws inspiration from the success of other fields that have embraced open science. The Protein Data Bank, which has collected three-dimensional structures of biomacromolecules since the 1970s, has been instrumental—not only in revealing the function of proteins and nucleic acids, enabling the 'omics' revolution, and providing a holistic view of the cell, but also in the development of drugs, vaccines, and new therapies. The data stored there were key to training AlphaFold2, which was recognized with the 2024 Nobel Prize in Chemistry. The authors argue that complementing these structural data with dynamic information will open a new field whose developmental potential is difficult to grasp.

According to the authors of the article, the time has come for the molecular simulation community to adopt practices similar to those of the structural and 'omics' communities: not only preserving data, but also standardizing file formats, metadata, and quality criteria. The text outlines how a federated infrastructure, with distributed nodes and shared access tools, could make this planet-scale archive feasible.

Beyond storage

The approach put forward in the article published in Nature Methods goes beyond merely storing data. It advocates for an integrated model, from the precise documentation of simulations (including conditions, software, parameters, etc.) to their automated analysis, validation, and reuse through machine learning techniques. “The value of these data doesn’t end with the publication of a paper or their presentation at a conference. Often, that’s just the beginning,” concludes Dr. Orozco. “We must treat data as a shared resource for science.”

This article was prepared within the framework of the European Project MDDB (Molecular Dynamics Data Bank), coordinated by IRB Barcelona, which aims to build an open and standardized database to store molecular dynamics simulations. The consortium, funded by the Horizon Europe Programme (grant 101094651), brings together leading research centres in bioinformatics, simulation and data analysis to move towards more open, reproducible and collaborative science.
