CD-CODE 2.0: Condensate knowledgebase for biomedical science

Upgrade of CD-CODE to connect research on condensates for treatments and therapies

© Partly AI generated / Ksenia Kuznetsova / MPI-CBG

Biomolecular condensates are membraneless organelles within cells. They organize many biological processes by selectively concentrating biomolecules, mainly proteins and nucleic acids. Research on condensates has opened new perspectives both on how cells organize their biomolecules and on therapeutic discovery.

The number of research publications on biomolecular condensates has risen sharply. To help organize this growing body of information, the research group of Agnes Toth-Petroczy at the Max Planck Institute of Molecular Cell Biology and Genetics (MPI-CBG) and the Center for Systems Biology Dresden (CSBD) created the CD-CODE database and encyclopedia in 2023. The CrowDsourcing COndensate Database and Encyclopedia (CD-CODE.org) is a platform that collects knowledge on biomolecular condensates based on experimental data. It also includes a crowdsourcing function that lets condensate experts contribute.

Since its publication in 2023, CD-CODE has supported condensate research and has even been used to develop new tools that predict which proteins form condensates. Condensate protein components are also now cross-referenced in the UniProt protein database. In addition, information in CD-CODE is linked to other databases, making it more useful for research.

CD-CODE is an example of interdisciplinary collaboration between computational biologists, experimental biologists and software engineers. Several groups at the MPI-CBG and the CSBD contributed input, including the group of Anthony Hyman and the MPI-CBG Scientific Computing Facility. With this input, researchers in the Toth-Petroczy group developed CD-CODE 2.0 together with Diana Mitrea of Dewpoint Therapeutics. The new version extends the original CD-CODE for biomedical research. The two lead authors, Ksenia Kuznetsova and Maxim Scheremetjew, explain, “New features such as data on nucleic acid condensate components, infectious condensates, condensate-regulating drugs, and disease-linked condensate abnormalities expand CD-CODE’s utility for biomedical research and hypothesis generation. We also addressed the usability of CD-CODE 2.0 with improved search capabilities, convenient programmatic access, and relationship-based architecture to enable interconnectivity across major biomedical databases.”

Agnes Toth-Petroczy concludes, “CD-CODE 2.0 will make it easier to use computational tools and data analysis to study biomolecular condensates. The upgrade will make CD-CODE a more useful tool for many different fields of science, such as biomedical research, and will help connect research on condensates to new treatments and therapies.”