The Reactome Knowledgebase: Powering Biological Research with Knowledge Graphs

A look into how the best scientists and systems biologists organize biological information.

Sep 23, 2024

This week we are featuring an article written in collaboration with Nancy T Li, PhD (Reactome Outreach Lead). Nancy leads community engagement, training, and outreach for the Reactome Knowledgebase.

Introduction

Biological states, and the transitions between them, are controlled and regulated processes. Numerous chemicals, molecules, and complexes coordinate these processes and together give rise to form and function. Our task, as scientists and drug hunters, is to understand how these processes work so that we can exploit them to find cures for diseases. To understand causality, we need reliable, up-to-date information from sources we can trust. For biological pathways and systems biology, the de facto choice for BioBox and our biopharma partners has been the Reactome knowledgebase.

Reactome Knowledgebase

Reactome is a publicly funded, open-access, open-source database, created in 2002 by Lincoln Stein at CSHL, Ewan Birney at EMBL-EBI, and Suzanna Lewis at Lawrence Berkeley National Laboratory. Reactome is now the leading open resource for human-curated pathway knowledge and a key resource in BioBox, where it is used for making reliable predictions.

The Bedrock of Reliability: Reactome’s Curation Practices

In the dynamic realm of bioinformatics and biomedical research, reliable, high-quality data is the lifeblood of scientific advancement. Researchers depend on databases like Reactome to provide accurate and comprehensive information about biological pathways. Central to Reactome's reliability is its meticulous curation process.

Reactome’s curation process is rigorous and multi-faceted, ensuring that the data it provides is both accurate and relevant:

Literature Review and Data Extraction: Reactome curators are PhD-level biology experts who conduct an extensive review of the scientific literature to extract knowledge about biological pathways. This step demands a deep understanding of molecular biology and the ability to discern significant findings that can be integrated into pathway models.
Data Integration and Standardization: Extracted data is integrated into Reactome according to Reactome’s object-based data model, and standardized using controlled vocabularies and ontologies such as the Gene Ontology (GO). This standardization ensures consistency, making it easier for researchers to query and analyze the information.
Expert Review and Validation: Reactome curators work with domain experts to review and validate the curated data, much as academic journals conduct peer review before publication. These experts ensure that the pathways accurately reflect current scientific knowledge and experimental evidence.
Community Involvement: Reactome encourages the scientific community to contribute data, suggest corrections, and provide feedback. This collaborative approach helps keep the database current with the latest scientific discoveries. Pathways that are ready for external review are listed on Reactome's Collaborator Zone web page [1].
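To make the standardization step concrete, here is a minimal sketch of mapping free-text compartment names onto Gene Ontology cellular component identifiers. It is an illustration only, not Reactome's curation tooling: the synonym table and the function name `standardize_compartment` are hypothetical, and the vocabulary is deliberately tiny.

```python
from typing import Tuple

# Toy controlled vocabulary: canonical compartment names mapped to
# Gene Ontology (GO) cellular component identifiers.
GO_COMPARTMENTS = {
    "cytosol": "GO:0005829",
    "nucleoplasm": "GO:0005654",
    "plasma membrane": "GO:0005886",
}

# Hypothetical synonyms a curator might meet in the literature.
SYNONYMS = {
    "cytoplasmic matrix": "cytosol",
    "cell membrane": "plasma membrane",
}


def standardize_compartment(raw: str) -> Tuple[str, str]:
    """Return (canonical name, GO id) for a free-text compartment label."""
    key = " ".join(raw.lower().split())  # normalise case and whitespace
    key = SYNONYMS.get(key, key)         # collapse synonyms to one term
    if key not in GO_COMPARTMENTS:
        # Unknown labels are rejected rather than guessed at.
        raise KeyError(f"no controlled term for {raw!r}")
    return key, GO_COMPARTMENTS[key]
```

Because every label resolves to one identifier or fails loudly, two records that say "cell membrane" and "Plasma Membrane" end up queryable under the same term.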

Transforming Data with Knowledge Graphs

Knowledge graphs have become a transformative tool in the bioinformatics landscape. By structuring data as interconnected entities and relationships, knowledge graphs provide a more intuitive and powerful way to represent and analyze biological information. Reactome’s adoption of a knowledge graph perspective enhances its curation practices and opens up new avenues for research.

By organizing data within Neo4j, Reactome not only enhances data reliability but also unlocks new opportunities for researchers to explore and interpret complex biological systems. The Reactome graph database is described in a publication by Fabregat et al. (2018) [2].

Statistics for Reactome graph database contents as of version 89.

Data Interconnectivity:

Entity Relationships: In a knowledge graph, each piece of data is an entity connected by relationships. This interconnected structure allows researchers to see not just isolated data points but the rich web of interactions and dependencies between them. For instance, a knowledge graph can reveal how a particular gene is involved in multiple pathways and how these pathways inter-relate.
Contextual Understanding: Knowledge graphs provide context by linking related entities. Researchers can understand how a particular biological pathway fits into larger cellular processes and how alterations in one pathway might affect others, leading to more holistic, systems-level insights.
Example of the Reactome Graph Database (v89), querying for a dissociation reaction, relating pathway, input complex, output proteins, and the UniProt reference entities.
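The structure described above can be sketched in a few lines of plain Python. This is a toy in-memory graph, not the Reactome database: all entity names and identifiers are placeholders, and the relationship names (`hasEvent`, `input`, `output`, `hasComponent`, `referenceEntity`) are chosen to resemble Reactome's schema for readability only.

```python
from collections import defaultdict

# (subject, relationship, object) triples for one dissociation reaction.
# Every name and identifier here is a placeholder.
TRIPLES = [
    ("Pathway:example", "hasEvent", "Reaction:AB dissociates"),
    ("Reaction:AB dissociates", "input", "Complex:A:B"),
    ("Reaction:AB dissociates", "output", "Protein:A"),
    ("Reaction:AB dissociates", "output", "Protein:B"),
    ("Complex:A:B", "hasComponent", "Protein:A"),
    ("Complex:A:B", "hasComponent", "Protein:B"),
    ("Protein:A", "referenceEntity", "UniProt:PLACEHOLDER_A"),
    ("Protein:B", "referenceEntity", "UniProt:PLACEHOLDER_B"),
]


def build_index(triples):
    """Index triples as node -> relationship -> list of target nodes."""
    index = defaultdict(lambda: defaultdict(list))
    for subject, relationship, obj in triples:
        index[subject][relationship].append(obj)
    return index


def follow(index, start, *relationships):
    """Walk from `start` along the given relationship types, in order."""
    frontier = [start]
    for relationship in relationships:
        next_frontier = []
        for node in frontier:
            for target in index.get(node, {}).get(relationship, []):
                if target not in next_frontier:  # keep order, drop repeats
                    next_frontier.append(target)
        frontier = next_frontier
    return frontier


index = build_index(TRIPLES)
# Pathway -> reaction -> output proteins -> reference entities.
refs = follow(index, "Pathway:example", "hasEvent", "output", "referenceEntity")
```

Changing the relationship path, for example to `"hasEvent", "input", "hasComponent"`, answers a different question (which proteins made up the input complex) without touching the data.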

Enhanced Data Analysis:

Advanced Querying: Knowledge graphs enable sophisticated querying capabilities. Researchers can formulate complex queries that span multiple entities and relationships, uncovering insights that would be difficult to achieve with traditional database queries.
Predictive Analytics: By leveraging the interconnected nature of knowledge graphs, AI and machine learning algorithms can predict new relationships and interactions within the data. This predictive capability can lead to the discovery of novel pathways and biological mechanisms.
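As a rough illustration of predicting relationships from graph structure, the sketch below ranks gene pairs by how much their pathway memberships overlap (Jaccard similarity). This is a simple heuristic standing in for the machine learning methods mentioned above, not a method used by Reactome or BioBox, and the gene and pathway names are hypothetical.

```python
from itertools import combinations

# Hypothetical gene -> pathway memberships (placeholders, not real data).
MEMBERSHIP = {
    "GENE_A": {"P1", "P2", "P3"},
    "GENE_B": {"P2", "P3"},
    "GENE_C": {"P3", "P4"},
    "GENE_D": {"P5"},
}


def jaccard(a, b):
    """Shared neighbours divided by all neighbours of either node."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidate_links(membership):
    """Score every gene pair and return those with any overlap, best first."""
    scored = [
        (jaccard(membership[g1], membership[g2]), g1, g2)
        for g1, g2 in combinations(sorted(membership), 2)
    ]
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(
        (s for s in scored if s[0] > 0),
        key=lambda s: (-s[0], s[1], s[2]),
    )


ranked = rank_candidate_links(MEMBERSHIP)
```

Pairs near the top share most of their pathway context, which makes them candidates for closer inspection; a score is a prompt for a hypothesis, not evidence of an interaction.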

To explore the graph database yourself, see Reactome's documentation on installing and trying out its Neo4j graph database [3].
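Once a local instance is running, a query along the lines of the dissociation example could be sketched as follows. The node labels, relationship types, and property names in the Cypher text are assumptions based on the Reactome graph schema and should be checked against the version you install; `run_lookup` and `DISSOCIATION_QUERY` are hypothetical names introduced for this sketch.

```python
# Cypher text for a dissociation-style lookup. Labels, relationship types
# and property names are assumptions; verify them against the schema of
# the Reactome graph database version you have installed.
DISSOCIATION_QUERY = """
MATCH (p:Pathway)-[:hasEvent*]->(r:ReactionLikeEvent {stId: $reaction_id}),
      (r)-[:input]->(c:Complex),
      (r)-[:output]->(pe:PhysicalEntity)-[:referenceEntity]->(re:ReferenceEntity)
RETURN p.displayName AS pathway,
       c.displayName AS input_complex,
       pe.displayName AS output_entity,
       re.identifier AS reference_id
"""


def run_lookup(session, reaction_id):
    """Run the query on any object exposing run(query, **params),
    such as a session from the official neo4j Python driver."""
    result = session.run(DISSOCIATION_QUERY, reaction_id=reaction_id)
    # Convert each returned record into a plain dictionary.
    return [dict(record) for record in result]
```

Here `reaction_id` would be the Reactome stable identifier of the reaction of interest. Passing it as a query parameter, rather than formatting it into the string, keeps the query text fixed and avoids quoting mistakes.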

AI and Machine Learning:

AI-Augmented Data Curation: AI and machine learning algorithms can automate aspects of the curation process. For example, natural language processing (NLP) can extract relevant information from scientific literature and integrate it into the knowledge graph, speeding up the data curation process. The Reactome team is actively working toward finding reliable methods to augment curation processes using AI. Stay tuned for more information in the future!
Pattern Recognition: Machine learning can identify patterns and correlations within the knowledge graph that might not be immediately apparent. Researchers can leverage Reactome’s Neo4j graph database to find new insights, driving new hypotheses and research directions.

Meeting Researchers’ Needs: Reliability, Accessibility, and Discovery

For researchers, the reliability and accessibility of data are critical. By employing knowledge graphs, Reactome enhances these aspects and offers additional benefits:

Comprehensive and Accurate Data: The rigorous curation practices ensure that data within Reactome is accurate and comprehensive. The knowledge graph structure further ensures that data is contextually relevant and interconnected, providing a richer understanding of biological pathways.
User-Friendly Interface: The knowledge graph framework supports advanced visualization and interaction tools. Researchers can easily navigate through interconnected data, explore relationships, and visualize complex pathways in an intuitive manner.
Up-to-Date Information: The dynamic nature of knowledge graphs allows for continuous updates and refinements.
Facilitating Discovery: Knowledge graphs not only provide current data but also facilitate the discovery of new knowledge. Researchers can explore uncharted relationships and generate new insights, driving innovation and scientific progress.

Using Reactome Data in BioBox

To load Reactome data into your BioBox knowledge graph, import the Reactome data package from the external data package listing.

This data package includes concept and relationship definitions that are incorporated into your custom ontology. Objects loaded through this data package preserve the data and mappings from the Reactome knowledgebase and integrate with your private knowledge graph for increased comprehension.

Conclusion

At BioBox, we are thrilled to be partnered with Reactome. Reactome’s commitment to rigorous curation practices, coupled with the power of knowledge graphs, makes it an indispensable resource for researchers. By organizing data in an interconnected, contextual framework, Reactome enhances data reliability and opens up new opportunities for exploration and discovery. As the integration of data analysis and AI technologies continues to advance, Reactome remains at the forefront, providing researchers with the tools they need to push the boundaries of scientific knowledge.

References

[1] Link to Reactome Collaborator Zone webpage: https://reactome.org/community/collaboration

[2] Fabregat A, Korninger F, Viteri G, Sidiropoulos K, Marin-Garcia P, Ping P, Wu G, Stein L, D'Eustachio P, Hermjakob H. Reactome graph database: Efficient access to complex pathway data. PLoS Comput Biol. 2018 Jan 29.

[3] Link to Reactome Developer Zone documentation: https://reactome.org/dev/graph-database

About BioBox

BioBox is a knowledge infrastructure for modern biopharma research teams to accelerate drug discovery and make better decisions in preclinical studies. Biopharma research teams, from startups to top-20 pharma companies, use BioBox to transform multi-omic data into knowledge graphs and use them to drive decision making in target prioritization, indication selection, and biomarker discovery.

To learn more, visit biobox.io or reach out to get in touch with a team member.

A guest post by

Nancy T Li

Outreach leader for the Reactome Knowledgebase. Previously a PhD researcher in chemical engineering and biomedical engineering, creating novel tissue models for studying pancreatic cancer.
