Knowledge Infrastructure for Modern Biopharma

How teams are using the BioBox Data Intelligence Platform to solve complex drug discovery data challenges

Understanding the biology of a disease is the backbone of a successful drug program. This is much easier said than done when working with thousands of multimodal data points and cross-functional teams.

Every day, new AI and data management tools hit the market, promising to revolutionize drug discovery. Most claim to accelerate insights and facilitate data-driven decisions, but it is often difficult to understand exactly how they help teams accomplish this.

In this article I will walk you through exactly how a drug discovery team used the BioBox platform to solve complex data challenges.

Within 8 weeks the data team was able to:

  • Build a proprietary evidence ranking and prioritization system that led to the identification of 10 novel targets.

  • Prioritize 2 new indications for existing assets.

  • Curate a custom representation of disease biology.

  • Leverage machine learning algorithms to identify serendipitous data connections.

  • Fact check their internal AI solutions that often fall victim to hallucinations.

Challenge: Creating a custom representation of disease biology with harmonized public + private data

Building a custom data graph

The foundation of the platform is a data graph. Think of it as a custom GPS for navigating disease biology. It enables teams to keep track of important biological entities, such as diseases, genes, variants and pathways, and the relationships between them.

Here is where we differ from many knowledge graph providers: the foundation of the custom graph curated for each team is quantitative sequencing data, such as Single Cell or Whole Genome Sequencing data. In addition, we provide a wide variety of curated, versioned and well-annotated consortium data such as OpenTargets, Reactome and NIH clinical trials.

Nothing is worse than picking up a new software tool and having to put in countless hours importing data and setting up the new system. Our graph curators worked with the client to make the setup process as simple as possible.

  1. Our client described important biological relationships and questions they would like to answer.

  2. The client provided us with access to the data they would like to upload to the platform.

  3. We took care of the heavy lifting, tailoring their data graph schema and uploading their proprietary data. We ensured the teams were provided with adequate documentation, UI, and API support should they choose to do any data graph management on their own.

Within 48 hours the team had a custom data graph ready for exploration.

Problems solved

  • Creating and managing a data graph. We provided the tools and infrastructure to ensure that the graph could be easily updated and maintained.

  • Data wrangling and harmonization. We took care of the tedious data wrangling and harmonization to ensure the client’s data was seamlessly integrated.

  • Knowledge base versioning and maintenance. We maintain versioned data packages and ontologies, making third-party knowledge base integrations effortless.

Challenge: Identifying Serendipitous Data Connections

Traversing the graph

Once the data graph was set up, the team used the graph explorer to traverse their data. They were able to build compelling narratives around targets, diseases, and cell types. Graph explorer sessions were saved and shared with colleagues for real-time collaboration.

The team was able to run graph algorithms such as PageRank, to identify the relative importance of data points, and all shortest paths, to reveal serendipitous data connections.
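For readers unfamiliar with these two algorithms, the sketch below shows one simple way to compute them in plain Python. It is a textbook-style illustration, not the platform's implementation: the function names, the adjacency-dictionary format, and the placeholder node names are all assumptions made for this example.

```python
from collections import deque


def pagerank(adjacency, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping node -> list of successor nodes."""
    nodes = set(adjacency) | {t for targets in adjacency.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not adjacency.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in adjacency.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


def all_shortest_paths(adjacency, start, goal):
    """Return every shortest path from start to goal using breadth-first search."""
    if start == goal:
        return [[start]]
    distance = {start: 0}
    parents = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif distance[nxt] == distance[node] + 1 and node not in parents[nxt]:
                # A second, equally short route into the same node.
                parents[nxt].append(node)
    if goal not in parents:
        return []

    def walk(node):
        if node == start:
            return [[start]]
        return [path + [node] for parent in parents[node] for path in walk(parent)]

    return walk(goal)


# Placeholder graph; the entities and links are illustrative only.
edges = {
    "VARIANT_1": ["GENE_A", "GENE_B"],
    "GENE_A": ["PATHWAY_P"],
    "GENE_B": ["PATHWAY_P"],
    "PATHWAY_P": ["DISEASE_X"],
    "DISEASE_X": [],
}

ranks = pagerank(edges)
paths = all_shortest_paths(edges, "VARIANT_1", "DISEASE_X")
for path in paths:
    print(" -> ".join(path))
```

In this toy graph, two equally short routes connect the variant to the disease. Listing all of them, rather than just one, is what can surface a connection a reader might otherwise miss.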

Problems solved

  • Identifying critical data points that would have been overlooked had the data remained fragmented.

  • Building compelling narratives surrounding targets of interest.

Challenge: Building Evidence Ranking and Prioritization Systems

The BioBox platform does not autonomously predict targets or tell teams which indications they should pursue. It enables scientific teams to use their domain expertise to curate custom evidence ranking and prioritization systems by leveraging their fully connected data graph.

The team configured several graph models tailored to specific use cases:

  • Variant prioritization. This graph model provided a ranked list of variants identified across public GWAS studies and proprietary WGS studies that would put patients at risk for diseases of interest. Variants were ranked according to the team’s scientific criteria.

  • Indication prioritization. This graph model provided a ranked list of indications that would be suitable for an asset that they have in phase 1 clinical trials. Indications were ranked according to the team’s scientific criteria.

  • Pathway prioritization. This graph model provided a ranked list of pathways that were perturbed within a disease of interest. Pathways were ranked according to the team’s scientific criteria.

  • Target prioritization. This graph model provided a ranked list of genes for diseases of interest according to the scientific criteria that they value.
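As a rough illustration of what ranking "according to the team's scientific criteria" can look like, the sketch below scores candidates with a weighted sum of evidence types. The criteria names, weights, and scores are invented for this example, and a weighted sum is only one possible scoring scheme. The article does not describe how BioBox graph models compute their rankings.

```python
def rank_candidates(evidence, weights):
    """Rank candidates by a weighted sum of evidence scores, highest first.

    evidence: candidate -> {criterion name: score between 0 and 1}
    weights:  criterion name -> relative importance chosen by the team
    Missing criteria count as 0. Ties are broken alphabetically.
    """
    scored = []
    for candidate, scores in evidence.items():
        total = sum(weights[name] * scores.get(name, 0.0) for name in weights)
        scored.append((candidate, round(total, 6)))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical criteria and weights a team might choose.
weights = {
    "genetic_association": 0.5,
    "differential_expression": 0.3,
    "pathway_membership": 0.2,
}

# Made-up scores for placeholder genes; not real results.
evidence = {
    "GENE_A": {"genetic_association": 0.9, "differential_expression": 0.2, "pathway_membership": 1.0},
    "GENE_B": {"genetic_association": 0.4, "differential_expression": 0.8},
    "GENE_C": {"genetic_association": 0.1, "differential_expression": 0.1, "pathway_membership": 0.0},
}

ranking = rank_candidates(evidence, weights)
for gene, score in ranking:
    print(gene, score)
```

Because the weights live in one place, changing the team's priorities means editing a dictionary rather than rewriting an analysis.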

Problems solved

  • Team domain expertise was siloed from quantitative data. Through the curation of a graph model, teams were able to develop a framework of domain expertise for a variety of use cases.

  • Prior to the use of BioBox, countless hours were spent updating and rerunning reports and analyses. All graph model reports update automatically when new data is added.

  • Managing data from an active discovery pipeline. All reports are timestamped and versioned. This gave the team a comprehensive understanding of how data points changed in priority as new data was added.
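One way to picture timestamped, versioned reports is as a series of ranked snapshots that can be compared with one another. The sketch below is an assumption-laden illustration: the snapshot format and the function names are hypothetical and are not taken from the BioBox platform.

```python
from datetime import datetime, timezone


def snapshot(ranked_ids, version):
    """Store a ranked list as a versioned, timestamped report (rank 1 is the top)."""
    return {
        "version": version,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "ranks": {item: position + 1 for position, item in enumerate(ranked_ids)},
    }


def rank_changes(old, new):
    """Compare two snapshots.

    Positive values mean the item moved up, negative values mean it moved down,
    and None marks an item that did not appear in the older snapshot.
    """
    changes = {}
    for item, new_rank in new["ranks"].items():
        old_rank = old["ranks"].get(item)
        changes[item] = None if old_rank is None else old_rank - new_rank
    return changes


# Placeholder rankings before and after new data arrives.
report_v1 = snapshot(["GENE_A", "GENE_B", "GENE_C"], version=1)
report_v2 = snapshot(["GENE_B", "GENE_D", "GENE_A", "GENE_C"], version=2)

changes = rank_changes(report_v1, report_v2)
print(changes)
```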

Challenge: Complex Multi-hop Data Lookups

Intuitive Query Language

Knowledge graphs are notoriously difficult to query. No one wants to write in Cypher or SPARQL.

The team used our intuitive query language to traverse their ontology and ask complex multi-hop questions without unnecessary table joins. Multi-hop questions pulled information from multiple data sources within the graph.

Questions such as “Which genes participate in the RNA Polymerase II Transcription pathway and are upregulated in internal tumor samples?” could be answered in seconds.

Important queries were saved so that they could be executed at any time.
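To show what a multi-hop question involves, the example question above can be broken into two hops: pathway to member genes, then genes to expression in a sample group. The Python sketch below is not the BioBox query language, whose syntax is not shown in this article. The data structures, the fold-change threshold, and all values are assumptions made for illustration.

```python
# Hop 1 source: pathway membership (stands in for a public pathway resource).
pathway_members = {
    "RNA Polymerase II Transcription": {"GENE_A", "GENE_B", "GENE_C"},
}

# Hop 2 source: (gene, sample group, log2 fold change), standing in for internal data.
# All values are invented placeholders, not real measurements.
expression = [
    ("GENE_A", "internal_tumor", 2.1),
    ("GENE_B", "internal_tumor", -0.4),
    ("GENE_C", "internal_tumor", 1.3),
    ("GENE_D", "internal_tumor", 3.0),
]


def upregulated_pathway_genes(pathway, sample_group, min_log2_fc=1.0):
    """Genes that belong to the pathway and are upregulated in the sample group."""
    members = pathway_members.get(pathway, set())
    upregulated = {
        gene
        for gene, group, log2_fc in expression
        if group == sample_group and log2_fc >= min_log2_fc
    }
    return sorted(members & upregulated)


answer = upregulated_pathway_genes("RNA Polymerase II Transcription", "internal_tumor")
print(answer)
```

GENE_D is strongly upregulated but is not a pathway member, so it is excluded. The answer depends on combining both sources, which is the essence of a multi-hop lookup.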

Problems solved

  • Unnecessary table wrangling and table joins. Countless hours are spent consolidating data before important value-generating analyses can begin. Using the query language, teams were able to obtain answers within seconds rather than hours.

  • Steep coding learning curve typically associated with knowledge graphs. All team members were able to retrieve information from the knowledge graph without the use of code.

Challenge: Fact checking AI solutions

Natural language - Multi-omic GraphRAG

New AI tools are hitting the market each day, many of which fall victim to hallucination. The team was working towards an AI strategy and wanted a ground source of truth derived from data they could trust.

The team leveraged our natural language GraphRAG to converse with their sequencing data.

Problems solved

  • Develop a ground source of truth to fact check their AI. Instead of relying solely on new AI tools that are prone to hallucination, the team was able to use our platform to fact check information against quantitative sequencing data within their graph.
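The idea of fact checking against a graph can be pictured with a deliberately simplified sketch: statements produced by an AI tool are reduced to (subject, relation, object) triples and looked up among relationships already present in the graph. This is not how the platform's GraphRAG works internally, which the article does not describe. The triple format, relation names, and entities below are hypothetical.

```python
# Relationships assumed to exist in the data graph (placeholders, not real findings).
graph_facts = {
    ("GENE_A", "upregulated_in", "DISEASE_X"),
    ("GENE_A", "participates_in", "PATHWAY_P"),
    ("VARIANT_1", "located_in", "GENE_A"),
}


def check_claims(claims, facts):
    """Label each claimed triple as supported or unsupported by the graph."""
    return [{"claim": claim, "supported": claim in facts} for claim in claims]


# Triples extracted from a hypothetical AI-generated answer.
ai_claims = [
    ("GENE_A", "upregulated_in", "DISEASE_X"),
    ("GENE_B", "upregulated_in", "DISEASE_X"),
]

results = check_claims(ai_claims, graph_facts)
for result in results:
    status = "supported" if result["supported"] else "not found in graph"
    print(result["claim"], "->", status)
```

An unsupported claim is not necessarily false. It only means the graph holds no evidence for it, which is a useful signal that the statement needs a closer look.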

By using the BioBox Platform, our client was able to curate a custom data graph without the tedious pain points typically associated with knowledge graph management.

This enabled them to:

  • Build a proprietary evidence ranking and prioritization system that led to the identification of 10 novel targets.

  • Prioritize 2 new indications for existing assets.

  • Curate a custom representation of disease biology.

  • Leverage machine learning algorithms to identify serendipitous data connections.

  • Fact check their internal AI solutions that often fall victim to hallucinations.

“The BioBox data graph has given us a 360 view of our data. Data points that could have previously slipped through the cracks are now front and center. This has resulted in us spending significantly less time wondering if we made the right decisions in our discovery pipeline because we are leaving no stones unturned.”

Director of Translational Biology


Interested in fact checking your AI solutions and conversing with your data? Send us an email at sales@biobox.io. We enjoy complex data challenges.

Interested in drug discovery, AI, and knowledge graphs? Subscribe to the BioBox Blog

Authors
Lauren Phillips