Graph Pattern Series: Indication selection.

Part 1 - Introduction, Patient Recruitment with Graphs

Jan 22, 2025

Note: This is the first of a multi-part series on how some of the best biotechs make strong data-driven decisions using BioBox knowledge graphs to set up their pipelines for success.

Introduction

Indication selection is the process of selecting and ranking potential diseases or conditions (indications) that a drug candidate could target, based on scientific, medical, regulatory, and commercial factors. This strategic decision requires participation from leadership across multiple functions in the company.

By the time we engage with a biopharma company, it already has a general idea of which therapeutic areas (e.g. CNS, liver disease, autoimmune disease) to focus on. Usually, this is set by the company’s mission, a specific area of expertise, and/or a problem its platform is uniquely positioned to solve. The asset’s stage of development also matters. Because the starting point for an indication selection strategy depends on the type of biotech company, it was not surprising to find that each customer had a slightly different perspective on how diseases should be ranked. As a result, we’ve ingested dozens of different data sources and built many knowledge graph variants for customers.

In this blog post, we’ll share a blueprint of best practices, shaped by feedback from our customers, for modeling data in a knowledge graph so that biotech teams can evaluate their competitive positioning.

Organizing Principles

There are two entry points for this discussion: Diseases and Processes. The distinction matters when describing treatment modalities. For example, in immuno-oncology, one desired outcome is to activate a T-cell response, and therapies built around such a process may apply broadly across a portfolio of diseases. It is also useful to connect Phenotypes to Diseases to improve resolution.
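A minimal way to picture these organizing principles is a small typed graph. The sketch below uses only the Python standard library; the node labels, relationship names (`ASSOCIATED_WITH`, `HAS_PHENOTYPE`), and example entities are illustrative assumptions, not BioBox’s actual schema. It shows the Process entry point fanning out to several Diseases, with Phenotypes attached to each Disease for extra resolution.

```python
from collections import defaultdict

# Hypothetical, simplified schema as (source, relationship, target) triples.
# Labels, relationship names, and entities are illustrative only.
EDGES = [
    ("Process:T-cell activation", "ASSOCIATED_WITH", "Disease:Melanoma"),
    ("Process:T-cell activation", "ASSOCIATED_WITH", "Disease:Non-small cell lung cancer"),
    ("Process:T-cell activation", "ASSOCIATED_WITH", "Disease:Renal cell carcinoma"),
    ("Disease:Melanoma", "HAS_PHENOTYPE", "Phenotype:Pigmented skin lesion"),
    ("Disease:Non-small cell lung cancer", "HAS_PHENOTYPE", "Phenotype:Chronic cough"),
]


def build_index(edges):
    """Index outgoing edges by (node, relationship) for quick traversal."""
    index = defaultdict(list)
    for src, rel, dst in edges:
        index[(src, rel)].append(dst)
    return index


def diseases_for_process(index, process):
    """Process entry point: fan out from a biological process to candidate diseases."""
    return sorted(index[(process, "ASSOCIATED_WITH")])


def phenotypes_for_disease(index, disease):
    """Phenotypes attached to a disease add resolution when comparing indications."""
    return sorted(index[(disease, "HAS_PHENOTYPE")])


if __name__ == "__main__":
    idx = build_index(EDGES)
    for disease in diseases_for_process(idx, "Process:T-cell activation"):
        print(disease, phenotypes_for_disease(idx, disease))
```

The shape is the point: a therapy anchored to a process reaches every connected disease in one hop, while the disease entry point still works for disease-specific assets.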

Commercial Attractiveness

Working with business and medical affairs teams broadened my perspective on how much multi-modal data goes into an indication decision. But it basically boils down to this equation:

\(\frac{ \text{Value if it works} \cdot \text{Probability of success} }{ \text{Time to value} \cdot \text{Money needed to find out} }\)

The goal is to maximize this ratio. Let’s talk about some strategies we’ve seen work.
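To see why the denominator matters so much, here is a small Python helper for the ratio, run on purely hypothetical numbers. Nothing here reflects real program economics; the function name and the units ($M and years) are assumptions for illustration. Because time and money multiply, a delay that raises both hurts the ratio more than either would alone.

```python
def commercial_attractiveness(value_if_works, p_success, time_to_value, cost_to_find_out):
    """(value if it works * probability of success) / (time to value * money needed).

    Units are up to you (e.g. $M and years); only comparisons between
    options scored on the same scale are meaningful.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be a probability")
    if time_to_value <= 0 or cost_to_find_out <= 0:
        raise ValueError("time and cost must be positive")
    return (value_if_works * p_success) / (time_to_value * cost_to_find_out)


if __name__ == "__main__":
    # Hypothetical inputs, not real program data: value and cost in $M, time in years.
    baseline = commercial_attractiveness(1000, 0.10, 6, 100)
    # Same asset, but slow recruitment adds 2 years and $30M of trial operations.
    delayed = commercial_attractiveness(1000, 0.10, 8, 130)
    print(f"baseline={baseline:.3f} delayed={delayed:.3f} change={delayed / baseline - 1:.0%}")
```

With these made-up inputs, stretching the timeline from 6 to 8 years and the cost from $100M to $130M cuts the ratio by roughly 42%, even though the numerator is unchanged.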

Patient Recruitment

Your ability to recruit enough patients to meet the study’s enrollment target is a vital consideration. High competition for patients makes your trial run longer, delaying time to value, and an overextended trial keeps more staff on payroll for longer. The net effect is that both terms in the denominator grow, often substantially, which lowers commercial attractiveness.

Several factors contribute to patient recruitment potential:

1. High prevalence: an indication with high prevalence offers a larger patient supply
2. Large unmet need
3. Competing trials recruiting for the same patient populations

Everyone knows about 1 and 2, and there’s not much you can do there to gain an advantage. But it turns out we can use graphs to avoid setting up trials that compete for the same patients. Every clinical trial lists its inclusion and exclusion criteria for participants. Customers we’ve worked with extracted the inclusion criteria from this unstructured text into a structured set, which we then modeled in the graph.
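The post doesn’t specify how criteria are represented once extracted, so here is one hypothetical shape in Python. It assumes the extraction itself (NLP, an LLM, or manual curation) happened upstream, and that each criterion is normalized into a concept plus an optional operator and threshold. The class, field names, relationship names, and trial IDs are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One structured eligibility criterion (hypothetical shape)."""
    kind: str            # "inclusion" or "exclusion"
    concept: str         # normalized concept name, e.g. a condition or lab test
    operator: str = ""   # optional comparison such as ">=" or "<"
    value: str = ""      # optional threshold such as "7.0"

    def node_id(self) -> str:
        """Stable key so identical criteria from different trials share one node."""
        return f"{self.kind}|{self.concept.lower()}|{self.operator}|{self.value}"


# Hypothetical relationship names for the (Trial)->(Criterion) edges.
RELATIONSHIP = {"inclusion": "HAS_INCLUSION", "exclusion": "HAS_EXCLUSION"}


def to_edges(trial_id: str, criteria: list[Criterion]) -> list[tuple[str, str, str]]:
    """Turn one trial's criteria into (trial, relationship, criterion-node) edges."""
    return [(trial_id, RELATIONSHIP[c.kind], c.node_id()) for c in criteria]


if __name__ == "__main__":
    # Placeholder trial IDs, not registry numbers.
    trial_a = [
        Criterion("inclusion", "Type 2 diabetes"),
        Criterion("inclusion", "HbA1c", ">=", "7.0"),
        Criterion("exclusion", "Insulin use"),
    ]
    trial_b = [
        Criterion("inclusion", "Type 2 diabetes"),
        Criterion("exclusion", "eGFR", "<", "30"),
    ]
    for edge in to_edges("TRIAL-A", trial_a) + to_edges("TRIAL-B", trial_b):
        print(edge)
```

Because identical criteria collapse onto a shared node, trials recruiting overlapping populations become connected through those nodes, which is what the similarity step builds on.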

After repeating this for the exclusion criteria, we can build a monopartite projection of the graph that computes similarities between Clinical Trial nodes. This helps us scaffold our own inclusion criteria so that they deliver on our scientific and regulatory goals while minimizing similarity to other actively recruiting trials.
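The post does not name the similarity measure, so the sketch below assumes Jaccard similarity over each trial’s set of criterion nodes, a common choice for comparing sets. It projects the bipartite Trial-Criterion graph onto Trial nodes, then ranks actively recruiting trials by similarity to a draft criteria set. Trial IDs and criterion keys are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def project_trials(trial_criteria: dict[str, set]) -> dict[tuple[str, str], float]:
    """Monopartite projection: link trials that share at least one criterion,
    weighting each Trial-Trial edge by Jaccard similarity."""
    edges = {}
    for a, b in combinations(sorted(trial_criteria), 2):
        score = jaccard(trial_criteria[a], trial_criteria[b])
        if score > 0:
            edges[(a, b)] = score
    return edges


def score_draft(draft: set, recruiting: dict[str, set]) -> list[tuple[str, float]]:
    """Similarity of a draft criteria set to each recruiting trial, most similar first."""
    scores = [(tid, jaccard(draft, crit)) for tid, crit in recruiting.items()]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    recruiting = {
        "TRIAL-A": {"inc:type 2 diabetes", "inc:hba1c>=7.0", "exc:insulin use"},
        "TRIAL-B": {"inc:type 2 diabetes", "exc:egfr<30"},
        "TRIAL-C": {"inc:nash", "exc:cirrhosis"},
    }
    print(project_trials(recruiting))
    draft = {"inc:type 2 diabetes", "inc:hba1c>=7.0", "exc:egfr<30"}
    print(score_draft(draft, recruiting))
```

Jaccard treats every criterion as equally important; weighting criteria by how strongly they restrict the population would be a natural refinement, but that is an assumption beyond what the post describes.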

At decision time, the process works roughly as follows:

1. Enumerate the ideal inclusion/exclusion criteria that maximize your trial’s chance of success (we’ll cover this topic in part 2 of the series)
2. Run the similarity calculations
3. Work with the medical affairs and ClinOps teams to drop or add a criterion
4. Re-run the similarity calculations
5. Repeat until ClinOps or medical affairs rejects further changes, or until no remaining change lowers the similarity scores
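The decision-time loop can be sketched as a greedy local search. In this hypothetical Python version, each round tries dropping one criterion or adding one from a pre-vetted pool, keeps the approved change that most lowers the draft’s similarity to its closest recruiting trial, and stops when no approved change helps. The `approve` callback stands in for the ClinOps and medical affairs review; the objective (maximum Jaccard similarity) and all names are assumptions. A greedy loop like this stops at a local minimum, not necessarily the global one.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_overlap(criteria, recruiting):
    """Objective: similarity to the closest actively recruiting trial (lower is better)."""
    return max((jaccard(criteria, c) for c in recruiting.values()), default=0.0)


def refine(criteria, recruiting, addable, approve, max_rounds=20):
    """Greedy refinement: one drop or add per round, keeping the best approved change.

    approve(old, new) is a hypothetical stand-in for the human review step.
    Candidates are visited in sorted order so ties resolve deterministically.
    """
    current = frozenset(criteria)
    score = max_overlap(current, recruiting)
    for _ in range(max_rounds):
        candidates = [current - {c} for c in sorted(current)]
        candidates += [current | {c} for c in sorted(addable - current)]
        best = None
        for cand in candidates:
            if not cand or not approve(current, cand):
                continue
            s = max_overlap(cand, recruiting)
            if s < score and (best is None or s < best[0]):
                best = (s, cand)
        if best is None:  # no approved change improves the score
            break
        score, current = best
    return set(current), score


if __name__ == "__main__":
    recruiting = {
        "TRIAL-A": {"inc:t2d", "inc:hba1c>=7", "exc:insulin"},
        "TRIAL-B": {"inc:t2d", "exc:egfr<30"},
    }
    draft = {"inc:t2d", "inc:hba1c>=7", "exc:insulin"}
    pool = {"exc:egfr<45", "inc:bmi>=27"}
    required = {"inc:t2d"}  # hypothetical must-keep criterion
    print(refine(draft, recruiting, pool, lambda old, new: required <= new))
```

With Jaccard, adding criteria that no recruiting trial uses also lowers the score. That is one reason the human approval step matters: the optimizer cannot tell a scientifically sound criterion from one that merely looks different.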

Bonus: Geo-spatial data to minimize site overlap

This next aspect of the graph was not something we expected to apply in our discipline, but it yielded remarkably strategic information, even outside of drug development (more on this later). The problem is that even with the right indication and the right inclusion/exclusion criteria, you still have to physically recruit patients and bring them to the sites. The global disease burden may be high, but if the area around your sites is already saturated with similar trials, you’ll still face a recruitment challenge.

Each clinical trial that is active and/or recruiting is required to disclose its study sites. The API response for study locations conveniently includes geospatial coordinates (longitude, latitude). This adds a new dimension to our decision making: we can evaluate how far apart trials are in terms of clinical characteristics, and we can calculate the physical distance between their locations.
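The physical-distance half can be sketched with the haversine formula, which gives great-circle distance on a spherical Earth and is accurate enough for comparing site proximity. This sketch assumes each site is a (latitude, longitude) pair in decimal degrees; since the coordinates above are listed as (longitude, latitude), check the order your source uses before passing them in. The thresholds (`min_similarity`, `max_km`), trial IDs, and approximate city-center sample coordinates are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_site_km(sites_a, sites_b):
    """Distance between the closest pair of sites of two trials; sites are (lat, lon)."""
    return min(haversine_km(*a, *b) for a in sites_a for b in sites_b)


def competing_pairs(similarity, sites, min_similarity=0.3, max_km=50.0):
    """Flag trial pairs that are clinically similar AND have sites close together."""
    flagged = []
    for (t1, t2), score in similarity.items():
        if score >= min_similarity:
            d = nearest_site_km(sites[t1], sites[t2])
            if d <= max_km:
                flagged.append((t1, t2, round(score, 2), round(d, 1)))
    return sorted(flagged, key=lambda row: (-row[2], row[3]))


if __name__ == "__main__":
    # Approximate city-center coordinates, used only as sample inputs.
    TORONTO, MISSISSAUGA, MONTREAL = (43.6532, -79.3832), (43.5890, -79.6441), (45.5017, -73.5673)
    similarity = {("TRIAL-A", "TRIAL-B"): 0.6, ("TRIAL-A", "TRIAL-C"): 0.7, ("TRIAL-B", "TRIAL-C"): 0.1}
    sites = {"TRIAL-A": [TORONTO], "TRIAL-B": [MISSISSAUGA], "TRIAL-C": [MONTREAL]}
    print(competing_pairs(similarity, sites))
```

Only pairs that are close on both axes get flagged: a clinically similar trial several hundred kilometers away competes for a different local patient pool.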

Use cases outside of drug discovery

I recently reconnected with an old colleague who is now in a leadership role at a major Canadian hospital. When I shared the geospatial strategy with him, he raised a use case I had not previously considered: attracting more clinical trials to your research site.

Research hospitals benefit greatly from conducting clinical trials; they are a significant source of revenue. However, a research site needs activation time to prepare its clinical research operations to execute a trial successfully. The work is very reactive, and lucrative opportunities are missed because the timing didn’t align. But if you know which upcoming trials are about to start recruiting and have a solid mapping of their clinical characteristics, you can activate your research operations preemptively so that your site is ready to go when recruitment opens. Furthermore, you can use the spatial data to work out which trials are competing for the same populations in the same locations, then offer your site to help meet their recruitment needs and drive revenue into your institution.

What’s Next?

Later in this series, we’ll continue to focus on indication selection patterns. Specifically, we’ll share best practices for increasing the "probability of success" term in the equation, such as mechanism-of-action analysis and biomarker mapping.

BioBox is the knowledge infrastructure for modern biopharma research, built for drug hunters who need to integrate multi-modal data, engineer knowledge, and test hypotheses at scale. To learn more, please visit our website at https://biobox.io or reach out to one of the team members.
