Fact Check Your AI: Multi-Omic GraphRAG
If you could converse with your sequencing data, what would you ask it?
As new AI solutions hit the market every day, data scientists are forced to discern the truth from hallucinations.
Knowledge Graphs
It is imperative to have a central source of truth to fact check the information provided by LLMs. One way data teams are tackling this challenge is through the use of knowledge graphs. In short, a knowledge graph is a semantic representation of data points and the relationships between them. When it comes to drug discovery, think of it as a GPS for understanding disease biology.
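To make "data points and their relationships" concrete, here is a minimal sketch of a knowledge graph stored as (subject, relation, object) triples. The entity names, relation names, and helper functions are hypothetical placeholders for illustration; they are not real biology and not BioBox's schema.

```python
from collections import defaultdict

# Hypothetical toy triples: (subject, relation, object).
# Placeholder names only; these are not real biological facts.
TRIPLES = [
    ("DrugA", "TARGETS", "GENE1"),
    ("DrugA", "TREATS", "DiseaseX"),
    ("GENE1", "PARTICIPATES_IN", "ProcessP"),
    ("VariantV", "ASSOCIATED_WITH", "DiseaseX"),
]


def build_index(triples):
    """Index triples by (subject, relation) so lookups do not scan the whole list."""
    index = defaultdict(list)
    for subj, rel, obj in triples:
        index[(subj, rel)].append(obj)
    return index


def neighbors(index, node, relation):
    """Return every object connected to `node` by `relation` (empty list if none)."""
    return index.get((node, relation), [])


index = build_index(TRIPLES)
print(neighbors(index, "DrugA", "TARGETS"))
```

Each triple is one edge in the graph, so following a relationship is a lookup rather than a join across tables.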
At BioBox, we provide therapeutic teams with the tools to manage and curate a custom knowledge graph composed of private multi-omic sequencing data and public knowledge bases.
For a deep dive into how we use knowledge graphs to understand complex biological relationships, check out the article below.
Fact Checking LLMs with GraphRAG
It’s no secret that knowledge graphs are typically hard to traverse - no one likes an ontology hairball. Not everyone wants to write in Cypher or memorize their entire graph schema. At BioBox, we have built a suite of tools to make extracting valuable information from the graph as easy as possible, one of which is GraphRAG.
RAG (Retrieval-Augmented Generation) is a framework for improving the accuracy of information provided by LLMs. It works by retrieving information from external sources at query time, so the LLM's answer is grounded in relevant, up-to-date data rather than relying solely on the data the model was trained on. GraphRAG takes this a step further and retrieves that information from a knowledge graph, which lets teams ask questions in natural language and get answers drawn directly from their graph.
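The retrieve-then-generate idea described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not BioBox's or Microsoft's implementation: the triples are hypothetical placeholders, retrieval is naive string matching on entity names, and the LLM call is left out, so the sketch stops once the grounded prompt is built.

```python
# Hypothetical toy graph: (subject, relation, object). Placeholder names only.
TRIPLES = [
    ("DrugA", "TARGETS", "GENE1"),
    ("DrugA", "TREATS", "DiseaseX"),
    ("GENE1", "PARTICIPATES_IN", "ProcessP"),
    ("VariantV", "ASSOCIATED_WITH", "DiseaseX"),
]


def retrieve_facts(question, triples):
    """Retrieval step: keep triples whose subject or object is named in the question.

    Real systems would use entity linking or generated graph queries;
    substring matching is an assumption made to keep the sketch short.
    """
    q = question.lower()
    return [t for t in triples if t[0].lower() in q or t[2].lower() in q]


def build_prompt(question, facts):
    """Augmentation step: put the retrieved facts in front of the question."""
    lines = [f"{s} -[{r}]-> {o}" for s, r, o in facts]
    context = "\n".join(lines) if lines else "(no matching facts)"
    return (
        "Answer using only the facts below. If they are insufficient, say so.\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}"
    )


question = "Which genes does DrugA target?"
facts = retrieve_facts(question, TRIPLES)
prompt = build_prompt(question, facts)
print(prompt)  # This prompt would then be sent to an LLM of your choice.
```

Because the prompt carries the facts it was built from, each statement in the model's answer can be checked against a specific edge in the graph.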
Last week, data scientists at Microsoft published GraphRAG to GitHub, enabling users to leverage an LLM to extract knowledge from a collection of proprietary text documents.
At BioBox, we are excited to announce GraphRAG support for our multi-omic, sequencing-based graphs. Instead of spending hours wrangling data and fighting table joins, teams can ask questions in natural language and get answers directly from their graph within seconds.
Check out the demo below.
Our bread and butter is heterogeneous sequencing-based data, but we also support a wide variety of knowledge bases, including NIH clinical trials, Reactome, Alliance Genome, OpenTargets and more.
This allows for questions such as:
What are the genes targeted by drugs used in clinical trials that study the disease Renal cell carcinoma and have an overall status of terminated?
Which biological processes involve the genes targeted by drugs that have clinical precedence for the disease Renal cell carcinoma?
Which variants put you at risk for Renal cell carcinoma?
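Questions like the first one above are multi-hop: they go from a disease to its trials, filter on trial status, then move from trials to drugs and from drugs to their target genes. Here is a minimal sketch of that traversal over hypothetical in-memory data. The trial IDs, drug names, gene names, and function name are all placeholders, not real records or BioBox APIs.

```python
# Hypothetical toy data; none of these records are real.
TRIALS = [
    {"id": "TRIAL-1", "disease": "DiseaseX", "status": "Terminated", "drugs": ["DrugA"]},
    {"id": "TRIAL-2", "disease": "DiseaseX", "status": "Completed", "drugs": ["DrugB"]},
    {"id": "TRIAL-3", "disease": "DiseaseY", "status": "Terminated", "drugs": ["DrugC"]},
]
DRUG_TARGETS = {
    "DrugA": ["GENE1", "GENE2"],
    "DrugB": ["GENE3"],
    "DrugC": ["GENE1"],
}


def genes_targeted_in_trials(disease, status):
    """Hop disease -> trials (filtered by status) -> drugs -> target genes."""
    genes = set()
    for trial in TRIALS:
        if trial["disease"] == disease and trial["status"].lower() == status.lower():
            for drug in trial["drugs"]:
                genes.update(DRUG_TARGETS.get(drug, []))
    return sorted(genes)


print(genes_targeted_in_trials("DiseaseX", "terminated"))
```

In a GraphRAG setup, the natural-language question would be translated into a traversal like this one, and the resulting genes would be handed to the LLM as grounding facts.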
We have been developing a closed beta with several therapeutic teams. Over the next few months, we will continue to improve our GraphRAG to support more complex multi-hop queries, and we will be releasing it to all users.
Interested in fact checking your AI solutions and conversing with your data? Send us an email at support@biobox.io. We enjoy complex data challenges.