Beware the Ouroboros of Biology.
A system struggling to break from consensus.
Much of the AI application ecosystem, from RAG to tool use to massive token context windows, is optimizing for a single objective: reducing response entropy by feeding the model more information. The more context you provide, the more the output distribution narrows.
This is a powerful paradigm. It produced the co-pilots that summarize papers in seconds, and now the AI-powered platforms that run workflows and do bioinformatics analysis, and the foundation models that excel at physics and chemistry tasks (e.g. AlphaFold, Chai, Boltz). For problems where the “right” answer is derivable from available information, reducing entropy through context works beautifully.
Harvey, the AI for law, is one of the clearest examples of this. Many legal tasks reward fast convergence over a known body of rules, precedent, and language. It is hard to beat a system that can absorb a vast body of statutes, case law, and legal argument and retrieve it instantly. The rules are relatively legible, and much of the relevant context is available. This kind of problem rewards the ability to eliminate response entropy and converge quickly and precisely.
That is why so many knowledge industries are vulnerable to this type of disruption. And the biggest prize is pharmaceuticals. It is the Formula 1 of the knowledge economy. The implicit thesis was simple: instead of legal briefs, feed the system scientific literature, and it will learn biology.
The result has been a torrent of “AI Co-Scientist” and “Neurosymbolic AI” companies entering the most overhyped, overfunded, and epistemologically confused segment in vertical AI. And that confusion is a direct consequence of applying convergent tools to a fundamentally divergent problem.
There is a set of problems where entropy reduction breaks down entirely because it punishes convergence and rewards divergence. The highest-value decisions in drug discovery are exactly this kind of problem.
The decisions that matter, like which targets to pursue, which patient populations to bet on, and which mechanisms to trust when the biology is ambiguous, are not made by converging on the information available in the literature. They are made by someone who has seen something others haven't, interpreted a failure differently than the rest, or held a conviction that ran against consensus long enough to be proven right.
Alpha lives in the divergence.
And this is where the Ouroboros appears.
The Ouroboros of Biology.
Drug discovery relies on two principal factors: asymmetric information and differentiated thinking. You win because you know something your competitors don’t, and because you have smarter scientists who think outside the box. Given these competitive dynamics, no one is incentivized to share their asymmetric knowledge, let alone publish it.
We train models on the published record of biology, then ask them to generate the next frontier of biological insight from that same record. That is the Ouroboros. A system trying to escape the limits of consensus by recursively consuming consensus. But the published record is not biology.
No one in the history of drug discovery made a drug because they read the literature better.
The scientific literature is a curated, systematically optimistic, self-referential record of what scientists were willing to report, what journals were willing to print, and what funding agencies were willing to support. Papers cite papers that cite papers that no one could reproduce. Positive results get published with a narrative. Negative results disappear into reports, decks, and private team memory.
The top scientific R&D teams already know this. Inside serious drug discovery organizations, it is almost banal. But outsiders building in this space tend to over-index on the value of literature as ground truth. That misunderstanding is most obvious when vendors market a knowledge graph or foundation model as trained on hundreds of millions of papers and abstracts (yes, even conference abstracts), as if sheer volume somehow made the system better.
A More Structured Ouroboros.
“Fine. Literature alone is noisy. That’s why we use a knowledge graph to structure the claims. And we also integrate real-world and proprietary customer data into the graph.”
This is directionally correct but hides an insidious trap.
The value of a knowledge graph is its ontology. The ontology is not neutral; it is judgment. Every definition in the schema encodes a view of the world. What counts as the same thing? What makes things different? What gets linked? What gets collapsed? What do words mean?
An ontology is a scientific world view made operational.
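To make this concrete, here is a minimal sketch of how the same raw data point gets a different meaning under two different schemas. Everything here is invented for illustration — the assay names, thresholds, and labels are hypothetical, not taken from any real vendor or platform:

```python
# Hypothetical sketch: one raw assay record, two ontologies, two conclusions.
# All field names and cutoffs are invented for illustration.

raw_record = {"assay": "kinase_inhibition", "ic50_nm": 800, "cell_line": "HEK293"}

def vendor_ontology(record):
    # A vendor schema that collapses potency into a binary flag
    # using a generic 1 uM cutoff.
    return {"evidence": "active" if record["ic50_nm"] < 1000 else "inactive"}

def internal_ontology(record):
    # An internal schema that preserves gradation: anything between
    # 100 nM and 1 uM is "weak" and does not corroborate a target claim.
    if record["ic50_nm"] < 100:
        label = "potent"
    elif record["ic50_nm"] < 1000:
        label = "weak"
    else:
        label = "inactive"
    return {"evidence": label}

# The same data point corroborates a claim in one worldview, not the other.
print(vendor_ontology(raw_record))    # {'evidence': 'active'}
print(internal_ontology(raw_record))  # {'evidence': 'weak'}
```

The disagreement is not a bug in either schema. It is two different scientific judgments made operational, which is exactly why it matters whose schema your evidence flows through.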
The most important question is: whose graph is it?
If it is not your ontology, it is not your knowledge graph.
If it is not your knowledge graph, it is not your science.
If it is not your science, it is not your decision.
The moment a vendor says they “integrate customer data into their graph,” you should ask what scientific judgment you are implicitly outsourcing, because you have just relinquished control. They are deciding how your internal assays map to external biology. How your translational signals get normalized. How your negative data gets represented. What counts as corroboration. What counts as contradiction. What counts as enough evidence to connect one claim to another.
You cannot have it both ways. You cannot claim that your proprietary ontology-driven evidence graph is valuable while also ingesting customer data into it. You are either infrastructure that lets the customer instantiate, govern, and evolve their own scientific worldview, or you are a scientific intermediary that aggregates, reshapes, and ultimately substitutes for that worldview.
A simple test is to ask the vendor to change the underlying ontology to accommodate your internal definitions. See what happens.
The Contradiction at the Center.
This is why so many of these platforms feel impressive in demo mode and suspicious in principle. They can absolutely help teams search faster, summarize faster, traverse claims faster, and maybe even operate workflows faster. But as soon as you diverge from consensus, everything starts to break.
A pharma company does not win because it has access to information. Everyone has access to information. It wins because it developed a way of seeing that information differently, grounded in data others do not have and judgment others cannot replicate. This is the reasoning layer that cannot be outsourced without consequence. It is where scientists encode scientific reasoning and capture the differentiated thinking a team uses to decide what to believe, what to ignore, what to test, and where to place conviction under uncertainty.
How do we get out of this cycle?
Reset on expectations. You are NOT going to get a Harvey for drug discovery. It’s a different problem class (convergent vs. divergent).
Stop this. These types of posts are so inflammatory I need a shot of Dupixent.
Final Thoughts.
Drug discovery does not reward systems that converge fastest on the published record. It rewards teams that know something others do not, interpret evidence differently, and maintain conviction under uncertainty.
That means the future of AI in biology will not belong to whoever reads the most papers or structures the most claims. It will belong to whoever helps scientific organizations encode their own ontology, their own context, and their own judgment without surrendering them to a vendor-controlled worldview.
Otherwise, the snake keeps eating its tail.
BioBox is the Decision Operating System for AI-driven pharma.