A smarter way to streamline drug discovery

Adam Zewe | MIT News • June 17, 2024

The use of AI to streamline drug discovery is exploding. Researchers are deploying machine-learning models to help them identify molecules, among billions of options, that might have the properties they are seeking to develop new medicines.

But there are so many variables to consider — from the price of materials to the risk of something going wrong — that even when scientists use AI, weighing the costs of synthesizing the best candidates is no easy task.

The myriad challenges involved in identifying the best and most cost-efficient molecules to test are one reason new medicines take so long to develop, as well as a key driver of high prescription drug prices.

To help scientists make cost-aware choices, MIT researchers developed an algorithmic framework that automatically identifies optimal molecular candidates, minimizing synthetic cost while maximizing the likelihood that the candidates have the desired properties. The algorithm also identifies the materials and experimental steps needed to synthesize these molecules.

Their quantitative framework, known as Synthesis Planning and Rewards-based Route Optimization Workflow (SPARROW), considers the costs of synthesizing a batch of molecules at once, since multiple candidates can often be derived from some of the same chemical compounds.

Moreover, this unified approach captures key information on molecular design, property prediction, and synthesis planning from online repositories and widely used AI tools.

Beyond helping pharmaceutical companies discover new drugs more efficiently, SPARROW could be used in applications like the invention of new agrichemicals or the discovery of specialized materials for organic electronics.

“The selection of compounds is very much an art at the moment — and at times it is a very successful art. But because we have all these other models and predictive tools that give us information on how molecules might perform and how they might be synthesized, we can and should be using that information to guide the decisions we make,” says Connor Coley, the Class of 1957 Career Development Assistant Professor in the MIT departments of Chemical Engineering and Electrical Engineering and Computer Science, and senior author of a paper on SPARROW.

Coley is joined on the paper by lead author Jenna Fromer SM ’24. The research appears today in Nature Computational Science.

Complex cost considerations

In a sense, whether a scientist should synthesize and test a certain molecule boils down to a question of the synthetic cost versus the value of the experiment. However, determining either cost or value is a tough problem on its own.

For instance, an experiment might require expensive materials or it could have a high risk of failure. On the value side, one might consider how useful it would be to know the properties of this molecule or whether those predictions carry a high level of uncertainty.

At the same time, pharmaceutical companies increasingly use batch synthesis to improve efficiency. Instead of testing molecules one at a time, they use combinations of chemical building blocks to test multiple candidates at once. However, this means the chemical reactions must all require the same experimental conditions. This makes estimating cost and value even more challenging.

SPARROW tackles this challenge by considering the intermediate compounds shared among the synthetic routes to different molecules and incorporating that information into its cost-versus-value function.

“When you think about this optimization game of designing a batch of molecules, the cost of adding on a new structure depends on the molecules you have already chosen,” Coley says.
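
To make the point about marginal cost concrete, here is a minimal Python sketch. It is an illustration only, not SPARROW's actual cost model: the target names, compound names, prices, and the simplification that a route is just a set of compounds paid for once per batch are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical data: each target needs a set of building blocks ("bb")
# and intermediates ("int"). None of these names come from the paper.
ROUTES: Dict[str, FrozenSet[str]] = {
    "target_A": frozenset({"bb1", "bb2", "int1"}),
    "target_B": frozenset({"bb1", "bb3", "int1"}),
    "target_C": frozenset({"bb4", "bb5"}),
}

# Hypothetical cost of obtaining or making each compound once.
COST: Dict[str, float] = {
    "bb1": 10.0, "bb2": 5.0, "bb3": 8.0,
    "bb4": 20.0, "bb5": 2.0, "int1": 15.0,
}


def batch_cost(targets: Iterable[str]) -> float:
    """Cost of a batch, counting every shared compound only once."""
    needed = set()
    for target in targets:
        needed |= ROUTES[target]
    return sum(COST[compound] for compound in needed)


def marginal_cost(new_target: str, chosen: Iterable[str]) -> float:
    """Extra cost of adding new_target to the targets already chosen."""
    chosen = set(chosen)
    return batch_cost(chosen | {new_target}) - batch_cost(chosen)


if __name__ == "__main__":
    print(marginal_cost("target_B", []))            # target_B on its own
    print(marginal_cost("target_B", ["target_A"]))  # shares bb1 and int1 with target_A
    print(marginal_cost("target_C", ["target_A"]))  # shares nothing with target_A
```

In this toy data, adding target_B costs far less once target_A is already in the batch, because two of its three compounds are already paid for, while target_C gets no such discount.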

The framework also considers things like the costs of starting materials, the number of reactions that are involved in each synthetic route, and the likelihood those reactions will be successful on the first try.
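
The three factors named above can be combined in many ways, and the article does not say how SPARROW weights them. The Python sketch below shows one simple possibility under stated assumptions: the `Route` class, the flat `cost_per_reaction` charge, and the treatment of reaction steps as independent events are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass
class Route:
    """A hypothetical synthetic route to one molecule."""
    starting_material_costs: List[float]  # price of each starting material
    step_success_probs: List[float]       # chance each reaction works first try


def route_cost(route: Route, cost_per_reaction: float = 20.0) -> float:
    """Materials plus a flat (assumed) charge for every reaction step."""
    materials = sum(route.starting_material_costs)
    return materials + cost_per_reaction * len(route.step_success_probs)


def first_try_success(route: Route) -> float:
    """Chance that every step succeeds on the first try.

    Assumes the steps succeed or fail independently of one another.
    """
    return prod(route.step_success_probs)


if __name__ == "__main__":
    short_route = Route([10.0, 5.0], [0.9, 0.8])
    long_route = Route([4.0, 3.0], [0.9, 0.9, 0.7, 0.8])
    for name, route in [("short", short_route), ("long", long_route)]:
        print(name, route_cost(route), round(first_try_success(route), 3))
```

Here the longer route starts from cheaper materials but ends up both more expensive and less likely to succeed, which is the kind of trade-off a cost function has to capture.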

To use SPARROW, a scientist provides a set of molecular compounds they are thinking of testing and a definition of the properties they are hoping to find.

From there, SPARROW collects information on the molecules and their synthetic pathways and then weighs the value of each one against the cost of synthesizing a batch of candidates. It automatically selects the best subset of candidates that meet the user’s criteria and finds the most cost-effective synthetic routes for those compounds.
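
The selection step can be pictured as a search over subsets of candidates. The Python sketch below is a toy stand-in, not SPARROW's method: the article does not describe the solver, so this example simply tries every subset, and the reward values, compound costs, and `cost_weight` parameter are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Sequence, Tuple

# Hypothetical candidates, the compounds each one needs, and compound costs.
ROUTES: Dict[str, FrozenSet[str]] = {
    "target_A": frozenset({"bb1", "bb2", "int1"}),
    "target_B": frozenset({"bb1", "bb3", "int1"}),
    "target_C": frozenset({"bb4", "bb5"}),
}
COST: Dict[str, float] = {
    "bb1": 10.0, "bb2": 5.0, "bb3": 8.0,
    "bb4": 20.0, "bb5": 2.0, "int1": 15.0,
}
# Hypothetical value of testing each candidate, in the same units as cost.
REWARD: Dict[str, float] = {"target_A": 40.0, "target_B": 12.0, "target_C": 15.0}


def batch_cost(targets: Iterable[str]) -> float:
    """Cost of a batch, counting every shared compound only once."""
    needed = set()
    for target in targets:
        needed |= ROUTES[target]
    return sum(COST[compound] for compound in needed)


def best_batch(candidates: Sequence[str],
               cost_weight: float = 1.0) -> Tuple[Tuple[str, ...], float]:
    """Try every subset and keep the one with the highest reward minus cost.

    Exhaustive search is only practical for a handful of candidates.
    Returns the empty batch with score 0.0 if nothing is worth making.
    """
    best: Tuple[str, ...] = ()
    best_score = 0.0
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            reward = sum(REWARD[target] for target in subset)
            score = reward - cost_weight * batch_cost(subset)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


if __name__ == "__main__":
    print(best_batch(sorted(ROUTES)))
```

With these made-up numbers, target_B is not worth synthesizing by itself, yet the best batch pairs it with target_A because the two share most of their compounds. Scoring candidates one at a time would miss that interaction, which is why the batch is evaluated as a whole.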

“It does all this optimization in one step, so it can really capture all of these competing objectives simultaneously,” Fromer says.

A versatile framework

SPARROW is unique because it can incorporate molecular structures that have been hand-designed by humans, those that exist in virtual catalogs, or never-before-seen molecules that have been invented by generative AI models.

“We have all these different sources of ideas. Part of the appeal of SPARROW is that you can take all these ideas and put them on a level playing field,” Coley adds.

The researchers evaluated SPARROW by applying it in three case studies. The case studies, based on real-world problems faced by chemists, were designed to test SPARROW’s ability to find cost-efficient synthesis plans while working with a wide range of input molecules.

They found that SPARROW effectively captured the marginal costs of batch synthesis and identified common experimental steps and intermediate chemicals. In addition, it could scale up to handle hundreds of potential molecular candidates.

“In the machine-learning-for-chemistry community, there are so many models that work well for retrosynthesis or molecular property prediction, for example, but how do we actually use them? Our framework aims to bring out the value of this prior work. By creating SPARROW, hopefully we can guide other researchers to think about compound downselection using their own cost and utility functions,” Fromer says.

In the future, the researchers want to incorporate additional complexity into SPARROW. For instance, they’d like to enable the algorithm to consider that the value of testing one compound may not always be constant. They also want to include more elements of parallel chemistry in its cost-versus-value function.

“The work by Fromer and Coley better aligns algorithmic decision making to the practical realities of chemical synthesis. When existing computational design algorithms are used, the work of determining how to best synthesize the set of designs is left to the medicinal chemist, resulting in less optimal choices and extra work for the medicinal chemist,” says Patrick Riley, senior vice president of artificial intelligence at Relay Therapeutics, who was not involved with this research. “This paper shows a principled path to include consideration of joint synthesis, which I expect to result in higher quality and more accepted algorithmic designs.”

“Identifying which compounds to synthesize in a way that carefully balances time, cost, and the potential for making progress toward goals while providing useful new information is one of the most challenging tasks for drug discovery teams. The SPARROW approach from Fromer and Coley does this in an effective and automated way, providing a useful tool for human medicinal chemistry teams and taking important steps toward fully autonomous approaches to drug discovery,” adds John Chodera, a computational chemist at Memorial Sloan Kettering Cancer Center, who was not involved with this work.

This research was supported, in part, by the DARPA Accelerated Molecular Discovery Program, the Office of Naval Research, and the National Science Foundation.
