The whack-a-mole game is depicted, representing the need to balance multiple different properties in the process of drug discovery and design.

The Role of AI and Physics in the Future of Drug Discovery

BY Ramy Farid, Jun 14, 2022

Drug hunters face enormous challenges in their efforts to bring new treatments to patients. The traditional process of designing molecules that have the right properties to become safe and effective drug candidates takes approximately five years, requires thousands of molecules to be synthesized in the laboratory for preclinical testing, and costs many millions of dollars. And after all that effort, more than two-thirds of drug discovery programs fail to even deliver a small molecule development candidate suitable to take forward into clinical testing.

The long timeline, hefty price tag, and high failure rate are due to the fact that many of the most important molecular properties, such as potency, selectivity, and solubility, are anti-correlated, meaning that it is difficult to design for one drug property (e.g., potency) without negatively impacting another desirable property (e.g., solubility).

It’s a “whack-a-mole” problem.

In an ideal world, we would be able to list out every possible molecule on a computer, feed these into an algorithm that computes every single molecular property with perfect accuracy, and then choose the best molecule to advance into the clinic. This is our North Star.

The process of using machine learning and physics-based methods in drug discovery is depicted.

Predicting Molecular Properties

There are two distinct approaches being used within drug discovery that are making progress predicting an increasing number of molecular properties: artificial intelligence (AI)/machine learning and physics-based methods.

AI/Machine Learning

AI is a broad umbrella term simply meaning machines, particularly computers, that perform tasks typically conducted by humans, such as autonomously driving a car and identifying faces in a crowd. The branch of AI involved in drug discovery and development is called machine learning. Machine learning systems are trained to comb through large amounts of existing data, called a training set, to detect patterns, predict outcomes, and make decisions with minimal human intervention.

In the case of drug discovery, machine learning models comb through the structures of existing compounds with known molecular properties. These models excel at interpolation: they predict reliably only for molecules that are similar to those in the training set, and the predictions become unreliable as the molecules become dissimilar. For example, a model trained only to identify cats in photos cannot be expected to identify other animal species.
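The interpolation limit can be illustrated with a toy model. The sketch below is not a drug discovery model: it assumes a hypothetical one-dimensional descriptor and a made-up property function, and it uses a simple nearest-neighbor predictor as a stand-in for a machine learning model. The names `true_property` and `knn_predict` are illustrative only.

```python
import math
import random

def true_property(x: float) -> float:
    """Made-up 'molecular property' as a function of a 1-D descriptor (assumption)."""
    return math.sin(x) + 0.1 * x * x

def knn_predict(train, x, k=3):
    """Predict by averaging the property of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

# Training set: 200 points whose descriptor lies between 0 and 3.
rng = random.Random(0)
train = [(x, true_property(x)) for x in (rng.uniform(0.0, 3.0) for _ in range(200))]

def mean_abs_error(points):
    """Average absolute prediction error over a list of descriptor values."""
    return sum(abs(knn_predict(train, x) - true_property(x)) for x in points) / len(points)

inside = [0.5 + 0.1 * i for i in range(21)]   # 0.5 to 2.5: similar to the training data
outside = [6.0 + 0.1 * i for i in range(21)]  # 6.0 to 8.0: unlike anything in training

err_inside = mean_abs_error(inside)
err_outside = mean_abs_error(outside)
print(f"error inside training range:  {err_inside:.3f}")
print(f"error outside training range: {err_outside:.3f}")
```

Under these assumptions, the error inside the training range should be small, while the error far outside it is large, because the predictor can only echo the training data nearest to the query.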

This is the inherent limitation of AI/machine learning.

Drug hunters often need to extrapolate beyond characterized chemical space to find new molecules. Drug hunters have experimentally characterized far fewer than 200 million drug-like molecules, yet chemical space includes more than 10⁶⁰ molecules. To put this into perspective, the entire ocean contains about 10⁴⁷ water molecules and a drop of water contains 10²¹ molecules. Using machine learning to predict properties of 10⁶⁰ molecules from a training set with fewer than 200 million molecules risks the equivalent of ‘learning’ there are no fish in the ocean from analyzing one droplet of ocean water!
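The scale of this comparison can be checked with a few lines of arithmetic. The sketch below simply takes the figures quoted above as round numbers (200 million as an upper bound, 10⁶⁰ as a lower bound); the variable names are illustrative.

```python
from fractions import Fraction

# Figures quoted in the text, treated here as round numbers.
characterized = 2 * 10**8    # upper bound: "far fewer than 200 million"
chemical_space = 10**60      # lower bound: "more than 10^60"
drop_molecules = 10**21      # water molecules in one drop
ocean_molecules = 10**47     # water molecules in the entire ocean

# Exact ratios, using Fraction to avoid floating-point rounding.
coverage = Fraction(characterized, chemical_space)
drop_share = Fraction(drop_molecules, ocean_molecules)

print(f"characterized share of chemical space: {float(coverage):.0e}")
print(f"one drop as a share of the ocean:      {float(drop_share):.0e}")
print(f"ratio of the two shares:               {float(drop_share / coverage):.0e}")
```

On these numbers, the characterized share of chemical space (about 2 × 10⁻⁵²) is more than 25 orders of magnitude smaller than one drop's share of the ocean (10⁻²⁶), so the analogy, if anything, understates the gap.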

Molecular training sets are incomplete. And given the vast size of chemical space, it is unrealistic to assume that scientists can generate the data needed to complete the training sets that would be required. It is an insurmountable challenge.

At Schrödinger, we recognize that developing a completely general approach to predicting molecular properties is not a problem amenable to machine learning. It’s a physics problem.

Physics-based Methods

The other approach for predicting molecular properties is to use rigorous, first-principles methods, also referred to as physics-based methods, that simulate molecular motion at the atomic level. These methods are very powerful in that they can capture all of the complex physics involved in the binding of a molecule to a protein, which helps predict the molecule's properties without the need for existing training data.

Predicting how a compound will bind to its target is very complex. For example, the compound and protein must each adopt a certain shape, the water molecules must vacate the binding site, and then the compound must bind to the protein. Binding affinity can be predicted using free energy perturbation (FEP) calculations, which fully capture the complexity of the physics of these interactions. Such calculations also require that the structure of the protein and its small-molecule binding site be known, and in some cases these have not yet been experimentally determined.

However, these calculations are computationally expensive. Given that many desirable molecular properties are anti-correlated, billions of molecules need to be explored to find the most promising drug candidate. There are currently not enough computers in the world to run that many FEP calculations in a reasonable amount of time. This is a limitation of physics-based methods.

Game Changer

AI/machine learning and physics-based methods on their own have limitations that are insurmountable. But when we use them together, combining the speed of machine learning with the accuracy of physics-based methods, we overcome these limitations. It is by marrying these two approaches that we are advancing the field of drug discovery.

Here is how it works: A computer algorithm builds one billion molecules, for example. We then take a random sampling of only 1,000 of these molecules and compute molecular properties using our free energy perturbation software (FEP+). This step is the equivalent of doing a year’s worth of laboratory work in one day, at a fraction of the cost. We then use the data for those 1,000 molecules to build an approximate machine-learning model that can then be applied to the entire one billion molecules. Using this model, we can score each molecule and take the top 5,000 that best fit our criteria. We then run FEP+ on these 5,000 molecules, which will result in approximately 10 that will be synthesized in the laboratory. These physics-based methods are so accurate that eight out of the 10 molecules will, on average, have the desired molecular property.
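The shape of this workflow can be sketched in a few lines of code. The example below is a scaled-down toy, not Schrödinger's implementation: `expensive_score` is a hypothetical stand-in for a physics-based calculation such as FEP+, molecules are reduced to two made-up descriptor values, the surrogate is a simple nearest-neighbor average rather than a production machine learning model, and the pool sizes (10,000, 100, 50, 10) are chosen only so the example runs quickly.

```python
import random
import statistics

def expensive_score(mol):
    """Hypothetical stand-in for a costly physics-based calculation.
    Lower is better; the optimum sits at descriptor values (0.7, 0.2)."""
    x, y = mol
    return (x - 0.7) ** 2 + (y - 0.2) ** 2

def fit_surrogate(labeled, k=5):
    """Return a cheap predictor: mean score of the k nearest labeled molecules."""
    def predict(mol):
        nearest = sorted(
            labeled,
            key=lambda item: (item[0][0] - mol[0]) ** 2 + (item[0][1] - mol[1]) ** 2,
        )[:k]
        return sum(score for _, score in nearest) / k
    return predict

rng = random.Random(0)

# 1. Enumerate a pool of candidates (10,000 here, standing in for one billion).
pool = [(rng.random(), rng.random()) for _ in range(10_000)]

# 2. Label a small random sample with the expensive method.
labeled = [(mol, expensive_score(mol)) for mol in rng.sample(pool, 100)]

# 3. Train the cheap surrogate and score the entire pool with it.
predict = fit_surrogate(labeled)
shortlist = sorted(pool, key=predict)[:50]

# 4. Re-score only the shortlist with the expensive method; keep the best 10.
final = sorted(shortlist, key=expensive_score)[:10]

expensive_calls = len(labeled) + len(shortlist)
pool_median = statistics.median(expensive_score(m) for m in pool)
worst_final = max(expensive_score(m) for m in final)
print(f"expensive calculations: {expensive_calls} of {len(pool)} molecules")
print(f"median score in pool: {pool_median:.3f}, worst selected score: {worst_final:.3f}")
```

The point of the design is that the expensive calculation runs on only a small fraction of the pool (150 of 10,000 molecules in this toy), while the cheap surrogate does the work of ranking everything else.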

The drug discovery process is depicted, through the approach of physics, machine learning, and enterprise informatics.

Multiple cycles of this workflow were used to advance our internal MALT1 program. For this program, a total of 8.2 billion molecules were prioritized using a machine learning model trained on physics-based results, approximately 12,000 molecules were analyzed using FEP+, and 78 molecules were synthesized and tested in the lab, one of which became our development candidate that is expected to soon enter clinical development. The identification of the development candidate molecule took only about 10 months.
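To make the narrowing at each stage concrete, the short sketch below computes the reduction factor between consecutive stages using only the figures quoted above. The 12,000 figure is approximate in the source, so the derived ratios are approximate as well; the stage labels are paraphrases.

```python
# Stage counts as quoted in the text (12,000 is an approximate figure).
stages = [
    ("scored with the machine learning model", 8_200_000_000),
    ("analyzed with FEP+", 12_000),
    ("synthesized and tested", 78),
    ("development candidate", 1),
]

# Reduction factor between each pair of consecutive stages.
reductions = []
for (name_a, n_a), (name_b, n_b) in zip(stages, stages[1:]):
    factor = n_a / n_b
    reductions.append(factor)
    print(f"{name_a} -> {name_b}: about 1 in {factor:,.0f}")
```

By far the largest reduction happens at the machine learning step, which is the step that is cheap enough to apply to billions of molecules.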

The drug discovery process is described, from the first step of exploring 8.2 billion compounds with machine learning, to the final step of choosing a lead candidate.

Where is all this headed? There continues to be exponential growth in the power of computers, the number of proteins for which we know the structure, the number of compounds we can enumerate computationally, and the number of molecular properties that we can accurately predict. Together, these advancements will make the “whack-a-mole” problem a thing of the past.

Ramy Farid

Ramy Farid, Ph.D., is president, chief executive officer, and a member of the board of directors. He joined Schrödinger in 2002 and helped advance the company’s computational platform and drug discovery portfolio while assuming positions of increasing responsibility before becoming CEO in 2017. Dr. Farid has played a key role in implementing major strategic initiatives, including more than 40 research collaborations and joint ventures, and led the company through its initial public offering in 2020. Dr. Farid currently serves on the board of directors of Ajax Therapeutics, ShouTi, and Oak Hill Bio. Previously, he served on the board of Nimbus Therapeutics, a biotechnology company he helped found in 2009. He also served on the board of directors of Morphic Therapeutic, and he currently serves on Morphic’s scientific advisory board. Dr. Farid began his career in academia and was an assistant professor in the chemistry department at Rutgers University. He was a National Institutes of Health postdoctoral fellow in the Department of Biochemistry and Biophysics at the University of Pennsylvania and received his doctorate in chemistry from Caltech. He is an author on 70 peer-reviewed publications.
