The impact of artificial intelligence (AI) is all around us, from virtual assistants and email spam filters to personal health devices and drug development. AI is a broad umbrella term for machines, particularly computers, that mimic human responses. We incorporate a branch of AI, called machine learning, into our platform.
Machine learning systems are trained to comb through large amounts of data to detect patterns, predict outcomes, and make decisions with minimal human intervention. But these methods are not new, so why all the hype now? Advances in computing power, notably graphics processing units (GPUs), coupled with vast data collections, have made machine learning a manageable task.
Machine learning excels at interpolation, meaning predicting images or molecules similar to the training set. Drug hunters often need to extrapolate beyond characterized chemical space to find new molecules. For example, drug hunters have characterized fewer than 10¹² drug-like molecules, yet chemical space includes more than 10⁶⁰ molecules.[1] To put this in perspective, the entire ocean contains about 10⁴⁷ water molecules and a drop of water contains 10²¹ molecules. Using machine learning to predict properties of 10⁶⁰ molecules from a training set with fewer than 10¹² molecules risks the equivalent of ‘learning’ there are no fish in the ocean from analyzing one droplet of ocean water! Our molecular training set is not complete.
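To make the scale comparison concrete, the short Python sketch below works only with the round order-of-magnitude figures quoted above. The function and constant names are illustrative, not part of any platform. Taking those figures at face value, the characterized share of chemical space comes out far smaller than a droplet's share of the ocean.

```python
# Order-of-magnitude figures quoted in the text, stored as exponents of 10.
# These are the article's round numbers, not measured values.
CHARACTERIZED_EXP = 12   # fewer than 10^12 characterized drug-like molecules
CHEMICAL_SPACE_EXP = 60  # more than 10^60 molecules in chemical space
DROPLET_EXP = 21         # about 10^21 water molecules in a drop
OCEAN_EXP = 47           # about 10^47 water molecules in the ocean


def coverage_exponent(sample_exp: int, total_exp: int) -> int:
    """Return log10 of the sampled fraction: 10^sample / 10^total."""
    return sample_exp - total_exp


chem = coverage_exponent(CHARACTERIZED_EXP, CHEMICAL_SPACE_EXP)
water = coverage_exponent(DROPLET_EXP, OCEAN_EXP)

print(f"characterized fraction of chemical space: 10^{chem}")
print(f"droplet fraction of the ocean: 10^{water}")
print(f"gap between the two: {water - chem} orders of magnitude")
```

On these numbers the training set covers at most about 10⁻⁴⁸ of chemical space, while a droplet is about 10⁻²⁶ of the ocean, so the droplet analogy is, if anything, generous.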
Given the vast size of chemical space, it is unrealistic to assume that scientists can generate enough experimental data to complete the training sets that would be required. It is an insurmountable problem.
Physics-based models aim to generate specialized datasets that allow drug hunters to explore uncharted chemical space despite the limited number of compounds humans have characterized. Drug hunters leverage Schrödinger’s physics-based models to calculate training sets grounded in physical chemistry principles. Armed with such novel virtual training sets, drug hunters can then use machine learning to plot a course through vast chemical space and discover novel molecules with superior properties, yielding better-quality drug candidates faster.