CHRIS QUIRK
Partnering AI and Science
As the scientific method has evolved, it has served humanity well: observe, develop a hypothesis, run an experiment, analyze the results, and then repeat. From that basic formula, countless discoveries have blossomed, thanks to the power inherent in the scientific method’s rigor.
In recent years, computer scientists have amplified that power: artificial intelligence can now speed up the deliberate pace of the scientific method by orders of magnitude. Scientists in the School of Computer Science and across CMU are taking full advantage. As they adopt new techniques and methods, these researchers are raising questions about how AI is changing the way we think about science, and how it may offer new approaches to scientific inquiry.
As an AI researcher and leading figure in the development of computer vision, Martial Hebert, dean of the School of Computer Science, has watched with interest as scientists in myriad disciplines have put AI to work in their labs and inquiries. “AI is able to do a massive amount of data processing that no human could do in any reasonable amount of time, thereby providing conclusions and information that help scientists target what their next area of focus might be,” said Hebert. “It is absolutely transforming the way we look at scientific discovery.”
“For hundreds of years people did science the same way. Now we have this AI companion which can help us make scientific discoveries faster, quicker, cheaper and better,” said Barnabás Póczos, an assistant professor in the Machine Learning Department (MLD). “At some point when people write scientific papers, they may credit AI as a collaborator, because they may not have been able to make these scientific discoveries alone.”
But how is AI assisting? First and foremost, researchers are taking advantage of AI’s raw computational muscle. Rachel Mandelbaum, CMU professor of astrophysics and cosmology, is working on one of the most exciting projects in her field, the Legacy Survey of Space and Time (LSST). Powered by a 3,200-megapixel camera now being installed at the new Vera C. Rubin Observatory in Cerro Pachón, Chile, slated to open next year, the LSST will map the southern skies in a way astronomers could previously only dream of. The survey will look at every point in the sky about a thousand times over 10 years. “It’s going to be like a 10-year color movie of the entire visible sky,” said Mandelbaum.
The LSST will find about 10 million objects in the sky that have changed — every single night. “So every night there will be 10 million alerts for that. It is going to produce an enormous data set,” Mandelbaum explained. Sorting all that data will demand AI-powered data processing. “Astronomers will have to figure out very quickly whether a change is something interesting that we should be following up with a different telescope, or whether it is a plane that flew overhead, and not so interesting.”
AI is indispensable to the success of the LSST and to extracting salient information from its almost boundless trove of data. “There are a lot of exciting opportunities for using AI for scientific discovery in astronomy and other fields, but we can’t use AI out of the box,” Mandelbaum said. “This needs to be a collaborative endeavor to find AI methods that work in the context of the scientific method.”
As an ardent player of Go, the ancient strategy board game with an almost incomprehensible number of possible game positions, Jeff Schneider, a research professor in the Robotics Institute (RI), recalls the watershed moment when a computer defeated world champion Lee Sedol in 2016. “I’m a pretty poor Go player, but I truly thought that I would never see a computer that could beat me,” said Schneider. “For me it was mind blowing, not only that it could beat me, but beat the world champion.”
Today, Schneider’s work harnesses AI in a bid to make nuclear fusion a reality. Fusion is commonly dubbed the holy grail of energy production because of its potential to produce nearly limitless power at low cost. Fusion differs from nuclear fission, the atom-splitting process used in current nuclear energy technology. Fusion instead melds atoms together, generating massive amounts of energy from tiny amounts of fuel. No greenhouse gasses are released, only helium, and the tritium that fusion consumes as fuel is far less radioactive and dangerous than the waste fission produces.
Nuclear fusion, long a dream of the scientific community, is the energy source of the sun and the stars, which weld atoms together in their cores at tremendous temperatures and pressures. Recreating that environment on Earth is, to say the least, a supreme challenge, and one that Schneider has been working on diligently. “This is big science,” he said. If harnessed, fusion could supply the energy needs of the entire planet without endangering the climate.
Schneider is seeking a fusion solution using a tokamak, a device that suspends a torus — a doughnut-shaped plasma of hydrogen — within it, using powerful magnetic fields. “We’re trying to get it to these extreme temperatures and pressures, when there isn’t anything that could physically confine the plasma,” he explained. “The magnetic field holds the plasma in place.” The challenge is that the dynamics of achieving that are complex and unstable, and there are billions of variables involved.
Given the high cost of tokamaks, researchers have to plan carefully and use their precious time allotments with extreme efficiency. Schneider and his colleagues now use a modified large language model, loaded with decades’ worth of data and results from prior experiments run on the tokamak, to help fashion and sharpen their own tests. “It’s helping us build models and get insights using machine learning, and it’s really changing the way science is done,” Schneider said. “It’s a much more effective process and things are improving a lot.”
Martial Hebert, Dean of SCS
Barnabás Póczos, Assistant Professor in MLD
Rachel Mandelbaum, CMU Professor of Astrophysics and Cosmology
Jeff Schneider, Research Professor in RI
In the field of health sciences, as the number of dangerous infections resistant to many antibiotics — like Methicillin-resistant Staphylococcus aureus (MRSA) — rises, the urgency of identifying new drugs to combat them grows. Hosein Mohimani, associate professor in the Ray and Stephanie Lane Computational Biology Department, and Abhinav Adduri, a Ph.D. student in Computational Biology, have found success with the help of AI, targeting a particularly nasty bug known as Candida auris.
C. auris, a yeast resistant to many antifungals, preys particularly on those already in poor health, and thus often wreaks its havoc in healthcare facilities. In essence, antimicrobial resistance arises when pathogens evolve defenses against the mechanisms drugs employ to destroy them. “Maybe this drug attacks the cell wall. Then, if a bacteria can protect its cell wall, that renders a whole class of antimicrobials inactive,” Adduri said. “We then need new antimicrobials with novel mechanisms of action.” The high-volume methods of AI virtual experimentation the researchers built increase their odds of finding gaps in the armor of pathogens.
Mohimani and Adduri created AI tools to massively speed up drug research, and as a result, more quickly find antibiotics with a high probability of effectiveness. They rely on AI to suggest possible designs for drugs, in the process helping them focus their search in the most fruitful directions, eliminating vast swaths of likely fallow areas. “We generate data on organisms that give us hints about the capacity of certain microbes to produce interesting antibiotic molecules,” explained Mohimani. “Then we use AI methods to analyze that data and figure out what is the most promising.”
Mohimani sees nothing unusual in employing AI as a kind of scientific partner. “Ever since the discovery of penicillin, a lot of antibiotics that we are currently using in our medicine have been discovered by random luck,” he said. “We are here to discover novel antibiotics in a systematic way, by using high throughput methods to generate a lot of data, and then using AI algorithms to analyze it.” The human in the loop, so to speak, is still critical to fabricating the drugs. Mohimani and his colleagues enhance the probability of success for the compounds they identify by using existing natural materials as a base, rather than letting the algorithm construct its own antibiotic from scratch. “It makes them easier to synthesize,” Adduri explained. “AI algorithms don’t really have a great notion of the pragmatic aspects of chemical synthesis ability.”
Andreas Pfenning, a genomic researcher, created a process for finding disease cures that works as a kind of feedback loop, closing a virtuous circle of human-AI collaboration. Pfenning, an associate professor in the Ray and Stephanie Lane Computational Biology Department, works in comparative genomics, finding new avenues to identify important information, particularly about neural functions, amid the dense thicket of animal genomes. That work could lead to therapies for Alzheimer’s disease or addiction.
Faced with the daunting amount of information in genomes, researchers looking for key segments have, quite logically, sought shortcuts. “Since the 1960s, if people are trying to identify whether or not a certain part of the genome is important or functional, they have used models of how the individual nucleotides of the genome evolve,” said Pfenning. Researchers typically focused their attention on genetic information passed on from generation to generation, as that which is preserved in a genome is likely important. Researchers further built mathematical models that presumed that if you are seeking a part of a genome related to a disease, you would find it in a similar location in the genome across different species.
Hosein Mohimani, Associate Professor in the Lane Computational Biology Department
Abhinav Adduri, Ph.D. Student in Computational Biology
Andreas Pfenning, Associate Professor in the Ray and Stephanie Lane Computational Biology Department
Tom Mitchell, Founders University Professor in MLD
But what if those principles didn’t always hold? Rather than taking a mathematical approach, Pfenning applied machine learning, which unearthed connections in unlikely sections of genomes. In doing so, Pfenning identified previously unknown parts of the genome related to vocal learning and the capacity to mimic sounds. “This is the foundation of human speech,” Pfenning said. The traditional models were, in fact, missing important information. “A common theme in a lot of our work is trying to reframe traditional biological problems. The data that we collect can provide input to our algorithms. And then the algorithms can make predictions that we can go into the laboratory to test,” he said. “We still need to conduct scientific experimentation, and what I argue for is a tighter integration of AI and science, rather than AI coming in and taking over the scientific discipline of biology.”
Despite the accumulating benefits, questions and caveats remain around AI and research. A white paper published last year by the Royal Society, Britain’s leading scientific academy, reviewed the use of AI in science and engineering and found that while the growing adoption of AI applications in STEM fields shows great potential, policy and ethical questions, like the risk of discrimination inherent in data systems, have to be part of the broader conversation.
Others raise concerns about the “black box” nature of AI — the algorithms cannot at this time tell us how they arrived at a particular answer, or why they took a specific action. Hebert’s outlook remains pragmatic. “If you use AI to drive cars, you want to understand exactly what it’s doing so you can prove that it’s doing the right thing,” he said. “In biology, if you have AI predicting that a particular configuration would behave in a certain way, you would do experiments to verify that.”
There is also the problem of what Tom Mitchell, Founders University Professor in the Machine Learning Department, calls “monothink,” where the utility of successful AI tools can become its own enemy. “AlphaFold is the go-to tool for protein folding. And we all know it’s not perfect, but it’s way better than anything else that’s out there,” said Mitchell. “But if every scientist in the field is using the same AI tool, it’s kind of like if every researcher in a Ph.D. group were getting advice from the same advisor. If that advice is sometimes wrong, then everybody’s misled.”
In addition, due to the fragility of AI and its propensity to produce hallucinations — false or fabricated results — Mitchell says we must recalibrate the way we think about computers. “For the first five decades of computer science, up until 2000 or so, it really was true that if a commercial system released some software, they kind of guaranteed that it worked. There were proofs of correctness, and we got into a mode of thinking of computers as something that would be perfect,” he said. “Now we’re starting to use computers in a very different way. ChatGPT doesn’t always work in the sense that it gives the perfect answer, just like people. We’re starting to use computers in a way much more similar to how we use our human colleagues, rather than as oracles that are infallible.” ■