
Dodging AI and other computational biology dangers

Author: Greg Calhoun
Date: August 13, 2024

Sanford Burnham Prebys scientists say that understanding the potential pitfalls of using artificial intelligence and computational biology techniques in biomedical research helps maximize benefits while minimizing concerns

ChatGPT, an artificial intelligence (AI) “chatbot” that can understand and generate human language, steals most of the headlines related to AI, along with rising concerns about AI tools being used to create false “deepfake” images, audio and video that appear convincingly real.

But scientific applications of AI and other computational biology methods are gaining a greater share of the spotlight as research teams successfully employ these techniques to make new discoveries such as predicting how patients will respond to cancer drugs.

AI and computational biology have proven to be boons to scientists searching for patterns in massive datasets, but some researchers are raising alarms about how AI and other computational tools are developed and used.

“We cannot just purely trust AI,” says Yu Xin (Will) Wang, PhD, assistant professor in the Development, Aging and Regeneration Program at Sanford Burnham Prebys. “You need to understand its limitations, what it’s able to do and what it’s not able to do. Probably one of the simplest examples would be people asking ChatGPT about current events as they happen.”

(ChatGPT’s knowledge comes from the websites and other information used to train its most current version, which have a fixed cutoff date. Its awareness of current events is therefore not necessarily current.)

“I see a misconception where some people think that AI is so intelligent that you can just throw data at an AI model and it will figure it all out by itself,” says Andrei Osterman, PhD, vice dean and associate dean of curriculum for the Graduate School of Biomedical Sciences and professor in the Immunity and Pathogenesis Program at Sanford Burnham Prebys.

Yu Xin (Will) Wang, PhD, is an assistant professor in the Development, Aging and Regeneration Program at Sanford Burnham Prebys.

“In many cases, it’s not that simple. We can’t look at these models as black boxes where you put the data in and get an answer out, where you have no idea how the answer was determined, what it means and how it is applicable and generalizable.”

“The very first thing to focus on when properly applying computational methods or AI methods is data quality,” adds Kevin Yip, PhD, professor in the Cancer Genome and Epigenetics Program at Sanford Burnham Prebys and director of the Bioinformatics Shared Resource. “Our mantra is ‘garbage in, garbage out.’”
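
To make the “garbage in, garbage out” mantra concrete, the hypothetical sketch below shows the kind of basic screening, for missing values, duplicated samples and implausible measurements, that a computational biologist might run before any modeling. The column names, checks and use of the pandas library are illustrative assumptions, not part of any specific Sanford Burnham Prebys pipeline.

```python
import pandas as pd

def quality_report(df: pd.DataFrame, expression_cols: list[str]) -> dict:
    """Summarize common data-quality problems before any modeling.

    Hypothetical example: column names and checks are illustrative only.
    """
    return {
        # High missing-value rates in a column suggest a failed assay or batch.
        "missing_fraction": df[expression_cols].isna().mean().to_dict(),
        # The same sample entered twice can silently inflate apparent signal.
        "duplicate_sample_ids": int(df["sample_id"].duplicated().sum()),
        # Negative expression values are implausible and point to upstream errors.
        "negative_values": int((df[expression_cols] < 0).sum().sum()),
    }

# Illustrative usage with made-up data.
df = pd.DataFrame({
    "sample_id": ["s1", "s2", "s2"],      # "s2" appears twice
    "gene_a": [5.1, None, 4.8],           # one missing measurement
    "gene_b": [2.0, 3.5, -0.2],           # one implausible negative value
})
print(quality_report(df, ["gene_a", "gene_b"]))
```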

Andrei Osterman, PhD, is a professor in the Immunity and Pathogenesis Program at Sanford Burnham Prebys.

Once researchers have ensured the quality of their data, Yip says the next step is to be prepared to confirm the results.

“Once we actually plug into certain tools, how can we actually tell whether they are doing a good job or not?” asks Yip. “We cannot just trust them. We need to have ways to validate either experimentally or even computationally using other ways to cross-check the findings.”
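
One common computational cross-check of the kind Yip describes is to hold part of the data out of training and confirm that a model still performs well on samples it has never seen. The sketch below is a generic illustration using scikit-learn and synthetic data; the model choice and scoring metric are assumptions made for the example, not a description of any particular tool.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real dataset (e.g., molecular profiles vs. drug response).
X, y = make_classification(n_samples=200, n_features=50, n_informative=5, random_state=0)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: the model is trained on 4/5 of the data and scored on
# the held-out 1/5, five times, so the reported accuracy reflects samples the
# model never saw during training rather than how well it memorized the data.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"Held-out accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```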

Yip is concerned that AI-based research and computational biology are moving too fast in some cases, contributing to challenges reproducing and generalizing results.

“There are so many new algorithms, so many tools published every day,” adds Yip. “Sometimes, they are not maintained very well, and the investigators cannot be reached when we can’t run their code or download the data they analyzed.”

For AI and computational biology techniques to continue their rapid development, the scientific community needs to be responsible, transparent and collaborative in sharing data, along with either code or trained AI models, so that studies can be reproduced and trust can grow as these fields mature.

Privacy is another area where mistrust can take root when research uses AI algorithms to analyze medical data, from electronic health records to insurance claims to biopsied patient samples.

“It is completely understandable that members of the public are concerned about the privacy of their personal data; it is a primary topic I discuss with colleagues at conferences,” says Yip. “When we work with patient data, there are very strict rules and policies that we have to follow.”

Yip adds that the most important rule is that scientists must never re-identify samples without proper consent; re-identification means using algorithms to predict which patient provided certain data.
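
To see why re-identification is treated so seriously, consider the hypothetical check below, which counts how many records share each combination of “quasi-identifiers” such as age bracket and partial zip code; records whose combination is rare are the ones an algorithm could most easily link back to an individual. The column names and this k-anonymity-style check are illustrative assumptions, not a description of the rules Yip’s team follows.

```python
import pandas as pd

# Made-up, de-identified records: no names, but quasi-identifiers remain.
records = pd.DataFrame({
    "age_bracket": ["60-69", "60-69", "30-39", "30-39", "30-39"],
    "zip3":        ["921",   "921",   "921",   "921",   "920"],
    "diagnosis":   ["A",     "B",     "A",     "A",     "C"],
})

# Group size for each quasi-identifier combination (a k-anonymity-style check):
# a record in a group of size 1 is uniquely identifiable from those fields alone.
group_sizes = records.groupby(["age_bracket", "zip3"])["diagnosis"].transform("size")
at_risk = records[group_sizes < 2]
print(at_risk)
```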

Kevin Yip, PhD, is a professor in the Cancer Genome and Epigenetics Program at Sanford Burnham Prebys.

Ultimately for Yip, using AI and computational methods appropriately—within their limitations and without violating patients’ privacy—is a matter of professional integrity for the owners and users of these emerging technologies.

“As creators of AI and computational tools, we need to maintain our code and models and make sure they are accessible along with our data. On the other side, users need to understand the limitations and how to make good use of what we create without overstepping and claiming findings beyond the capability of the tools.”

 “This level of shared responsibility is very important for the future of biomedical research during the data revolution.”


Programming in a Petri Dish, an 8-part series

How artificial intelligence, machine learning and emerging computational technologies are changing biomedical research and the future of health care

  • Part 1 – Using machines to personalize patient care. Artificial intelligence and other computational techniques are aiding scientists and physicians in their quest to prescribe or create treatments for individuals rather than populations.
  • Part 2 – Objective omics. Although the hypothesis is a core concept in science, unbiased omics methods may reduce attachments to incorrect hypotheses that can reduce impartiality and slow progress.
  • Part 3 – Coding clinic. Rapidly evolving computational tools may unlock vast archives of untapped clinical information—and help solve complex challenges confronting health care providers.
  • Part 4 – Scripting their own futures. At Sanford Burnham Prebys Graduate School of Biomedical Sciences, students embrace computational methods to enhance their research careers.
  • Part 5 – Dodging AI and computational biology dangers. Sanford Burnham Prebys scientists say that understanding the potential pitfalls of using AI and other computational tools to guide biomedical research helps maximize benefits while minimizing concerns.
  • Part 6 – Mapping the human body to better treat disease. Scientists synthesize supersized sets of biological and clinical data to make discoveries and find promising treatments.
  • Part 7 – Simulating science or science fiction? By harnessing artificial intelligence and modern computing, scientists are simulating more complex biological, clinical and public health phenomena to accelerate discovery.
  • Part 8 – Acceleration by automation. Increases in the scale and pace of research and drug discovery are being made possible by robotic automation of time-consuming tasks that must be repeated with exhaustive exactness.