AI Chatbots Just Outperformed Human Teams in Analyzing Medical Data


Generative AI tools stunned researchers by building accurate preterm birth prediction models far faster than human teams — sometimes even outperforming them. The breakthrough suggests AI could dramatically accelerate medical discoveries and improve care for vulnerable newborns.

In an early real-world test of artificial intelligence in health research, scientists at UC San Francisco and Wayne State University found that generative AI tools could analyze large medical datasets dramatically faster than traditional research teams. In some cases, the AI systems even produced better results than computer scientists who had spent months carefully reviewing the same data.

Researchers created a direct comparison. Some teams relied on human expertise alone, while others combined scientists with AI assistance. All were asked to tackle the same challenge: predict preterm birth using data from more than 1,000 pregnant women.

Even a junior pair consisting of a UCSF master’s student, Reuben Sarwal, and a high school student, Victor Tarca, successfully built prediction models with the help of AI. They were able to generate functional computer code within minutes — work that would typically require experienced programmers several hours or even days to complete.

The key advantage of generative AI is its ability to write analytical code from short, highly technical prompts. Not every system performed well: only 4 of the 8 AI chatbots produced usable code. Still, those that succeeded did not require large teams of experts to guide them.

Because of this efficiency, the junior researchers were able to run experiments, verify their results, and submit their findings to a scientific journal within a few months.

“These AI tools could relieve one of the biggest bottlenecks in data science: building our analysis pipelines,” said Marina Sirota, PhD, a professor of Pediatrics who is the interim director of the Bakar Computational Health Sciences Institute (BCHSI) at UCSF and the principal investigator of the March of Dimes Prematurity Research Center at UCSF. “The speed-up couldn’t come sooner for patients who need help now.”

Sirota is co-senior author of the study, which was published today (February 17) in Cell Reports Medicine.

Why Preterm Birth Prediction Matters

Faster data analysis could lead to improved diagnostic tools for preterm birth, which is the leading cause of newborn death and a major contributor to long-term motor and cognitive disabilities. In the United States, about 1,000 babies are born prematurely every day.

Despite its impact, scientists still do not fully understand what triggers preterm birth. To search for answers, Sirota’s group collected microbiome data from about 1,200 pregnant women across nine separate studies, tracking each pregnancy through delivery.

“This kind of work is only possible with open data sharing, pooling the experiences of many women and the expertise of many researchers,” said Tomiko T. Oskotsky, MD, co-director of the March of Dimes Preterm Birth Data Repository, associate professor in UCSF BCHSI, and co-author of the paper.

However, the sheer volume and complexity of the data made it difficult to analyze. To address this, the team sought outside help through a global competition known as DREAM (Dialogue on Reverse Engineering Assessment and Methods).

Sirota co-led one of three DREAM pregnancy challenges, focusing on vaginal microbiome data. More than 100 research groups worldwide competed to design machine learning algorithms capable of identifying patterns linked to preterm birth. Most teams completed the challenge within the allotted three months. Yet compiling the findings and publishing the results required nearly two years.

Testing Generative AI on Pregnancy Data

To determine whether AI could accelerate the process, Sirota’s team partnered with researchers led by Adi L. Tarca, PhD, co-senior author and professor in the Center for Molecular Medicine and Genetics at Wayne State University in Detroit, MI. Tarca had led the other two DREAM challenges, which focused on improving methods for estimating pregnancy stage.

The combined team asked eight AI systems to independently build algorithms using the same datasets from the three DREAM challenges, without human coding support.

Each AI tool received carefully designed natural language prompts. Similar to how ChatGPT operates, the systems were instructed in plain language but with precise guidance so they would analyze the medical data in ways comparable to the original DREAM teams.

The objective was the same as in the competition: evaluate vaginal microbiome data for indicators of preterm birth and examine blood or placental samples to estimate gestational age. Pregnancy dating is almost always an estimate, yet it plays a critical role in guiding medical care. When the estimate is inaccurate, clinicians may struggle to prepare for labor at the right time.

After running the AI-generated code on the datasets, researchers found that 4 of the 8 AI tools produced prediction models that performed as well as those created by the DREAM teams. In some instances, the AI models performed even better. The entire generative AI project, from initial concept to journal submission, was completed in just six months.
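To make "performed as well as" concrete, here is a minimal sketch of how two such models might be scored. The article does not say which metrics the study used; this sketch assumes area under the ROC curve (AUROC) for the preterm birth classifier and root-mean-square error (RMSE) for the gestational age estimate, two common choices. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def auroc(labels, scores):
    """AUROC by pairwise comparison: the fraction of (preterm, term) pairs
    in which the preterm case gets the higher score. Ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(true_values, predicted_values):
    """Root-mean-square error between true and predicted values."""
    if not true_values or len(true_values) != len(predicted_values):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(true_values, predicted_values)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up toy data, for illustration only.
preterm_labels = [1, 0, 0, 1, 0]           # 1 = preterm, 0 = term
model_scores = [0.9, 0.2, 0.6, 0.4, 0.1]   # predicted risk of preterm birth
true_weeks = [30, 34, 38]                  # true gestational age (weeks)
predicted_weeks = [31, 33, 40]             # model estimate (weeks)

classifier_auroc = auroc(preterm_labels, model_scores)
dating_rmse = rmse(true_weeks, predicted_weeks)
print(f"AUROC: {classifier_auroc:.3f}")
print(f"RMSE (weeks): {dating_rmse:.3f}")
```

Under these assumptions, a higher AUROC means the classifier ranks preterm cases above term cases more often, and a lower RMSE means the gestational age estimates sit closer to the true values. An AI-built model and a human-built model could be compared by computing both numbers on the same held-out data.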

Researchers caution that AI systems can still produce misleading results, and human oversight remains essential. The technology does not replace scientific expertise. However, by rapidly generating the code needed to analyze enormous datasets, generative AI may allow researchers to spend less time debugging and more time interpreting findings and asking better scientific questions.

“Thanks to generative AI, researchers with a limited background in data science won’t always need to form wide collaborations or spend hours debugging code,” Tarca said. “They can focus on answering the right biomedical questions.”

Reference: 17 February 2026, Cell Reports Medicine.

Authors: UCSF authors are Reuben Sarwal; Claire Dubin; Sanchita Bhattacharya, MS; and Atul Butte, MD, PhD. Other authors are Victor Tarca (Huron High School, Ann Arbor, MI); Nikolas Kalavros and Gustavo Stolovitzky, PhD (New York University); Gaurav Bhatti (Wayne State University); and Roberto Romero, MD, D(Med)Sc (National Institute of Child Health and Human Development (NICHD)).

Funding: This work was funded by the March of Dimes Prematurity Research Center at UCSF, and by ImmPort. The data used in this study was generated in part with support from the Pregnancy Research Branch of the NICHD.
