Head-to-head against AI, pharmacy students won
A study showed that ChatGPT 3.5 was less likely than pharmacy students to correctly answer therapeutics exam questions focused on clinical applications and patient cases.

A recent study found that University of Arizona pharmacy students outperformed artificial intelligence on therapeutics exams, a finding that informs both education efforts and the use of AI for health care knowledge.
Photo by tolgart via Getty Images
Students pursuing a Doctor of Pharmacy degree routinely take – and pass – rigorous exams to prove competency in several areas. Can ChatGPT accurately answer the same questions? A new study by researchers at the University of Arizona R. Ken Coit College of Pharmacy suggests the answer is no.

Brian Erstad, PharmD, is the interim dean and a professor at the R. Ken Coit College of Pharmacy.
Photo by Kris Hanning, U of A Office of Research and Partnerships
Researchers found that ChatGPT 3.5, an artificial intelligence large language model, fared worse than PharmD students in answering questions on therapeutics examinations that ensure students have the knowledge, skills and critical thinking abilities to provide safe, effective and patient-centered care.
ChatGPT was less likely to correctly answer application-based questions (44%) compared with questions focused on recall of facts (80%). It also was less likely to answer case-based questions correctly (45%) compared with questions that weren’t focused on patient cases (74%). Overall, ChatGPT answered only 51% of the questions correctly.
The results provide additional insights into the uses and limitations of the technology and may also prove valuable in the development of pharmacy exam questions. The findings appear in the journal Currents in Pharmacy Teaching and Learning.
“AI has many potential uses in health care and education, and it’s not going away,” said Christopher Edwards, PharmD, an associate clinical professor of pharmacy practice and science. “One of the things we were hoping to answer with the study was if students wanted to use AI on an exam, how would they perform? I wanted to have data to show the students and tell them they can do well in the exams by studying hard and they don’t necessarily need these tools.”
A secondary goal was to find out what kinds of questions AI would struggle with. Coit College of Pharmacy Interim Dean Brian Erstad, PharmD, wasn't surprised that ChatGPT did better with straightforward multiple-choice and true-false questions and was less successful with application-based questions.

Christopher Edwards, PharmD, is an associate clinical professor of pharmacy practice and science at the Coit College of Pharmacy.
Photo by Kris Hanning, U of A Office of Research and Partnerships
“The kinds of places where evidence is limited and judgment is required, which is often in a clinical setting, was where we found the technology somewhat lacking,” he said. “Ironically, those are the kinds of questions clinicians are always facing.”
Edwards, Erstad and Bernadette Cornelison, PharmD, an associate clinical professor of pharmacy practice and science, evaluated ChatGPT's answers to 210 questions from six exams in two pharmacotherapeutics courses in the Coit College of Pharmacy's PharmD program.
The questions came from a first-year PharmD course on disorders treated with nonprescription medications, including heartburn, diarrhea, atopic dermatitis, colds and allergies, and from a second-year course covering cardiology, neurology and critical care topics.
To compare the exam performance of pharmacy students and ChatGPT, the researchers calculated mean composite scores as a measure of the ability to answer questions correctly. For ChatGPT, they summed its scores on the individual exams and divided by the number of exams; for the students, they summed the mean class performance on each exam and divided by the number of exams. Across the six exams, ChatGPT's mean composite score was 53, compared with 82 for the pharmacy students.
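For readers who want to see that arithmetic concretely, here is a minimal Python sketch of the composite-score calculation. The per-exam numbers are hypothetical placeholders chosen only so the averages reproduce the published composites (53 and 82); the article does not report individual exam scores.

```python
# Minimal sketch of the composite-score arithmetic described above.
# The per-exam values are hypothetical placeholders chosen so the
# averages match the published composites (ChatGPT 53, students 82).

def mean_composite(exam_scores):
    """Sum of per-exam scores divided by the number of exams."""
    return sum(exam_scores) / len(exam_scores)

# Hypothetical ChatGPT score on each of the six exams (percent correct).
chatgpt_scores = [55, 48, 60, 50, 52, 53]

# Hypothetical mean class performance on each of the six exams.
student_means = [84, 80, 83, 81, 79, 85]

print(f"ChatGPT mean composite: {mean_composite(chatgpt_scores):.0f}")  # 53
print(f"Student mean composite: {mean_composite(student_means):.0f}")  # 82
```

Both composites are simple unweighted averages over the six exams, which is why the same calculation applies to ChatGPT's individual scores and to the class means.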
Educators, clinicians and others continue to debate the value of AI large language models, such as ChatGPT, in academic medicine. While such models will continue to play a range of roles in health care, pharmacy practice and other areas, many are concerned that relying too much on the technology could hamper the development of needed reasoning and critical thinking skills in students.
Both Erstad and Edwards acknowledged that in time, newer and more advanced technology may change these results.
Experts
Brian Erstad, PharmD
Interim Dean, R. Ken Coit College of Pharmacy
Professor, Department of Pharmacy Practice and Science, Coit College of Pharmacy
Member, BIO5 Institute
Christopher Edwards, PharmD
Associate Clinical Professor, Department of Pharmacy Practice and Science, Coit College of Pharmacy
Director, Clinical Emergency Pharmacotherapy, Department of Emergency Medicine, College of Medicine – Tucson
Associate Clinical Professor, Department of Emergency Medicine, College of Medicine – Tucson
Bernadette Cornelison, PharmD
Associate Clinical Professor, Department of Pharmacy Practice and Science, Coit College of Pharmacy
Contact
Phil Villarreal
Office of Research and Partnerships
520-403-1986, pvillarreal@arizona.edu