ChatGPT-4 outperforms ChatGPT-3.5 at creating structured, summarized radiology reports for pancreatic ductal adenocarcinoma (PDAC), researchers have found.
The study results are good news for both clinicians and patients, as the AI tool could improve surgical decision-making, noted a team led by Rajesh Bhayana, MD, of the University of Toronto in Canada in an article published June 18 in Radiology.
“[We found that] GPT-4 created near-perfect PDAC synoptic reports from original reports … [that] GPT-4 with chain-of-thought achieved high accuracy in categorizing resectability … [and that] surgeons were more accurate and efficient [when they used] AI-generated reports,” the group wrote.
Imaging is key to determining which pancreatic tumors are eligible for surgery and which are not, Bhayana and colleagues explained. But compared with free-text descriptions from imaging reports, “structured pancreatic CT reports improve communication between radiologists and surgeons and improve surgical planning and decision-making,” the team wrote, further noting that “radiologist adoption of structured reporting for pancreatic cancer is inconsistent, and resectability criteria are heterogeneously applied and tumor categorization is variably reported.”
To assess whether large language models (LLMs) could mitigate this inconsistency, the investigators compared ChatGPT-3.5's and ChatGPT-4's ability to automatically create PDAC synoptic reports from original CT imaging reports. Their study included 180 consecutive PDAC staging CT reports from patients referred to Toronto's Princess Margaret Cancer Centre from January to December 2018.
Two radiologists reviewed the PDAC reports and set a reference standard for 14 key features and for the National Comprehensive Cancer Network (NCCN) resectability category. (Key features included, among others, tumor location, tumor size, pancreatic duct, bile ducts, celiac artery, superior mesenteric artery, common hepatic artery, aorta, major veins, lymph nodes, and metastases.) The researchers then evaluated the performance of ChatGPT-3.5 and ChatGPT-4 for recall, precision, and F1 score (the harmonic mean of precision and recall, with the best value equal to 1 and the worst to 0). Additionally, hepatopancreaticobiliary surgeons assessed both original and AI-generated reports to determine PDAC resectability, comparing accuracy and review time.
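The metrics above can be illustrated with a minimal sketch. The counts below are made-up illustrative values, not the study's data; the function simply shows how precision, recall, and F1 relate:

```python
# Illustrative sketch of the evaluation metrics named in the study.
# The counts here are hypothetical, not taken from the paper.

def precision_recall_f1(true_positives: int, false_positives: int, false_negatives: int):
    """Compute precision, recall, and their harmonic mean (F1) from confusion counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: 98 features extracted correctly, 1 spurious, 2 missed.
p, r, f1 = precision_recall_f1(98, 1, 2)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it rewards models that balance precision and recall rather than excelling at only one.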
The group found that, compared with GPT-3.5, GPT-4 produced equal or higher F1 scores for all 14 extracted features, and for categorizing resectability, it outperformed GPT-3.5 for each prompting strategy (i.e., chain-of-thought, knowledge), with chain-of-thought prompting being most accurate. ChatGPT-4 reduced surgeons’ time spent on each report by 58%.
Bhayana’s team also reported the following:
**Comparison of ChatGPT-3.5 to ChatGPT-4 for PDAC radiology**

| Measure | ChatGPT-3.5 | ChatGPT-4 |
|---|---|---|
| F1 score, creation of summary reports | 0.97 | 0.99 |
| Precision, identifying tumor location | 99.4% | 100% |
| Surgeon accuracy for categorizing resectability using AI reports compared with original reports | 76% | 83% |
“Our study demonstrates a useful application of large language models (LLMs) in pancreatic cancer care that can increase standardization, improve communication, and enhance efficiency and quality of report review by surgeons,” the authors concluded.
The research supports “the sanguine view that AI, especially generative AI, will be an important enabler to achieve much-needed improvements in efficiency and value throughout the radiology workflow,” wrote Paul Chang, MD, of the University of Chicago School of Medicine, in a commentary that accompanied the study. But there’s more work to be done.
“A sobering reality must be acknowledged: there is … [a] gap between promising feasibility and providing operational solutions,” Chang noted. “For example, how can we best incorporate this promising AI-enabled capability into a scalable and comprehensive workflow orchestration? Such a solution would need to be able to generate the appropriate downstream product in a generalizable and contextually aware manner.”
Source: ChatGPT-4 produces ‘near perfect’ pancreatic cancer radiology reports