Good evidence confuses ChatGPT when used for health information

4 April 2024
News Release

A world-first study has found that when ChatGPT is asked a health-related question, the more evidence it is given, the less reliable it becomes, with the accuracy of its responses falling to as low as 28 per cent.

As large language models (LLMs) like ChatGPT explode in popularity, they pose a potential risk to the growing number of people using online tools for key health information.

Scientists from CSIRO, Australia’s national science agency, and The University of Queensland (UQ) explored a hypothetical scenario of an average person (non-professional health consumer) asking ChatGPT if ‘X’ treatment has a positive effect on condition ‘Y’.

The 100 questions presented ranged from ‘Can zinc help treat the common cold?’ to ‘Will drinking vinegar dissolve a stuck fish bone?’

ChatGPT’s response was compared to the known correct response, or ‘ground truth’, based on existing medical knowledge.

CSIRO Principal Research Scientist and Associate Professor at UQ, Dr Bevan Koopman, said that even though the risks of searching for health information online are well documented, people continue to do so, increasingly via tools such as ChatGPT.

“The widespread popularity of using LLMs online for answers on people’s health is why we need continued research to inform the public about risks and to help them optimise the accuracy of their answers,” Dr Koopman said.

“While LLMs have the potential to greatly improve the way people access information, we need more research to understand where they are effective and where they are not.”

The study looked at two question formats. The first was a question only. The second was a question biased with supporting or contrary evidence.

Results revealed that ChatGPT was quite good at giving accurate answers in the question-only format, achieving 80 per cent accuracy in this scenario.

However, when the language model was given an evidence-biased prompt, accuracy fell to 63 per cent. Accuracy fell again, to 28 per cent, when an “unsure” answer was allowed. This finding is contrary to the popular belief that prompting with evidence improves accuracy.

“We’re not sure why this happens. But given this occurs whether the evidence given is correct or not, perhaps the evidence adds too much noise, thus lowering accuracy,” Dr Koopman said.
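The two question formats and the accuracy comparison described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the study's code: the prompt wording, the function names, the "Answer yes or no" instruction, and the toy answers are all assumptions, and the sketch assumes that an "unsure" answer counts as incorrect.

```python
def question_only_prompt(question: str) -> str:
    """Format 1 (hypothetical wording): the health question on its own."""
    return f"{question}\nAnswer yes or no."


def evidence_biased_prompt(question: str, evidence: str) -> str:
    """Format 2 (hypothetical wording): the question preceded by a passage
    of supporting or contrary evidence."""
    return f"Evidence: {evidence}\n\nQuestion: {question}\nAnswer yes or no."


def accuracy(answers: list[str], ground_truth: list[str]) -> float:
    """Fraction of answers that match the ground truth exactly.

    Assumption of this sketch: any answer that does not match the ground
    truth, including "unsure", is counted as incorrect.
    """
    if not ground_truth or len(answers) != len(ground_truth):
        raise ValueError("answers and ground_truth must be non-empty and equal in length")
    correct = sum(
        a.strip().lower() == t.strip().lower()
        for a, t in zip(answers, ground_truth)
    )
    return correct / len(ground_truth)


# Toy data for illustration only; these are not results from the study.
toy_truth = ["yes", "yes", "no", "no"]
toy_answers = ["yes", "no", "unsure", "no"]
print(accuracy(toy_answers, toy_truth))  # 0.5
```

In a comparison like the one reported, the same set of questions would be sent in each format and the resulting accuracy figures set side by side.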

ChatGPT launched on November 30, 2022, and has quickly become one of the most widely used LLMs. LLMs are a form of artificial intelligence that recognises, translates, summarises, predicts, and generates text.

Study co-author UQ Professor Guido Zuccon, Director of AI for the Queensland Digital Health Centre (QDHeC), said major search engines are now integrating LLMs and search technologies in a process called Retrieval Augmented Generation.

“We demonstrate that the interaction between the LLM and the search component is still poorly understood and controllable, resulting in the generation of inaccurate health information,” said Professor Zuccon.

The study was recently presented at Empirical Methods in Natural Language Processing (EMNLP), a premier conference in the field of natural language processing.

Next steps for the research are to investigate how the public uses the health information generated by LLMs.

Disclaimer: CSIRO and The University of Queensland, being evidence-driven organisations, will always advocate for health information to be evidence-based. Current LLM technology, while promising, does not have a body of evidence to support its use in real health settings.

Source: Good evidence confuses ChatGPT when used for health information
