AI Calling: ChatGPT Passes Mock Medical Exam, GPT-4 Outperforms Other Users, Reveals Latest Study

IdealSteel

Editor

posted on 2 years ago — updated on 1 second ago


The study showed that ChatGPT (GPT-3.5) and GPT-4 achieved scores of 73.4% and 83.4%, respectively, relative to the user average of 73.7%. Questions were in single best answer, multiple-choice format.

As ChatGPT technology continues to expand, concerns that artificial intelligence (AI) could replace humans have started gaining ground. The latest study, a pre-print that is yet to be peer-reviewed, strengthens that belief.

The study, posted on March 29, shows that GPT-4 outperformed both ChatGPT and human question bank users on a mock version of the American Board of Neurological Surgery written examination, answering 83.4 per cent of the questions correctly.

While ChatGPT (GPT-3.5) has previously shown near-passing performance on medical student board examinations, this study looked at how ChatGPT and its successor GPT-4 perform on a specialised exam.

According to the study titled “Performance of ChatGPT and GPT-4 on Neurosurgery Written Board Examinations” on MedRxiv — the pre-print server for health sciences — the idea was to assess the performance of ChatGPT and GPT-4 on a 500-question mock neurosurgical written board examination.

On the mock neurosurgery exam, GPT-4 significantly outperformed users in each of the 12 question categories. It outperformed both users and ChatGPT on tumour questions.

ChatGPT was released in November 2022 and has sparked massive interest in the technology called generative artificial intelligence. This technology is used to produce answers that mimic human conversation.

Created by Microsoft-backed OpenAI, ChatGPT has been trained on enormous volumes of data, which makes the application competent in producing, summarising and translating text along with responding to inquiries and carrying out several other natural language tasks.

GPT-4 is the latest, next-generation AI language model that can read photos and explain what’s in them.

GPT-4 outperformed average user by scoring 83.4%

The study showed that ChatGPT (GPT-3.5) and GPT-4 achieved scores of 73.4 per cent and 83.4 per cent, respectively, relative to the user average of 73.7 per cent.

Question bank users and both GPTs exceeded last year’s passing threshold of 69 per cent. While scores between ChatGPT and question bank users were equivalent, GPT-4 outperformed both. The questions were in single best answer, multiple-choice format.
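The comparison above comes down to simple arithmetic, which the short Python sketch below works through. It uses only the figures reported in this article. It assumes that the two model scores are percentages of the full 500-question mock exam, which the article does not state outright, and the function names are purely illustrative.

```python
# Figures as reported in the article. Treating the model scores as
# percentages of all 500 questions is an assumption made for illustration.
TOTAL_QUESTIONS = 500
PASSING_THRESHOLD = 69.0  # per cent, last year's passing threshold

MODEL_SCORES = {"ChatGPT (GPT-3.5)": 73.4, "GPT-4": 83.4}  # per cent
USER_AVERAGE = 73.7  # per cent, average across question bank users


def implied_correct(score_pct, total=TOTAL_QUESTIONS):
    """Approximate number of correct answers implied by a percentage score."""
    return round(score_pct * total / 100)


def margin_over_threshold(score_pct, threshold=PASSING_THRESHOLD):
    """Percentage points above (positive) or below (negative) the threshold."""
    return round(score_pct - threshold, 1)


if __name__ == "__main__":
    for name, pct in MODEL_SCORES.items():
        print(
            f"{name}: {pct}% is about {implied_correct(pct)} of "
            f"{TOTAL_QUESTIONS} questions, "
            f"{margin_over_threshold(pct):+.1f} points versus the threshold"
        )
    print(
        f"User average: {USER_AVERAGE}%, "
        f"{margin_over_threshold(USER_AVERAGE):+.1f} points versus the threshold"
    )
```

Under that assumption, the reported scores correspond to roughly 367 correct answers for ChatGPT and 417 for GPT-4. The user average is left as a percentage because it is a mean across many test-takers and not a single answer count.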

“Among 12 question categories, GPT-4 significantly outperformed users in each but performed comparably to ChatGPT in three (Functional, Other General, and Spine) and outperformed both users and ChatGPT for Tumour questions,” said the study.

Increased word count and higher-order problem-solving were associated with lower accuracy for ChatGPT, but not for GPT-4.

Multimodal input was not available at the time of the study, so on questions with image content ChatGPT and GPT-4 answered 49.5 per cent and 56.8 per cent of questions correctly, respectively, based on contextual clues alone.

India working on ethical guidelines for AI in medical research

India’s apex medical research institution, the Indian Council of Medical Research (ICMR), is studying the impact of AI-run applications like ChatGPT on health research and is already framing “ethical guidelines” for their use.

The team of officers has conducted a small test of ChatGPT to understand its immediate implications. They found that it writes excellent text for research papers but still needs human intervention.

While not everything it produces is correct, the algorithm also asks the user to rectify incorrect information, which means that the programme is collecting the correct information and, one day, will start producing accurate results too.
