Reliability of LLMs as medical assistants for the general public: a randomized preregistered study

misk@piefed.social · 3 days ago

ageedizzle@piefed.ca · 3 days ago

LLMs now achieve nearly perfect scores on medical licensing exams, but this does not necessarily translate to accurate performance in real-world settings

This is an interesting distinction. Intuitively, it feels like something similar is going on with programming. Gemini is apparently passing all these crazy benchmarks, but I couldn’t even get it to one-shot a game of Snake in C.