Do Large Language Models have a "Reasoning Gap"?

“True reasoning” vs “memorisation”

The recent release of the functional MATH() dataset came with a paper and a Twitter thread that headlined with an impressive-sounding claim: “More than 50% of the reported reasoning abilities of LLMs might not be true reasoning.”