ChatGPT in Education
An Effect in Search of a Cause?
Since its release in late 2022, ChatGPT has dazzled students and educators alike. From generating essays to solving equations, from sparring partner to teacher, from corrector to essay writer, it feels like the perfect classroom companion. Unsurprisingly, the educational research community has jumped in, eager to measure its impact. Some early studies report dramatic gains, leading to meta-analyses that suggest ChatGPT significantly improves learning.
But can we trust these claims?
In our recent article in the Journal of Computer Assisted Learning, ‘ChatGPT in Education: An Effect in Search of a Cause’, we (Joshua Weidlich, Dragan Gašević, Hendrik Drachsler and I) argue: not yet. And to understand why, we need to look not forward but back, specifically to a critique from over 40 years ago.
Déjà Vu: Richard Clark and the Grocery Truck Metaphor
In 1983, Richard Clark famously argued that media do not influence learning outcomes; the instructional method (the pedagogy) does. In an article that sparked decades of debate (see, for example, the 1994 Clark–Kozma exchange on media versus methods), he likened media to grocery delivery trucks: they transport instructional content, but they don’t change its nutritional value. Or in his own words: “media are mere vehicles that deliver instruction but do not influence student achievement any more than the truck that delivers our groceries causes changes in our nutrition”. In Clark’s view, learning outcomes are determined not by the medium (TV, computers, or ChatGPT), but by the instructional method: the pedagogy.
He warned researchers against confounding the two. When a new medium enters the classroom, it often arrives bundled with a new way of teaching. If that new “media-method package” produces better outcomes, we can’t know which part caused the effect unless we carefully isolate variables. Clark's central message? If you want to test a medium, you must keep the method constant.
Fast-forward to today, and we’re making the same mistake—this time with ChatGPT.
ChatGPT ≠ Pedagogy
ChatGPT is a tool[1], nothing more and nothing less. It has no built-in instructional goals, no learning theory, and no pedagogical intent. Compare it to a chainsaw, which can be used to fell or prune a tree, carve a beautiful ice sculpture, or horribly kill and dismember someone (as in The Texas Chainsaw Massacre). Whether it helps or harms learning depends entirely on how it is used. Yet many studies, and especially recent meta-analyses, treat “using ChatGPT” as if it were a singular, well-defined instructional intervention, while in reality it is being used in wildly different ways, not least as a shortcut to avoid thinking.
And studies of these many different uses often lump them together under the same label: ‘with GPT’. This is exactly what Clark warned about: confounding the medium (i.e., ChatGPT) with the method (i.e., how it is used pedagogically).
To avoid this confusion, and to be able to interpret what these studies may actually be showing, we outline in our article three basic requirements for any study claiming that ChatGPT improves learning:
1. There must be a clear definition of the treatment: What exactly did the students do with ChatGPT? Was it guided? Structured? Left open-ended?
2. There must be a meaningful control group: What was ChatGPT compared to? Traditional teaching? No teaching? Human tutoring?
3. Valid learning outcomes must be measured: Did the outcomes reflect actual learning (i.e., durable changes in knowledge or skill), or were they short-term performance metrics or self-reports?
When we audited the 19 studies on ChatGPT and academic performance from Deng et al.’s 2025 meta-analysis, only 4 met all three criteria. That means most of the research claiming to show ChatGPT’s effectiveness (15 of the 19 studies) lacks the necessary structure to support meaningful conclusions, just as Clark predicted would happen with poorly designed media studies.
Intermezzo: Tennyson’s “Big Wrench”
In 1994, Robert Tennyson (my predecessor as editor of Computers in Human Behavior), reflecting on the Clark–Kozma debate, warned against what he called the “big wrench” view of technology: assuming a new tool can fix every educational problem. If we treat ChatGPT as a magical fix without understanding what it is actually doing pedagogically, we risk repeating history: making bold claims based on weak, confounded evidence.
Indeed, some of the most widely cited studies in Deng et al.’s meta-analysis turn out to be measuring not learning after using ChatGPT, but performance during its use (e.g., co-produced text quality). In Clark’s or Gavriel Salomon’s (1990) terms, these are effects with the medium, not effects of the medium.
Take a lesson from ITS research
To illustrate how this can be done properly, we contrasted the emerging ChatGPT literature with research on Intelligent Tutoring Systems (ITS). These systems—such as Cognitive Tutors—have been studied for decades and are explicitly designed to support learning, often grounded in cognitive theory and instructional design principles. Meta-analyses of ITS consistently report moderate to large effect sizes, and they do so based on clearly defined treatments, control conditions, and learning outcomes. In these studies, the instructional method is embedded in the technology, allowing for meaningful conclusions about its educational effectiveness.
ChatGPT, by contrast, is a general-purpose tool. It is not designed for education and lacks any built-in instructional structure. Its impact on learning depends entirely on how it is used. When integrated into well-designed pedagogical approaches—such as structured dialogue or scaffolded feedback—it may support learning. But when used as a shortcut or a substitute for thought, it may do more harm than good. Some studies already suggest that students may cognitively offload tasks to ChatGPT, bypassing the mental effort that leads to learning. This echoes Gavriel Salomon’s distinction between the “effects with” and “effects of” media. Just because students perform well while using a tool does not mean they have learned anything from it.
Understandably, researchers are eager to explore ChatGPT. The stakes are high, both in society and in academia (publish or perish!), and the technology is evolving quickly. But science must resist the urge to move at the speed of hype. As we conclude in our article: fast science creates research waste, clouds understanding, and risks misinforming policy and practice. Before we promote ChatGPT as a learning revolution, we need studies that clearly define what’s being tested, how, and why.
In Clark’s words: “We too often act as if we believe that each delivery technology requires a new theory of learning and performance.” We don't. What we need are thoughtful applications of existing theories to new technologies—and rigorous evaluation grounded in clear definitions and comparisons.
ChatGPT may or may not transform education. It may lead to more effective, efficient, and enjoyable learning, or it may be the newest hype that does nothing or even (and this is my opinion) does more harm than good. But the current evidence doesn’t tell us that; not yet. What it tells us is that researchers are once again mistaking the truck for the groceries.
In short, the question “Does ChatGPT enhance learning?” is currently unanswerable—not because the technology lacks potential, but because the research designs being used are not up to the task. Until studies begin to treat ChatGPT not as an independent cause of learning, but as a medium embedded within specific instructional methods, we will remain in a state of confusion.
Let’s not repeat the mistakes of past media research. Instead, let’s ask better questions, design smarter studies, and build a stronger foundation before we leap to conclusions. Because before we interpret the effect, we need to understand the cause.
Clark, R. E. (1983). Reconsidering research on learning from media. Review of Educational Research, 53(4), 445–459.
Clark, R. E. (1994). Media will never influence learning. Educational Technology Research and Development, 42(2), 21-29. https://doi.org/10.1007/BF02299088
Kozma, R. B. (1994). Will media influence learning? Reframing the debate. Educational Technology Research and Development, 42(2), 7-19. https://doi.org/10.1007/BF02299087
Salomon, G. (1990). Cognitive effects with and of computer technology. Communication Research, 17(1), 26–44. https://doi.org/10.1177/009365090017001002
Tennyson, R. D. (1994). The big wrench vs. integrated approaches: The great media debate. Educational Technology Research and Development, 42(3), 15–28.
Weidlich, J., Gašević, D., Drachsler, H., & Kirschner, P. A. (2025). ChatGPT in education: An effect in search of a cause. Journal of Computer Assisted Learning, 41, e70105. https://doi.org/10.1111/jcal.70105
[1] ChatGPT is a large language model (LLM): an advanced type of AI trained on vast amounts of text data to ‘understand’ and generate human-like language. It can answer student questions, write text, translate text into different languages, generate practice exercises, and perform other language-related tasks based on the patterns it has learned.



I enjoyed reading this and deeply resonate with the idea that we need to resist moving at the speed of hype with tools like ChatGPT. Even at a consumer level, there are clearly incredible variations in how people engage with LLMs like ChatGPT: some use them at a surface level for output generation, while others are deeply engaged, conversing in a more intentional and ongoing exchange. Unfortunately, I believe the majority of consumers, unless they are particularly savvy, are using ChatGPT in the shallower, former way. To your point, ChatGPT is merely a tool that lacks any real framework to guide educators and teachers toward a unified goal or pedagogical intent, which can also create some amount of decision paralysis and distraction around how to actually use it effectively. Because of this, I suspect more specific, purpose-built AI tools for education may prove to be more interesting. What are your thoughts on this?
That was good timing. I am currently writing a book with Hilary Burkard of Sound Foundations, provisionally entitled ‘How to Teach Spelling, so Children Really Learn’. I was just struggling to expand the chapter ‘Will Apps and AI Come to the Rescue?’ (answer: maybe, but not any time soon) when your piece came through.
Am I OK to briefly summarise / quote this and post links to it elsewhere?