
How Google, Anthropic, and OpenAI build AI-fluent teams

Published on
February 26, 2026
Last updated on
February 26, 2026
TL;DR
  • Google DeepMind proved that AI-supervised human tutors outperform either humans or AI working alone, but only when humans focus on motivation and relationship.
  • Anthropic's research partner built a four-part framework showing AI fluency is a human competency, not a technical one.
  • OpenAI's education lead says the most common mistake academy operators make is avoidance, and the fix is simpler than most teams think.
  • All three companies agree: intentional design beats bolting AI on every time.

Three of the world's leading AI labs spent years researching how humans learn with AI. Here are their biggest findings, compressed into one framework your team can use today.

The personalized tutor arrived. Most organizations missed it.

For 12 years, the edtech sector had one holy grail: personalized tutoring at scale. The idea was simple enough. If every learner could get one-on-one guidance tailored to exactly where they were, outcomes would transform. The problem was cost. You could not put a human tutor next to every learner.

Then, almost overnight, you could.

Siya Purohit, OpenAI's education go-to-market lead, describes the moment clearly. She had spent over a decade watching edtech companies try to crack personalization. When she first used ChatGPT, she realized the industry had finally arrived at what it had been chasing.

"ChatGPT was it. The personalized tutor we had been chasing for 12 years. The question now is whether organizations are using it intentionally or just bolting it on."

Siya Purohit, OpenAI Education

The technology existed. The hard part turned out to be the humans: how they designed around it, how they led teams through it, and how they built fluency rather than just access.

Those are the questions three of the world's most advanced AI labs have been quietly working on. Their answers, drawn from years of learning science research, randomized controlled trials, and hundreds of conversations with educators and organizations globally, converge in ways that should reshape how your team approaches AI in 2026.

GOOGLE DEEPMIND

Google DeepMind spent years asking what actually makes a good tutor

Most AI learning tools were built by technologists who started with the technology. The Google DeepMind Learn LM team started somewhere different: with students, teachers, pedagogy experts, and institutions across the world.

Kaiz Alarakyia, senior product manager at Google DeepMind and head of AI for Education research, describes the process as a deliberate choice. His team could have taken the capabilities of large language models and forced them into an education solution. Instead, they took a step back and asked what behaviors actually produce learning outcomes.

The result was five principles that any AI tutor, and any learning program powered by AI, should embody.

The five principles every AI learning tool should be built on

  • Inspire active learning
  • Manage cognitive load
  • Deepen metacognition
  • Stimulate curiosity
  • Adapt to a learner's goals and needs

These principles were not invented in a conference room. They emerged from semi-structured interviews and workshops with educators and students globally. And critically, Alarakyia's team found that none of them were present in existing AI models at the time.

The first principle, inspiring active learning, had the most immediate impact. Every large language model is trained to be helpful, which means answering questions fully and immediately. That is exactly wrong for learning. A student who drops their homework into a chatbot and receives the answer has not learned anything.

"We had to find ways to change the model's behavior so that instead of giving away the answer, it relied on different pedagogical moves that engaged the learner to take control of their own learning."

Kaiz Alarakyia, Google DeepMind

This required retraining the model's instinct to be immediately helpful and replacing it with the instinct to be pedagogically useful. Those are not the same thing.
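DeepMind did this by retraining the model itself, but the same behavioral shift can be approximated at the application layer. The sketch below is illustrative only, not DeepMind's implementation; the prompt wording and helper name are assumptions. It shows the core move: replacing "answer fully and immediately" with pedagogical instructions before the student's message ever reaches the model.

```python
# Hypothetical sketch: steering a general-purpose chat model toward
# pedagogical behavior with a system instruction. The wording and the
# build_tutor_messages helper are illustrative, not DeepMind's method.

TUTOR_SYSTEM_PROMPT = (
    "You are a tutor. Never state the final answer directly. "
    "Instead: (1) ask the learner what they have already tried, "
    "(2) give one hint aimed at their specific misconception, and "
    "(3) ask a follow-up question that lets them take the next step themselves."
)

def build_tutor_messages(student_message: str) -> list[dict]:
    """Assemble a chat payload that swaps 'be helpful' for 'be pedagogical'."""
    return [
        {"role": "system", "content": TUTOR_SYSTEM_PROMPT},
        {"role": "user", "content": student_message},
    ]

messages = build_tutor_messages("What is 3/4 + 1/8? Just give me the answer.")
print(messages[0]["role"])  # → system
```

The design point is that the pedagogical constraint sits above every student turn, so even a direct "just give me the answer" request is routed through hint-and-question moves rather than a solution.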

Why education is harder than winning the Nobel Prize

The Google DeepMind team also built AlphaFold, the AI that cracked protein folding and won the Nobel Prize. They built AlphaGo, the first AI to beat a world champion at Go. Alarakyia says education is harder than either.

Here is why. AlphaGo learned by playing millions of games and receiving a clear win-or-lose signal after each one. The feedback loop was tight, unambiguous, and scalable. You cannot do that with learning.

In pedagogy, there are almost unlimited possible moves at any given moment in a learning conversation. You cannot simulate learners who pretend not to know something, because AI systems find it extremely difficult to authentically model ignorance and then have a genuine aha moment. And the signal for success in learning is not a single win-or-lose outcome. Sometimes motivation increasing without knowledge shifting is the right result. Sometimes the impact only shows up months later in an assessment.

The practical implication: AI in learning cannot operate in isolation. The feedback loop that makes AI powerful in other domains requires a human to close it.

THE EXPERIMENT THAT CHANGED EVERYTHING

The study that proved human-supervised AI beats both

The clearest evidence for this comes from a randomized controlled trial that Alarakyia's team ran in partnership with Eedi, a UK-based math education startup. It tested three different ways to help students who had just made a mistake.

Arm one: a static hint, which had been Eedi's original intervention. Arm two: a human tutor stepping in directly. Arm three: an AI tutor drafting response messages, with a human supervisor reviewing and editing before anything was sent to the student.

The results were not what anyone expected.

The AI pushed harder. The human held the relationship. Together they won.

Both the human tutor and the supervised AI tutor outperformed the static hint. That was expected. The surprise came when the team measured how students performed on the next problem of the same concept, the clearest signal of whether learning had actually transferred.

The supervised AI model outperformed the human tutor working alone.

The reason, when the team investigated, came down to metacognition. The AI consistently pushed students to articulate what they understood and where their thinking had gone wrong, rather than simply steering them toward the correct answer. Human tutors sometimes stopped short of that point when it looked like the student had figured it out. The AI kept going.

And the human supervisor still intervened in roughly 20 to 25 percent of cases. Almost none of those interventions were about factual errors. They were about relationship, about reading a student's emotional state, about knowing when encouragement mattered more than another pedagogical push.

"The human was responsible for motivation, accountability, and the relationship. The AI was more consistent on the pedagogical moves. The combination outperformed both on its own."

Kaiz Alarakyia, Google DeepMind

The model here matters for every academy operator. Human oversight of AI does not just add a safety check. It actively improves outcomes when humans focus their attention on what they do best.
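The supervised-tutor loop from the trial can be sketched as a small review pipeline. This is a toy illustration under assumptions, not Eedi's or DeepMind's code: the AI drafts every message, a human supervisor either approves it or substitutes an edit, and only the approved text reaches the student.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical sketch of the human-in-the-loop send step: AI drafts,
# human reviews, approved text goes out. All names are illustrative.

@dataclass
class ReviewedMessage:
    draft: str      # what the AI proposed
    sent: str       # what the student actually received
    edited: bool    # did the supervisor intervene?

def supervised_send(
    draft_fn: Callable[[str], str],
    review_fn: Callable[[str], Optional[str]],
    student_mistake: str,
) -> ReviewedMessage:
    """AI drafts a response; the supervisor returns None to approve
    or replacement text to intervene (e.g., on tone or encouragement)."""
    draft = draft_fn(student_mistake)
    edit = review_fn(draft)
    sent = edit if edit is not None else draft
    return ReviewedMessage(draft=draft, sent=sent, edited=edit is not None)

# Toy usage: here the supervisor approves the draft unchanged.
draft_fn = lambda mistake: f"Walk me through how you got '{mistake}'. Which step are you least sure about?"
review_fn = lambda draft: None  # approve as-is
msg = supervised_send(draft_fn, review_fn, "3/4 + 1/8 = 4/12")
print(msg.edited)  # → False
```

Logging the `edited` flag over time is what surfaces the trial's ~20-25 percent intervention rate, and reviewing those edits shows where human judgment is actually being spent.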

ANTHROPIC

Anthropic built a framework so your whole team can talk about this

Most organizations adopting AI face the same problem: everyone is using it differently, no one is discussing it openly, and there is no shared language for evaluating whether they are doing it well. Professor Joseph Feller from University College Cork developed the four-D framework with Ringling College's Rick Dakin, and it was later used by Anthropic to build a series of open AI fluency courses.

The framework shifts the conversation away from the technology and back to the human. Rather than asking what AI can do, it asks what competencies humans need to work with AI effectively, ethically, and in ways that create genuine value.

Delegation: the question most teams skip entirely

The first D is delegation. Before a team does anything with AI, they need to answer a more fundamental question: should they use it at all, and if so, for what?

This sounds obvious. In practice, most organizations skip it. They adopt AI tools because competitors are adopting them, or because someone in leadership heard a statistic at a conference, and then figure out the use cases afterward. The result is shadow AI use, mismatched expectations, and tools that never get embedded into real workflows.

Feller frames delegation as matchmaking. You need an accurate model of what the technology is capable of and what it cannot do, and you need to match that honestly against the work you are trying to accomplish.

Description and discernment: the back-and-forth that separates fluent users from frustrated ones

The second and third Ds operate as a pair. Description is your ability to communicate with an AI system effectively. Discernment is your ability to evaluate what comes back.

Most conversations about AI fluency focus almost entirely on description, because prompting is visible and teachable. Discernment gets less attention, and Feller argues it may be more important. You can craft a technically excellent prompt and still fail to recognize when the output is subtly wrong, appropriately uncertain, or confidently hallucinated.

One of the most useful reframes in the framework is recognizing that when outputs disappoint, the problem is often not in the prompt. It may be a delegation problem: you are asking AI to do the wrong job. Or it may be a conceptual model problem: you are thinking about the system in a way that leads you to expect the wrong things from it.

Diligence: you are accountable for every output, whether AI made it or not

The fourth D is diligence. This is the ethical and responsibility layer: using AI transparently, understanding how it was built and what its externalities are, and owning the output you put into the world regardless of how it was produced.

AI-generated content that goes out without proper human review is a growing problem. Feller gives it a name, AI slop, and frames accountability as the antidote. The moment you publish, send, or present something, it is yours. The technology that helped you make it does not share the responsibility.

"Accountability is our agency. The automation narrative reduces a human being to the tasks they perform. The more interesting question is what AI frees us to do that is more profoundly human."

Professor Joseph Feller, University College Cork

The practical takeaway for organizations: AI fluency programs that focus only on prompting skills are teaching description while ignoring the other three Ds. The full framework requires intentional design across all four.

"The most important thing the framework does is put the people having the conversation back in the position of authority. We can get swept up in technological determinism. There is a lot we can and must do to shape this future."

Professor Joseph Feller, University College Cork

OPENAI

OpenAI's education lead says the biggest mistake is the simplest one

Siya Purohit speaks with hundreds of learning leaders and academy operators every year. When she is asked about the most common mistake organizations make when implementing AI, she does not point to technical errors, tool selection, or vendor choice.

She points to avoidance.

The most common response to AI is feeling so overwhelmed by how fast it is moving that teams simply do not start. They wait for it to slow down, for best practices to solidify, for someone else to go first. In the meantime, their learners and staff find their own ways to use it, without structure, accountability, or organizational context.

"You do not need to become an expert in where AI is going. You need to learn how to use it to help make your work better. Start there, and expand from that foundation."

Siya Purohit, OpenAI Education

The progression Purohit recommends is straightforward. Start with your own daily work. Find the specific things that eat time or produce frustration and use AI to address those first. Once you have personal fluency, bring that into team-level workflows. Then, and only then, think about organizational-level transformation.

Start with the learner's goal, not the content

When it comes to academy operations specifically, Purohit is direct about where to put the first focus: personalization. Before you redesign curriculum, before you rebuild assessments, before you automate operations, give learners a way to articulate their goals and have AI help them work backward to a learning roadmap.

The technology for this already exists and is deployable today. A learner who starts a program with a clear two-year aspiration and receives a tailored path toward it is a fundamentally different learner than one who moves through a fixed sequence of modules. Engagement, completion, and outcome data all shift when learning feels relevant to where someone is actually trying to go.
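One lightweight way to make "work backward from the goal" concrete is to capture the learner's aspiration and timeframe first, then distribute milestones across that timeframe before any content is assigned. This is an illustrative sketch under assumptions, not OpenAI's or any product's implementation; in practice an LLM would propose the milestones, which are passed in here so the backward-planning step stays visible.

```python
from datetime import date, timedelta

# Hypothetical sketch: turning a learner's stated goal and timeframe
# into dated milestones. Function and variable names are illustrative.

def backward_roadmap(months: int, milestones: list[str],
                     start: date) -> list[tuple[date, str]]:
    """Spread milestones evenly across the timeframe, ending at the goal date
    (months approximated as 30 days for simplicity)."""
    step = timedelta(days=(months * 30) // len(milestones))
    return [(start + step * (i + 1), m) for i, m in enumerate(milestones)]

goal = "Lead a data team"  # the learner's two-year aspiration
plan = backward_roadmap(
    months=24,
    milestones=["SQL fluency", "First dashboard shipped", "Mentor a junior analyst"],
    start=date(2026, 3, 1),
)
print(f"{goal}: {len(plan)} milestones")  # → Lead a data team: 3 milestones
```

The point of the exercise is sequencing: the program adapts to where the learner is trying to go, rather than marching everyone through the same fixed module order.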

The Wharton professor who stopped grading essays and started measuring prompts

One of the most striking examples of rethinking assessment in the AI era comes from Wharton, where professor Stefano Puntoni redesigned his MBA class around a question: what is the actual value of an essay assignment?

His conclusion was that the value was never the output. It was the thinking required to produce it. So he redesigned the assignment. Students now use ChatGPT EDU, and their grade is partly based on how many prompts it took them to reach an essay they were satisfied with.

Students who can clearly articulate what they want, who have precise enough thinking to communicate complex goals to an AI system, reach a strong result in two or three prompts. Students who have not yet developed that clarity iterate 19 or 20 times.

"The value of an essay was never in the output. It was in the critical thinking and communication skills that led to it. Now we can actually measure that process."

Professor Stefano Puntoni, Wharton (cited by Siya Purohit)

This example matters beyond assessment design. It points to what fluency actually looks like in practice. It is not knowing more about AI. It is thinking more precisely, communicating more clearly, and taking ownership of the quality of what you produce with it.

WHERE THEY ALL AGREE

Three companies, one conclusion

Google DeepMind, Anthropic, and OpenAI operate in different parts of the AI ecosystem. They have different research priorities, different products, and different organizational cultures. On the question of how humans build genuine fluency with AI in learning contexts, they land in remarkably similar places.

Intentionality is the through-line. Every one of these teams found that the organizations getting results with AI were the ones asking hard questions before deployment: who is this for, what behavior are we trying to change, how will we know if it works, and who is accountable when it does not.

The human-AI relationship matters more than the tool. Whether it is the supervised tutor model from DeepMind's research, the accountability layer in Anthropic's framework, or OpenAI's emphasis on educator presence and motivation, the pattern is consistent. AI performs better when a human is maintaining the relationship, reading the context, and making judgment calls the model cannot.

The engagement problem is a human problem, not a technology problem. DeepMind calls it the 5% problem: even excellent AI tools only move outcomes for the people who actually use them, and most people do not use them enough. Anthropic's research points to shadow AI adoption creating an accountability vacuum. OpenAI sees avoidance as the most common and most costly mistake. The tools are ready. The human infrastructure to support adoption usually is not.

And all three converge on the same deeper point: AI fluency begins with a mental model, not a skill set. Before you teach prompting, teach people what these systems actually are and what they are not. The teams that misuse AI most consistently are the ones treating a language model like a database or a search engine. Fix the mental model first.

Theme: Intentionality beats bolt-on
  • Google DeepMind: Pedagogical instruction-following over one-size-fits-all
  • Anthropic: Delegation as the first and most skipped step
  • OpenAI: Start purposefully; avoidance is the costliest mistake

Theme: Human + AI outperforms either alone
  • Google DeepMind: Supervised AI study: the combination beat both
  • Anthropic: Augmentation mode unlocks the most value
  • OpenAI: Educators freed for human work when AI handles busywork

Theme: The engagement problem is human
  • Google DeepMind: The 5% problem: tools only work if people use them
  • Anthropic: Shadow AI gap: adoption without accountability
  • OpenAI: Avoidance and overwhelm kill ROI before it starts

Theme: Process over output
  • Google DeepMind: Metacognition is the real learning win
  • Anthropic: Discernment: evaluate the process, not just the result
  • OpenAI: Prompt quality as the new measure of critical thinking

Theme: Mental model first
  • Google DeepMind: Learning is not information retrieval
  • Anthropic: LLMs are not databases
  • OpenAI: Know what AI is actually good at before deploying it

The principles are clear. The platform has to match.

At Disco, we built around the same convergence these three teams are describing. Transformational learning is social, experiential, and human-first. AI belongs in it as an amplifier, not a replacement, and the organizations that treat it that way are seeing measurably better outcomes.

That means programs built around real goals, not just content libraries. Learning that connects people to each other, not just to information. AI that handles operational busywork so your facilitators can do what only humans can: hold accountability, build relationships, and read the room.

The personalized tutor Siya Purohit spent 12 years chasing is here. The supervised-AI model Kaiz Alarakyia's team validated in randomized trials is deployable. The four-D framework Joseph Feller built with Anthropic gives organizations a shared language to start the conversation.

What your academy does with that is the question worth spending time on.
