- Updated: March 18, 2026
- 6 min read
Google DeepMind Unveils New Framework to Measure AGI Progress – UBOS News Insight
Google DeepMind has unveiled a scientifically‑grounded cognitive framework that quantifies progress toward artificial general intelligence (AGI) and paired it with a $200,000 Kaggle hackathon to crowdsource robust evaluation tools.
DeepMind’s New AGI Measurement Framework: Ten Cognitive Abilities, a Three‑Stage Protocol, and a $200K Kaggle Hackathon
The race to build machines that can think, learn, and adapt like humans has long been hampered by a lack of clear metrics. DeepMind’s latest paper, “Measuring Progress Toward AGI: A Cognitive Taxonomy,” fills that gap by translating decades of cognitive science into a practical AGI measurement toolkit. The initiative not only defines ten core abilities but also launches a three‑stage evaluation protocol and a high‑stakes Kaggle competition, inviting the global research community to co‑create the benchmarks that will shape the next decade of AI.
1. DeepMind’s Cognitive Taxonomy: Ten Abilities That Define General Intelligence
Drawing from psychology, neuroscience, and AI research, DeepMind identified a set of ten cognitive abilities that together form a comprehensive picture of general intelligence. Each ability is described in operational terms that can be translated into concrete tasks for AI systems.
- Perception: Extracting and interpreting raw sensory data (e.g., vision, audio).
- Generation: Producing coherent outputs such as text, speech, or actions.
- Attention: Dynamically allocating processing resources to relevant stimuli.
- Learning: Acquiring new knowledge from experience or instruction.
- Memory: Storing, retrieving, and updating information over time.
- Reasoning: Drawing logical inferences and solving abstract problems.
- Metacognition: Monitoring and reflecting on one’s own cognitive processes.
- Executive Functions: Planning, inhibiting irrelevant actions, and switching tasks.
- Problem Solving: Generating effective solutions for domain‑specific challenges.
- Social Cognition: Understanding and responding appropriately to social cues.
These abilities are deliberately broad, ensuring that future AI systems—whether language models, multimodal agents, or embodied robots—can be evaluated on a common scientific footing.
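To make the idea of “operational terms” concrete, here is a minimal sketch of how the ten abilities could be encoded as a scoring schema, with per‑ability task scores aggregated into a profile. The class and field names are illustrative, not part of DeepMind’s release.

```python
from dataclasses import dataclass, field

# The ten abilities from DeepMind's taxonomy, as identifiers (names ours).
ABILITIES = (
    "perception", "generation", "attention", "learning", "memory",
    "reasoning", "metacognition", "executive_functions",
    "problem_solving", "social_cognition",
)

@dataclass
class AbilityProfile:
    """Collects per-task scores (each in [0, 1]) for one model, keyed by ability."""
    scores: dict = field(default_factory=dict)

    def add(self, ability: str, score: float) -> None:
        if ability not in ABILITIES:
            raise ValueError(f"unknown ability: {ability}")
        self.scores.setdefault(ability, []).append(score)

    def mean(self, ability: str) -> float:
        vals = self.scores.get(ability, [])
        return sum(vals) / len(vals) if vals else float("nan")

profile = AbilityProfile()
profile.add("reasoning", 0.72)
profile.add("reasoning", 0.80)
print(round(profile.mean("reasoning"), 2))  # 0.76
```

The point is that each ability becomes a bucket of concrete task scores rather than a single opaque number, which is what makes the later human-baseline comparison possible.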
2. The Three‑Stage Evaluation Protocol: From Benchmarks to Human Baselines
DeepMind’s framework proposes a rigorous, reproducible pipeline that maps AI performance onto human capability distributions. The protocol consists of three sequential stages:
- Broad Suite Evaluation: AI models are tested across a diverse suite of tasks, each targeting one of the ten cognitive abilities. Test sets are held out to prevent overfitting to public benchmarks.
- Human Baseline Collection: A demographically representative sample of adults completes the same tasks, establishing a performance distribution for each ability.
- Relative Mapping: Model scores are plotted against the human distribution, yielding a clear visual of where the AI stands—below average, average, or superhuman.
By anchoring AI results to human baselines, the protocol offers an intuitive “intelligence meter” that can be communicated to both technical and non‑technical stakeholders.
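The third stage, relative mapping, amounts to locating a model's score within the empirical distribution of human scores. A minimal sketch, with an invented human sample and function name:

```python
from bisect import bisect_right

def percentile_vs_humans(model_score: float, human_scores: list[float]) -> float:
    """Fraction of human participants the model outperforms (empirical percentile)."""
    ranked = sorted(human_scores)
    return bisect_right(ranked, model_score) / len(ranked)

# Illustrative human baseline for one ability (e.g. accuracy on a reasoning suite).
humans = [0.41, 0.52, 0.55, 0.60, 0.63, 0.67, 0.70, 0.74, 0.81, 0.90]

print(percentile_vs_humans(0.72, humans))  # 0.7 -> above the human average
print(percentile_vs_humans(0.95, humans))  # 1.0 -> beyond the sampled humans
```

A percentile of 0.5 means “average human,” anything at 1.0 exceeds every sampled participant, which is the intuition behind the “intelligence meter” framing.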
3. Kaggle Hackathon: Building the Missing Evaluations
To turn theory into practice, DeepMind partnered with Kaggle to launch the “Measuring Progress Toward AGI: Cognitive Abilities” hackathon. The competition focuses on the five abilities with the largest evaluation gaps: Learning, Metacognition, Attention, Executive Functions, and Social Cognition.
| Track | Prize (USD) | Goal |
|---|---|---|
| Learning | $10,000 (top 2) | Create a benchmark that measures rapid adaptation to novel data. |
| Metacognition | $10,000 (top 2) | Design tasks that require models to self‑evaluate confidence. |
| Attention | $10,000 (top 2) | Develop datasets that test selective focus under distraction. |
| Executive Functions | $10,000 (top 2) | Build scenarios requiring planning, inhibition, and task‑switching. |
| Social Cognition | $10,000 (top 2) | Create interactive dialogues that probe theory‑of‑mind abilities. |
Grand prizes: $25,000 each for the four best overall submissions, announced June 1.
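Assuming “(top 2)” means $10,000 to each of a track's top two entries, the prize structure sums exactly to the headline figure:

```python
# Prize arithmetic, assuming $10,000 to each of the top 2 entries per track.
track_prizes = 5 * 2 * 10_000   # five tracks, two winners each
grand_prizes = 4 * 25_000       # four grand prizes
total = track_prizes + grand_prizes
print(total)  # 200000
```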
The hackathon runs from March 17 to April 16, with submissions evaluated on the newly released Kaggle Community Benchmarks platform. Participants can test their metrics against frontier models such as Gemini, Claude, and GPT‑4, ensuring that the resulting benchmarks are future‑proof.
4. Why This Matters for Researchers, Businesses, and UBOS
For the research community, the taxonomy offers a shared language that can replace the current patchwork of ad‑hoc benchmarks. By aligning AI progress with human performance curves, scientists can more precisely gauge when a model truly exhibits “general” capabilities.
For enterprises, the framework translates abstract AI hype into concrete risk/benefit analyses. Companies can now ask, “Does this model meet human‑level reasoning for my domain?” and receive a data‑driven answer.
From UBOS’s standpoint, the emergence of a standardized AGI measurement system unlocks new opportunities across the UBOS platform. Our low‑code Web app editor can now embed the cognitive benchmarks as reusable components, letting developers evaluate their AI‑powered applications against the DeepMind standards in minutes.
Moreover, the AI research hub at UBOS is already prototyping AI marketing agents that leverage the new metrics to optimize campaign performance in real time. By integrating the three‑stage protocol into our Workflow automation studio, businesses can automate the validation of AI models before deployment, reducing costly missteps.
Startups and SMBs, in particular, stand to gain from the UBOS for startups and UBOS solutions for SMBs programs, which now include a “benchmark‑as‑a‑service” offering. This service packages the DeepMind evaluation suite into a plug‑and‑play API, letting smaller teams compete with tech giants on a level playing field.
5. Take the Next Step: Leverage DeepMind’s Framework with UBOS
Ready to future‑proof your AI initiatives? Here’s how you can get started today:
- Explore the Enterprise AI platform by UBOS to integrate the cognitive taxonomy into your existing data pipelines.
- Browse the UBOS quick‑start templates and spin up a benchmark‑driven prototype in minutes.
- Visit the UBOS portfolio examples to see how leading brands are already measuring AI performance.
- Join the UBOS partner program to collaborate on open‑source evaluation tools and gain early access to new benchmark releases.
- Check the UBOS pricing plans for flexible, usage‑based options that scale with your AI workload.
Whether you’re a researcher aiming to publish state‑of‑the‑art results, a product manager seeking trustworthy AI metrics, or a founder looking to differentiate your SaaS offering, DeepMind’s framework combined with UBOS’s low‑code ecosystem gives you a competitive edge.
6. Closing Thoughts
The introduction of a rigorous, human‑anchored AGI measurement system marks a watershed moment for the field. By providing a clear set of ten cognitive abilities, a reproducible three‑stage protocol, and a $200,000 incentive for the community, DeepMind is turning the abstract goal of AGI into a quantifiable engineering challenge.
At UBOS, we see this as an invitation to embed scientific rigor into every AI‑driven product we help our customers build. From rapid prototyping with the AI research hub to enterprise‑grade deployments on the Enterprise AI platform by UBOS, the future of trustworthy, measurable AI is already within reach.
Stay tuned to our newsroom for updates on how we’re integrating DeepMind’s benchmarks, and consider joining the Kaggle hackathon to help shape the next generation of AGI evaluation tools.
Keywords: AGI measurement, DeepMind cognitive framework, AI progress metrics, Kaggle AGI hackathon, artificial general intelligence, AI evaluation, ubos.tech AI news