
SHERI HALL

Promoting Critical Thinking in AI Education Models

Artificial intelligence models developed and evaluated by faculty throughout Carnegie Mellon’s School of Computer Science are transforming education — from preschool through the college level — using novel approaches that teach critical thinking skills and improve equity for all students.

The idea is to shift students’ focus away from lower-level skills, such as basic math calculations or identifying patterns, and toward higher-level skills including problem-solving, decision-making and evaluation — all while providing broader access to AI tools that can help level the playing field among students.

Ken Koedinger, Hillman Professor and METALS Program and LearnLab Director in the HCII

“When students have new computing capabilities available to them, instructors can spend less time teaching them what the technology can do,” explained Ken Koedinger, Hillman Professor of Human-Computer Interaction and Psychology at Carnegie Mellon University, who leads several education-based projects. “This opens a whole new world of possibilities for how and what students can learn.

“Think about when calculators first came out,” said Koedinger. “Teachers spent less time teaching how to calculate a square root, and more time on higher-level concepts, like when it is appropriate to use the square root.”

Examples abound throughout the School of Computer Science and across CMU. For starters, Koedinger directs LearnLab, a center funded by the National Science Foundation that conducts research and promotes evidence-based tools that improve learning. A generative AI model called HypoCompass helps computer science students learn how to find and repair bugs in computer programs. And a mixed-reality learning platform, NoRILLA (norilla.org), guides preschool and elementary-aged students through science experiments.

And the Eberly Center for Teaching Excellence and Educational Innovation is sponsoring dozens of research projects to support and measure the use of generative AI tools in CMU classes through its Generative Artificial Intelligence Teaching as Research (GAITAR) Initiative.

“We are just beginning to understand how generative AI tools can enhance learning,” said Hoda Heidari, the K&L Gates Career Development Assistant Professor in Ethics and Computational Technologies. Heidari’s research evaluates large language models (LLMs) to identify errors and flaws, and attempts to fix them. “There are numerous different use cases in education. Students can use generative AI to research a topic, to write an essay or as part of the scientific discovery process.”

Hoda Heidari, K&L Gates Career Development Assistant Professor in Ethics and Computational Technologies in the HCII and S3D

Teaching STEM Concepts Using an AI Gorilla

A shining example of artificial intelligence making a significant difference in education is NoRILLA, a mixed-reality learning platform that combines physical science experiments with feedback from a virtual gorilla to teach children the scientific method.

NoRILLA’s AI-generated gorilla leads kids through hands-on physics experiments that use items like blocks or ramps. The tool prompts students to make predictions, then explains experiment results and provides feedback. To accomplish this, the technology uses computer vision to understand what the students are doing in the physical world.
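The predict-observe-feedback cycle described above can be sketched in a few lines. This is a hypothetical illustration only: in the real system, computer vision detects what happened to the physical blocks, whereas here the observed outcome is passed in directly, and all names are illustrative rather than the actual NoRILLA implementation.

```python
# Hypothetical sketch of NoRILLA's predict-observe-feedback cycle.
# In the real system, "observed" comes from computer vision watching
# the physical experiment; here it is supplied directly for illustration.

def feedback(prediction: str, observed: str) -> str:
    """Compare a child's prediction with the observed result and prompt reflection."""
    if prediction == observed:
        return (f"You predicted the {observed} tower would fall first, and it did! "
                "Why do you think that happened?")
    return (f"You predicted the {prediction} tower, but the {observed} tower "
            "fell first. What might explain the difference?")

print(feedback("left", "right"))
print(feedback("left", "left"))
```

The key design point is that the system never simply announces the answer; whether the prediction was right or wrong, the response ends with a question that pushes the child toward explaining why.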

Students of various ages play and engage with the NoRILLA system under the watch of their teachers.

Nesra Yannier, a senior systems scientist in the Human-Computer Interaction Institute, created NoRILLA to encourage more kids to stay engaged in science and math.

“I grew up in Turkey where the education system was based mostly on memorization,” said Yannier. “I was a curious child asking a lot of questions and didn’t always get the answers I was looking for. I always believed that the approach to learning should be different, fostering curiosity and helping children understand the reasons why rather than memorizing facts.”

Yannier’s curiosity encouraged her to pursue advanced education in computer science and physics, and then a career where she could help young students stay engaged in STEM. She came up with NoRILLA to build a connection between technology and hands-on learning.

“Even though there are a lot of technologies out there, most are isolating kids from their physical environments,” Yannier explained. “I wanted to find a way to bring together the advantages of the physical world and advanced technologies.”

Nesra Yannier, Senior Systems Scientist in the HCII

NoRILLA has found educational applications in schools, museum exhibits and afterschool programs — including those for disadvantaged students — reaching millions of children across the U.S. It was chosen as one of six exhibits to showcase the benefits of AI at CaixaForum Valencia, a museum in Spain.

The data show that NoRILLA is working.

Research in four Pennsylvania school districts found that children using NoRILLA doubled their understanding of STEM concepts in one year. Looking at students across different grade levels, a kindergartner interacting with NoRILLA for three months significantly exceeded the scientific understanding of a second grader without NoRILLA.

At museums, NoRILLA improved children’s STEM learning by five times compared to other screen-based technologies, and children voluntarily spent four times as long at NoRILLA exhibits compared to time spent at other hands-on learning exhibits.

Anecdotally, Yannier has received feedback that NoRILLA especially helps elementary school teachers who don’t have science backgrounds.

After using the system one teacher said, “I’m not a scientist by any stretch of imagination and I love science, and I love to teach science, but I feel like I’m bound by my own limitations in the science world. To have something like this that supports and lets the kids and myself all learn together is genius!”

Generative AI in CMU Classrooms

While young students are learning science and math with NoRILLA, professors across CMU have introduced AI tools into their classrooms. Many of the innovations are funded by GAITAR — a university-wide project to cultivate, support and measure the use of AI in CMU classrooms.

Derek Leben, associate professor in the Tepper School of Business, is conducting a GAITAR-sponsored experiment to measure whether engaging in debate with an LLM such as ChatGPT helps students to develop better analytical reasoning skills.

Marti Louw, director of the Learning Media Design Center and faculty member in the HCII, asks students to incorporate feedback from generative AI tools into interview protocols they have written in her Learning Media Design class. The students also engage in role-playing with the AI tool to practice their interviews. The goal of the experiment is two-fold: Louw measures whether these interactions improve the students’ protocols and whether interviewing the AI model improves their interviewing skills.

CMU’s Qatar campus also encourages its students to use generative AI tools such as Copilot and ChatGPT for lab assignments and class projects, and the instructors then measure whether doing so improves the quality of their work. Their question is fundamentally one of equity: Can these tools help students who have gaps in knowledge catch up to their more experienced classmates?

Heidari cautioned that across all of these classes and applications, it remains important to teach students that these models are not completely accurate.

Derek Leben, Associate Professor in the Tepper School of Business

Marti Louw, Director of the Learning Media Design Center and Faculty Member in the HCII

“AI models can create hallucinations, which are plausible-sounding but non-existing citations,” said Heidari. “They may produce arguments that don’t make sense. They basically lack common sense, so students must learn how to use them, but also to verify the output.”

This is where those essential critical thinking skills come in, Koedinger explained. Thus, teachers will need to restructure their lessons and assessments to accommodate generative AI. “It does take some generative work on the part of instructors and assessment writers,” he said. “They need to create assessments that measure students’ critical thinking skills — the skills that these tools encourage students to develop.”  ■

Award-Winning App Uses Generative AI to Teach Debugging

In computer science education, learning how to write code is an essential skill, but even more important is learning how to figure out what’s wrong when a computer program doesn’t work.

Qianou (Christina) Ma, a Ph.D. student in the HCII

Professor of Computer Science Sherry Tongshuang Wu

Qianou (Christina) Ma, a Ph.D. student in the HCII, created a web-based tool called HypoCompass that uses generative AI to teach students how to debug computer code.

The project is especially interesting considering that generative AI is increasingly used to write code and is also known to make errors. Her write-up on the project won the best paper award at the 2024 International Conference on Artificial Intelligence in Education.

“There are a lot of ways to teach debugging, but our goal was to create a deliberate practice teaching higher-level skills, such as bug finding, and leaving lower-level tasks, such as code writing and fixing, to the LLM,” Ma said.

Ma developed the idea for HypoCompass after brainstorming with her doctoral advisors, Hillman Professor Ken Koedinger and Assistant Professor of Computer Science Sherry Tongshuang Wu, about ways that generative AI could benefit students.

“How do you help students get better at testing their programs that their ‘AI partner’ might have created and debugging them when they don’t pass the test?” Koedinger said. “Christina built an online system to support learning those higher-level skills where the student’s partner is actually ChatGPT.”

Learning how to evaluate and correct AI-generated content is an essential skill for the next generation of computer scientists, Ma said.

“LLMs are not going to be perfect — we shouldn’t expect them to be,” she said. “But if we teach critical thinking skills, we can still ensure learning, even though we are dealing with imperfect models. And we tried to make it fun!”

The fun comes from role-playing. HypoCompass asks students to play the role of teaching assistants while the LLM acts as a novice student asking for help; the LLM then makes mistakes in coding that the actual students must find.

The tool generates hints to help students identify specific problems in the faulty code. And it can provide explanations of the bugs in the code as instantaneous feedback.
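The exercise pattern described above can be sketched as follows. This is a hypothetical illustration, not the actual HypoCompass implementation: in the real tool an LLM plays the novice and generates the buggy code, while here the snippet, bug location, and feedback messages are hard-coded examples.

```python
# Hypothetical sketch of a HypoCompass-style exercise: the "novice"
# (an LLM in the real tool) presents buggy code, and the student,
# acting as teaching assistant, hypothesizes where the bug is.

BUGGY_SNIPPET = """\
def average(nums):
    total = 0
    for n in nums:
        total += n
    return total / (len(nums) + 1)  # planted bug: off-by-one denominator
"""
BUG_LINE = 5  # 1-indexed location of the planted bug

def check_hypothesis(line_guess: int) -> str:
    """Give hint-style feedback on the student's bug-location hypothesis."""
    if line_guess == BUG_LINE:
        return "Correct! The denominator should be len(nums), not len(nums) + 1."
    return "Not quite. Hint: try average([2, 4]) and compare the result to 3."

print(check_hypothesis(2))
print(check_hypothesis(5))
```

Note that the student's job is hypothesis formation and testing (where is the bug, and what experiment would expose it?) rather than writing code, which matches the tool's goal of practicing higher-level debugging skills.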

Ma tested the tool among freshman and sophomore computer science students and determined that it effectively taught debugging skills that are costly to cover in a traditional classroom. HypoCompass provides an efficient alternative for teaching debugging because it can provide personalized support to many students simultaneously.

“It’s famously hard to teach and evaluate debugging,” Ma said. “We developed this tool to make it easier to deliver debugging instruction and allow students to practice on their own.”   ■  

More from the Spring 2025 LINK Issue

With StepUp, ETC Students Are Transforming Global Hygiene Education

Sail() Platform Revolutionizes Tech Education in Community Colleges and Beyond