
AI and Assessment Design: A Multi-layered Approach

The nearly ubiquitous access to artificial intelligence for text generation that gained momentum in 2023 has shaken up the education sector, largely due to concerns about academic integrity. Some thought AI detectors would be the answer (spoiler alert: they're not, and they're biased against non-native English users); others thought banning AI was the best approach.

Approaches to mitigating AI misuse that rely on the shortcomings of AI are ultimately doomed to reduce academics to a Sisyphean game of whack-a-mole as the technology, and good practice in using it, become more sophisticated.

Mitigating AI misuse requires a multi-layered approach that spans government, university administration, program-level, course-level and assessment-level considerations, as illustrated in Professor Phil Dawson's "anti-cheating approaches tier table" below.

With approaches ranked from most effective at the top to least effective at the bottom, Dawson (2022) makes the case that no single one of these approaches will mitigate the misuse of AI for academic dishonesty. Rather, it is an integrated, "Swiss cheese" model of risk management, involving multiple stakeholders and decision-makers and clear communication with students, that will make the difference.

  • Tier 'S' includes: 'Swiss cheese', 'Central teams', 'Amnesty/self-report', and 'Programmatic assessment'.
  • Tier 'A' features: 'Tasks students want to do', 'Vivas', 'Stylometry', 'Document properties', and 'Staff training'.
  • Tier 'B' contains: 'Learning outcomes', 'Proctoring/lockdown', 'Open book', 'Content-matching', 'Better exam design', and 'F2F exams'.
  • Tier 'C' has: 'Academic integrity modules', 'Honour codes', 'Reflective practice', and 'Legislation'.
  • Tier 'D' comprises: 'Site blocking' and 'Authentic assessment'.
  • Finally, tier 'F' includes: 'Unsupervised MCQ', 'Bans, e.g., essays', and 'Reusing tasks'.

Assessment Design Considerations

TEQSA, the Tertiary Education Quality and Standards Agency, has released the final version of a guidance document on assessment design in the age of AI, available for download from the TEQSA website.

Developed through a process of collaboration and consultation with the higher education community, the document emphasises that the recommendations work best as an integrated whole, involving work at the qualification or program level and not just on a course or unit basis.

Two principles underpin five ways in which assessment can be implemented to mitigate misuse of AI while leveraging the benefits AI can offer and helping students develop the skills to use AI tools effectively and ethically in their academic, professional and personal lives. Some of these are simply good practice in assessment design that, hopefully, is already built into the way we assess students. Each comes with a rationale and, in the case of the implementation propositions, with scenarios of the principles being enacted in a course or program.

The two principles are:

Principle 1: Equip students to ethically use AI and understand its limitations and impacts

Principle 2: Provide multiple, inclusive, contextualised assessments for trustworthy judgments about student learning 

The five "propositions" about assessment include approaches academics can influence and approaches that need to be discussed at the program level. They state that assessment must emphasise:

  1. Appropriate, authentic engagement with AI
  2. A systemic approach to assessment aligned with discipline and qualification values
  3. The process of learning (rather than the output)
  4. Opportunities for students to work with AI and each other, and 
  5. Security at meaningful points across a program

So, how does one mitigate the use of AI for academic dishonesty at the course level? As per the Dawson chart, there are several implementation and design strategies as well as communication and pedagogical choices that can work together.

Set expectations

Firstly, being explicit about how AI CAN be used in your course, and about how students should not only cite its use but also describe their methodology, will give students a baseline understanding of your expectations. Are students going to need AI skills for their future work? What skills will they need? How can you help them become mindful, effective and ethical users of AI in their academic and professional lives? Some example course-level AI statements, thought-starters and rationales can be found on the Stanford University Teaching Commons AI course policy page.

For assessment-level expectation statements, consider the guidelines that academic journals set out in terms of citation and methodology statements related to AI. 

Consider the Learning Objectives

The extent to which you allow AI use will depend on your course. As with any course-level design, start with the learning objectives.

What learning objectives do students need to achieve? Where are students in their studies? What discipline are they in? What academic skills do they have, and what do they need to develop? How will you provide opportunities throughout the learning modules, weeks and overall course for students to develop mastery of the knowledge and skills they'll need to achieve the objectives?

Assess process, not just output

Consider what you are assessing in terms of learning. Are you assessing throughput - the process of learning? Or are you simply marking an end product? Your assessment marking guide or rubric should reflect what you think is important. We all know that students are strategic about how they spend their time and effort when undertaking study - as are we when undertaking our work. If you want students to develop critical thinking and evaluative judgment, is a report or an essay or a discussion forum the best possible vehicle?

Ask for multiple modes, including direct interactions

Could you provide students with choices - as per the Universal Design for Learning framework - that would allow them to express and enact their learning in modes that are meaningful and useful to them in their context? Also consider if and how you could use techniques involving direct interaction with students, such as various oral assessments or scenario-based interactive oral assessments, as described in this Griffith University resource (PDF, 488 KB). Interactive orals have been scaled up to large cohorts by marking students as the assessment occurs, meaning that the time taken to mark is the same as that allotted to a text-based assessment.

Require referencing of key sources

Next, consider the evidence base you'll want them to use to support their thinking. Is it content you'll be supplying or is it content you want them to select or a combination? You can ask that their sources include your presentations, academic texts from the reading list you provide, and sources that represent diverse points of view such as those by Indigenous scholars and academics from other countries.

If much of this seems like a revisit of the basic principles of good assessment design, that's because it is. Just as highly competent AI can lead people to trust it too much (Dell'Acqua et al., 2023), it is worth asking whether plagiarism detection software has made us complacent about the assessments we rely upon.

Test your assessment and marking guide

Where possible, ask an AI-savvy peer to test-drive your assessment, or test it yourself using the prompt engineering techniques contained in this guide. Feed in your rubric and marking guide and ask the AI to provide feedback on how well the work achieves the stated expectations. Doing so may lead you to revise not only your assessment but also your marking guide or rubric.
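As one illustration, here is a minimal sketch of such a self-test using OpenAI's Python SDK. The model name, file names and prompt wording are illustrative assumptions, not a prescribed workflow; any chat-based LLM interface would serve the same purpose.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def read(path: str) -> str:
    with open(path) as f:
        return f.read()

brief = read("assessment_brief.txt")    # hypothetical file names
rubric = read("marking_rubric.txt")

# Step 1: ask the model to attempt the assessment, as a student might.
attempt = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "user",
         "content": f"Complete the following assessment task:\n\n{brief}"},
    ],
).choices[0].message.content

# Step 2: ask the model to mark that attempt against your rubric.
feedback = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user",
         "content": ("Mark this submission against the rubric. For each "
                     "criterion, give a band and explain your reasoning.\n\n"
                     f"Rubric:\n{rubric}\n\nSubmission:\n{attempt}")},
    ],
).choices[0].message.content

print(feedback)

If the generated attempt scores well against your rubric with no genuine engagement from a student, that is a signal to revise the task, the rubric, or both.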

In the section on AI basics, a simple definition of intelligence as the ability to solve complex problems was used to set the stage for understanding artificial intelligence. However, what about human intelligence? What do humans have that machines don't?

Researcher and educator Dr Rose Luckin's work on human intelligence considers what machines can and cannot do, and it can inform assessment design that addresses not just mastery of learning objectives but also the human skills that set us apart.

The video below sets out what Luckin has identified in her research as seven elements of human intelligence. It should start playing at 19:08 (if not, skip to that point, where the relevant portion begins); watch until 23:48.

Video: AI systems for teachers & learners in school — Rose Luckin | Computing education research

Bringing in research from Korteling et al. (2018) and Luckin's work with Margaret Bearman (Bearman & Luckin, 2020), the five core human intelligences can be summarised as:

  • Metacognition – reflect on thinking, change mind as we grow
  • Metasubjectivity – reflect on our emotional IQ and that of others
  • Metacontextualism – apply existing knowledge to new situations, adapt, be resilient
  • Evaluative judgement – what “quality” looks like
  • Self-efficacy – what do I need to know/practice to do X

The chapter authored by Luckin and Bearman in "Re-imagining University Assessment in a Digital World" lays out what the researchers see as the differences between the way AI processes data and the way humans think. It starts with a brief introduction to AI, highlighting that there are different types: some can learn, while others only perform within the parameters of what they were programmed to do. The authors then lay out what they consider to be the elements of human intelligence (drawing on the research of co-author Rose Luckin), the most important of which they consider to be evaluative judgement (the ability to define "quality") and the conscious development of a personal epistemology.

They then provide an exemplar programmatic assessment strategy for first-, second- and third-year students, in which the sophistication of the task increases to help students develop the two aforementioned elements of intelligence. They go on to describe the underlying pedagogical and learning design elements that support the assessment, including asking students to consider which elements of the assessment could have been completed successfully with AI and where the human element is more desirable.

The authors acknowledge that AI is better and faster at some tasks than humans, but make the case for keeping a competent human in the loop and treating AI as a tool.

They finish by restating the complexity and nuanced nature of human intelligence, suggesting that because AI exists both as a means of cheating and as a tool students need to learn to use, assessment must go beyond assessing academic intelligence alone. This is an important chapter for researchers, academics and learning designers considering how to design both for and with AI, as it considers not only how to prevent the misuse of AI but also how to consciously design for human intelligences.

Once students have proven mastery of foundational skills and knowledge, is it necessary for them to continue to be marked on it? Or, are generative AIs, like calculators, here to save us on some of the tedious cognitive work involved in academic and professional pursuits - freeing us up for more creative, higher-order work that requires human intelligences?

In his look at cognitive offloading in the book Re-imagining University Assessment in a Digital World (Bearman et al., 2020), Professor Phil Dawson argues that research on the benefits of cognitive offloading of lower-order tasks is limited, in that it has largely tested short-term recall rather than complex tasks. Therefore, while cognitive offloading research can offer insights into why people choose to offload, it offers no guidance on how assessment designers could or should make that choice for others.

In exploring cognitive offloading in assessment FOR learning, Dawson argues for programmatic assessment - writing that it could well be relevant to ask students to do tasks that could be offloaded (and expressly ban them from doing so) when first mastering a topic, then, as they progress, to allow them to use tools for basic functions or tasks they've mastered in order to free them for more advanced learning. Like Luckin and Bearman (2020), Dawson sees evaluative judgement as a vital skill and expresses concern that students who offload it never learn it.

When it comes to authentic assessment, he makes the point that if professionals in a student's chosen career use cognitive offloading, then authentic assessment should include the use of those tools.

In a discussion of academic integrity, Dawson suggests that in addition to being explicit about what students are allowed to do, students should also be required to explain their methodology. He also poses the question (for the second time in the chapter): if the AI industry claims the ability to pass tertiary assessments as a benchmark for advanced AI (and, by extension, for accomplishing the objectives of that work), what do we need to assess students on?

Dawson's recommendations:

1) Specify learning outcomes and assessment with reference to the cognitive offloading allowed

2) Programmatically offload lower-order outcomes once mastered

3) Treat capability with cognitive offloading as an outcome

4) Extend the development of evaluative judgement to include the products of cognitive offloading, and

5) Use cognitive offloading authentically.

Defining permitted AI use

What do you tell your students about the permitted use of AI in your course for formative and summative assessments?

The answer to this will likely vary, depending on what the learning objectives of the activities in question are and the reason behind your choice of the activity or assessment.

A group of academics (Perkins et al., 2023) has developed the AI Assessment Scale, designed to be used by academics in their courses to help establish a mutual understanding of what AI use is permitted and what that looks like. Consult the original paper for comprehensive context, but the table below provides an overview:

Assessment Levels and Descriptions

  • Level 1 – No AI: Students rely solely on their knowledge, understanding and skills; AI must not be used at any point during the assessment.
  • Level 2 – AI-Assisted Ideation and Structuring: AI can be used for brainstorming, creating structures and generating ideas for improving work, but no AI content is allowed in the final submission.
  • Level 3 – AI-Assisted Editing: AI can be used to edit student-created work for clarity, but no new content can be created using AI; the original work, with no AI content, must be provided in an appendix.
  • Level 4 – AI Task Completion, Human Evaluation: AI is used to complete specified elements of the task, with student discussion or commentary on the output; this requires critical engagement with, and evaluation of, the AI-generated content.
  • Level 5 – Full AI: AI is used as a 'co-pilot' to meet the requirements of the assessment; students may use AI throughout to support their own work and do not have to specify which content is AI-generated.
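As a purely illustrative sketch (not from Perkins et al.), a course team could encode the scale once and reuse it so that every assessment brief states its permitted AI use in consistent wording. The structure, function name and phrasing below are hypothetical:

# Hypothetical encoding of the AI Assessment Scale (after Perkins et al., 2023),
# used to stamp a consistent permitted-AI-use statement onto assessment briefs.
AI_ASSESSMENT_SCALE = {
    1: ("No AI",
        "AI must not be used at any point during the assessment."),
    2: ("AI-Assisted Ideation and Structuring",
        "AI may be used for brainstorming and structuring ideas; "
        "no AI content is allowed in the final submission."),
    3: ("AI-Assisted Editing",
        "AI may be used to edit student-created work for clarity; "
        "the original AI-free work must be provided in an appendix."),
    4: ("AI Task Completion, Human Evaluation",
        "AI completes specified elements of the task; students must "
        "critically evaluate and comment on the AI output."),
    5: ("Full AI",
        "AI may be used throughout as a 'co-pilot'; AI-generated "
        "content does not need to be identified."),
}

def ai_use_statement(level: int) -> str:
    """Return a standard permitted-AI-use statement for an assessment brief."""
    name, rule = AI_ASSESSMENT_SCALE[level]
    return f"Permitted AI use - Level {level} ({name}): {rule}"

print(ai_use_statement(2))

Encoding the levels in one place keeps the wording identical across briefs, so students see the same expectations expressed the same way in every assessment.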