The problem of AI and assessment

Assessments play a critical role in measuring students’ knowledge and competencies in a subject area and in determining how well they have achieved the intended learning outcomes for the subject. Students typically demonstrate this achievement by producing an artefact of some kind (commonly an essay, report or examination) which is assessed or graded.

Because most traditional university assessment artefacts are written documents, the widespread availability of text-generating AI tools such as ChatGPT poses a significant threat to their integrity. Put simply, it is increasingly difficult to determine whether an artefact was created by the student or by AI. This raises a troubling question: how can we be sure that our graduates have learned what they need to be safe and competent professionals?

Prominent educational technology companies such as Turnitin have responded by releasing software that may help educators to detect AI-generated work. The makers of some generative AI platforms have also promised to embed invisible digital ‘watermarks’ in AI-generated text and media in the future. Overall, however, the phenomenal pace of innovation in generative AI suggests that electronic means of detecting AI-generated work with sufficient reliability to support the prosecution of academic misconduct cases are not on the near horizon, and indeed may never eventuate.

A more promising approach is to consider whether existing assessment regimes are still ‘fit for purpose’, or whether they might be less vulnerable to AI if redesigned. One proposal is to revert to traditional hand-written, closed-book invigilated examinations. While this may seem like an obvious way to minimise the risk of misuse of AI, such assessments have well-documented drawbacks for student learning and engagement. If we prioritise assessment security at the expense of alignment, authenticity, equity and wellbeing, we risk compromising assessment in ways that disadvantage most students and that are inconsistent with the ambitions of the University’s Advancing Students and Education strategy.

In this downloadable guide and on this webpage, we offer suggestions for how subject assessment regimes can be redesigned to reduce the risk of AI misuse without resorting to heavily weighted, closed-book end-of-semester invigilated exams (with their associated pedagogical drawbacks). De-emphasising high-stakes examinations allows for the introduction of more diverse, potentially more authentic and lower-weighted assessment tasks. These often give students better opportunities to learn and improve through feedback, which may enhance students’ perceptions of their value.

Many of the strategies we propose are likely to be effective because they reduce students’ motivation to cheat – whether by reframing assessment as a helpful tool as well as a hurdle to be overcome (assessment for learning, not only assessment of learning), by diversifying the nature of the artefacts we assess, or by auditing workflows and thinking processes that are uniquely human and thus difficult for AI to replicate.

Redesigning assessments is not without its own challenges, especially in relation to scalability, workload and resourcing. We provide case studies of subjects that have implemented one or more of these strategies.