A real-world test of artificial intelligence infiltration of a university examinations system: a “Turing Test” case study

Text (Open Access) - Published Version
· Available under License Creative Commons Attribution.
Text - Accepted Version
· Restricted to Repository staff only

It is advisable to refer to the publisher's version if you intend to cite from this work.

Scarfe, P. (ORCID: https://orcid.org/0000-0002-3587-6198), Watcham, K., Clarke, A. and Roesch, E. (ORCID: https://orcid.org/0000-0002-8913-4173) (2024) A real-world test of artificial intelligence infiltration of a university examinations system: a “Turing Test” case study. PLoS ONE, 19 (6). e0305354. ISSN 1932-6203 doi: 10.1371/journal.pone.0305354

Abstract/Summary

The recent rise in artificial intelligence systems, such as ChatGPT, poses a fundamental problem for the educational sector. In universities and schools, many forms of assessment, such as coursework, are completed without invigilation. Therefore, students could hand in work as their own which is in fact completed by AI. Since the COVID pandemic, the sector has additionally accelerated its reliance on unsupervised ‘take home exams’. If students cheat using AI and this is undetected, the integrity of the way in which students are assessed is threatened. We report a rigorous, blind study in which we injected 100% AI-written submissions into the examinations system in five undergraduate modules, across all years of study, for a BSc degree in Psychology at a reputable UK university. We found that 94% of our AI submissions were undetected. The grades awarded to our AI submissions were on average half a grade boundary higher than those achieved by real students. Across modules there was an 83.4% chance that the AI submissions on a module would outperform a random selection of the same number of real student submissions.

Item Type: Article
URI: https://reading-clone.eprints-hosting.org/id/eprint/116685
Identification Number/DOI: 10.1371/journal.pone.0305354
Refereed: Yes
Divisions: Life Sciences > School of Psychology and Clinical Language Sciences > Department of Psychology
Publisher: Public Library of Science