Automatic Essay Scoring promises to scale up student feedback on written work, addressing the excessive cost and time demands associated with human grading. State-of-the-art automatic scorers are based on Transformer neural networks. While such models have shown impressive results in reasoning tasks, they often produce answers that arise from statistical cues in their training data and are misaligned with human objectives. Such systems are therefore potentially fragile in scenarios where users are incentivized to deceive them, as in a classroom setting. In this work, we evaluate the susceptibility of state-of-the-art automatic scorers to attacks mounted by non-expert users, such as students interacting with an automatic grader. We develop a methodology to simulate such student attacks and test them against scorers based on BERT, Phi-3, and Gemini models. Our findings suggest that (i) a BERT-based grader can be deceived using simple feature-based attacks; (ii) although Google’s Gemini shows solid agreement with human graders, it can assign undeservedly high grades to very short sentences; and (iii) a Phi-3-based grader is less susceptible than the BERT-based one but still assigns relatively high grades to some of our attacks.