Professor develops AI essay grading program

October 1, 2018

Graphic by Ryan Magee | Mercury Staff.

For professors struggling to cope with stacks of papers to grade, new software — developed by a UTD researcher and powered by artificial intelligence — may offer a long-term solution.

Vincent Ng, a computer science professor who works with UTD’s Human Language Technology Research Institute, is developing an automated grading system for long-form essays. Ng said the goal of the technology is to remove the need for human graders altogether.

“Essay grading is one of the very important applications of natural language processing,” Ng said. “For one, it has a lot of commercial value. Grading essays requires an enormous amount of human labor, and these are hours that can be spent elsewhere in the classroom.”

The software, he said, will read blocks of text and extract information on multiple levels. Lower levels might deal with spelling and grammar. Higher levels would evaluate coherence and overall organization, and the highest would assess the overall persuasiveness of an essay.

The Human Language Technology Research Institute consists of eight separate laboratories, each headed by a faculty member. The labs focus on different aspects of natural speech, including essay grading.

Luba Ketsler, a UTD economics professor, has a total of 449 students in her classes, in addition to a handful of research students. She said that in her field it can be difficult to assess knowledge using only Scantron tests.

“I do have tests that are multiple choice, because I do have to control my workload somehow,” Ketsler said. “But I also want them to get some detail, some data.”

For Ketsler, that means written quizzes, regular writing assignments and three major research papers, each with a minimum of seven pages.

“That adds up quickly,” she said.

Ketsler, who has been at UTD for 11 years, said she has witnessed a lot of growth, which has changed the way she approaches teaching. As her class sizes grew, she reduced the number of written assignments.

“The bigger the class size, the more disparity you’re going to see between each student’s knowledge,” she said.

Ketsler said more diverse classrooms can make for interesting discussions but create a demand for grading that takes each student’s background into account. The objectivity of grading software powered by artificial intelligence is a big draw for Ketsler, who said she doesn’t like to rely on teaching assistants to grade everything.

“I don’t let the TAs grade the research papers. I let them do a lot of the technical work, like inputting grades and marking quizzes, because I want to stay consistent with the actual grading,” she said. “Everyone grades a little subjectively.”

Biology junior Sania Zeito, who recently transferred to UTD, said she felt there was a lack of objectivity when it came to grading standards at her previous university in Dubai.

“I sometimes felt like there was injustice, like maybe the grade depended on the mood of the grader,” Zeito said. “A robot would put a standard on the grading system.”

Artificial intelligence systems are modeled on the human brain. Ng said it is necessary to teach AI software how to grade by feeding it examples, in this case essays graded by humans. Feeding it examples, however, can propagate errors. If the data set contains biased grading, then the AI incorporates the bias, too.

“You’re only as good as your data set,” Ng said.

Automated essay grading software has been employed at other institutions, such as Harvard University and the Massachusetts Institute of Technology, to grade student submissions in open-access online courses, which often have enrollments in the thousands.

While no classes at UTD are using the software yet, students might have had essays graded by computers well before they enrolled, as standardized exams such as the GRE and the TOEFL are scored by Criterion, an essay-grading program. The written portions of these standardized tests increasingly rely on Criterion in preparation for an expected transition to fully automated grading.

Criminology freshman Giovanna Gonzalez said she is optimistic about the prospect of an automated grading system.

“It’s going to save a lot of time and money,” she said. “We’d also be able to get our grades faster.”

Gonzalez said she is concerned, however, that using automated grading software might mean that professors won’t have a complete picture of a student’s understanding of a concept.

“It would be better for the professors, maybe,” she said. “But I am not sure if it would be better for me as a student.”

Ng said the technology is still years away from grading higher-level assignments such as those in Ketsler’s classes, giving professors time to adjust.

“Professors will always find new ways to connect with their students as technology evolves,” Ng said.