A brilliant article with insights from @emilymbender, @sarahbmyers (@AINowInstitute), and more. But taking a step back:
As an NLP researcher, I'm asking: what the freaking hell is anyone doing grading student essays with automated tools I wouldn't trust on my own academic datasets?
In 18 states "only a small percentage of students’ essays ... will be randomly selected for a human grader to double check the machine’s work".
In writing you're tasked with speaking to and convincing an audience through a complex, lossy, and fluid medium: language.
Guess what NLP is still bad at? Even if the marks aren't determining your life (!), the feedback you receive will be beyond useless. You're not having a conversation with a human. You're not convincing them. You're at best tricking a machine. A likely terribly ineffective machine.
Do you think that these systems from closed companies are equivalent in performance to the State of the Art in academia? Here's a hint: they definitely aren't. We know for certain the logic and reasoning of our existing SotA tools are unreliable in the best circumstances too.
Why do we think machines are ready to judge the words of any human, let alone a young student where the feedback will potentially shape their mind and their life? To intelligently deconstruct their writing and offer insight into how they can better themselves? To _judge_ them?
We've taken the already problematic concept of "teaching to the test" and elevated it to parody.
The test is free form text marked by a machine that can't read or write language with true logic or reasoning.
Write an essay that can trick this system into scoring you well.
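To make the "tricking the machine" point concrete, here's a toy sketch (entirely hypothetical, not any real grading product): shallow scorers historically leaned on surface proxies like essay length and vocabulary sophistication, and anything that optimizes those proxies beats a short, correct answer.

```python
def shallow_score(essay: str) -> float:
    """Score an essay on shallow surface features only:
    word count plus average word length as a 'sophistication' proxy."""
    words = essay.split()
    if not words:
        return 0.0
    avg_word_len = sum(len(w) for w in words) / len(words)
    # Longer essays with fancier words score higher -- content is never checked.
    return len(words) * 0.1 + avg_word_len

# A concise, correct, on-topic answer...
honest = "Plants make food from sunlight using photosynthesis."

# ...versus padded, vocabulary-stuffed filler that says nothing.
gamed = " ".join(["Notwithstanding multitudinous epistemological considerations,"] * 30)

# The empty-of-meaning essay wins on shallow features.
assert shallow_score(gamed) > shallow_score(honest)
```

Real systems use richer features than this caricature, but the failure mode is the same: any scorer that never models meaning can be gamed by text optimized for its proxies.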
This is our intellectual dystopia version of a Brave New World. We've replaced reason with poorly approximated logic in the most dangerous of places. We'll only see these perverse interactions play out over the long term: a generation of students taught and judged by broken machines.
How about a sanity check?
Can the automated grading system even approximately answer the question it's grading?
We'd expect that from a human marker, right?
That doesn't guarantee it'll grade well but at least it's a first level sanity pass. This is not a "simple" question...
Maybe "fairer": let's at least see how these grading systems perform when grading a selection of correct / incorrect answers to elementary and middle school questions from @allen_ai's ARISTO. I don't think you'll be shocked by the outcome ... -_-
allenai.org/aristo/
