Smerity
Aug 21, 2019 · 9 tweets
A brilliant article with insights from @emilymbender, @sarahbmyers (@AINowInstitute), and more. But taking a step back:
As an NLP researcher, I'm asking: what the freaking hell is anyone doing grading student essays with automated tools that I'd not trust on my academic datasets?
In 18 states "only a small percentage of students’ essays ... will be randomly selected for a human grader to double check the machine’s work".
In writing you're tasked with speaking to and convincing an audience through a complex, lossy, and fluid medium: language.
Guess what NLP is still bad at? Even if the marks aren't determining your life (!) the feedback you receive will be beyond useless. You're not having a conversation with a human. You're not convincing them. You're at best tricking a machine. A likely terribly ineffective machine.
Do you think that these systems from closed companies are equivalent in performance to the State of the Art in academia? Here's a hint: they definitely aren't. We know for certain the logic and reasoning of our existing SotA tools are unreliable in the best circumstances too.
Why do we think machines are ready to judge the words of any human, let alone a young student where the feedback will potentially shape their mind and their life? To intelligently deconstruct their writing and offer insight into how they can better themselves? To _judge_ them?
We've taken the already problematic concept of "teaching to the test" and elevated it to parody.

The test is free-form text marked by a machine that can't read or write language with true logic or reasoning.

Write an essay that can trick this system into scoring you well.
This is our intellectual-dystopia version of a Brave New World. We've replaced reason with poorly approximated logic in the most dangerous of places. We'll only see these perverse interactions play out over the long span: a generation of students taught and judged by broken machines.
How about a sanity check?

Can the automated grading system even approximately answer the question it's grading?

We'd expect that from a human marker, right?

That doesn't guarantee it'll grade well, but at least it's a first-level sanity pass. This is not a "simple" question...
Maybe "more fair" - let's at least see how these grading systems perform on grading a selection of correct / incorrect answers to elementary and middle school questions from @allen_ai's ARISTO. I don't think you'll be shocked by the outcome ... -_-
allenai.org/aristo/
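
To make that sanity check concrete, here's a minimal sketch: `grade(question, answer)` is a hypothetical scoring interface (no real vendor's API), fed a handful of (question, correct answer, incorrect answer) triples of the kind ARISTO covers.

```python
# Hypothetical sanity check: does an automated grader even score known-correct
# answers above known-incorrect ones on simple science questions?
# `grader(question, answer) -> float` stands in for whatever scoring interface
# a real system exposes.

def separation_rate(grader, examples):
    """Fraction of questions where the correct answer outscores the incorrect one."""
    wins = sum(
        1
        for question, correct, incorrect in examples
        if grader(question, correct) > grader(question, incorrect)
    )
    return wins / len(examples)

if __name__ == "__main__":
    examples = [
        ("Which force pulls objects toward the Earth?", "Gravity.", "Magnetism."),
        ("What do plants need for photosynthesis?",
         "Sunlight, water and carbon dioxide.", "Darkness and sand."),
    ]
    # Placeholder "grader" that rewards longer answers -- roughly the kind of
    # surface trick the thread worries real systems reward.
    dummy_grader = lambda q, a: float(len(a))
    print(f"separation rate: {separation_rate(dummy_grader, examples):.2f}")
```

A grader that can't clearly separate these has no business offering feedback on a student's argument.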

More from @Smerity

Jun 21, 2022
To add to a night of technical oddities, there are three Cruise vehicles, all (literally) driverless, stuck at and partially blocking the corner of Geary and Mason 😅
There were originally four Cruise vehicles but one eventually made a grand escape. The leading Cruise vehicle has been there at least fifteen minutes, as that's how long I had to wait for fast food. Occasionally one of them would lurch forward a little just for added suspense 🙃
To note: the ones behind it that are occasionally moving have a different UI state, so maybe they're just being particularly wary ¯\_(ツ)_/¯
Nov 7, 2019
For those in the language modeling space, a question regarding perplexity as a metric with varying tokenization:
- Is there a hard proof showing, for a dataset D tokenized using A and B, that the perplexity is equivalent?
- Does that proof take into account teacher forcing?
I ask as I have never seen a proof and always assumed smarter people than myself had thought about it. Intuitively I felt it reasonable until I recently began pondering over the teacher forcing aspect which is essentially giving your model supervision, including at test time.
Imagine you had the task of language modeling:
"Bob and Alice were fighting for first place but who won? [predict: Bob or Alice]"
The claim is that the language model's perplexity (confusion) should be equal regardless of how we split the text.
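
A minimal sketch of the normalization argument in question, plus one way to make the teacher-forcing wrinkle concrete (my framing of the open question, not a proof): assume both tokenizations assign the same total probability to the raw string s, with N_A and N_B tokens respectively.

```latex
% Same total negative log-likelihood, different per-token normalizations:
\[
  \mathrm{NLL}(s) = -\log p(s), \qquad
  \mathrm{PPL}_A = \exp\!\left(\tfrac{\mathrm{NLL}(s)}{N_A}\right), \qquad
  \mathrm{PPL}_B = \exp\!\left(\tfrac{\mathrm{NLL}(s)}{N_B}\right).
\]
% With $N_A \neq N_B$ the per-token perplexities differ even when the total NLL
% is identical; the usual fix is a shared unit such as bits per character:
\[
  \mathrm{bpc}(s) = \frac{\mathrm{NLL}(s)}{|s|\,\ln 2}.
\]
% The teacher-forcing wrinkle: if tokenizer B splits "Alice" into "Al" + "ice",
% the model is conditioned on the gold prefix "Al" before predicting "ice",
% so part of the answer is handed over as supervision even at evaluation time.
```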
Nov 6, 2019
In Dec 2016 Uber started taking _paid rides_ in _self driving cars_ without even filing for an autonomous testing permit in CA. That first day in SF it blew multiple red lights and had disengagement numbers hundreds of times worse than other companies.
Less than two years later, Uber having upped and left San Francisco due to their egregious behaviour, their self-driving car killed someone. I collected, in a thread, the reasons I had zero faith in their ability to safely execute, along with their checkered past.
Today: the National Transportation Safety Board (NTSB) noted the system "did not include a consideration for jaywalking pedestrians". Elaine Herzberg was classified as a flurry of objects {other, bike, vehicle, ...} 5.6 seconds before impact.
theregister.co.uk/2019/11/06/ube…
Sep 19, 2019
Deep learning training tip that I realized I do but never learned from anyone: when tweaking your model to improve gradient flow / speed to converge, keep the exact same random seed (hyperparameters and weight initializations) and only modify the model interactions.
- Your model runs will have the exact same perplexity spikes (hits confusing data at the same time)
- You can compare timestamp / batch results in early training as a pseudo-estimate of convergence
- Improved gradient flow visibly helps the same init do better
Important to change out the random seed occasionally when you think you've isolated progress but minimizing noise during experimentation is OP. You're already dealing with millions of parameters and billions of calculations. You don't need any more confusion in the process.
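
A minimal PyTorch sketch of that workflow (my illustration; `BaselineModel`, `TweakedModel`, and `make_batches` are placeholder names): pin every seed, change only the model interaction under test, and compare the early loss traces batch-for-batch.

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Pin every RNG so two runs share weight inits and data order."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

def early_loss_trace(build_model, batches, steps=200, lr=1e-3, seed=42):
    """Train briefly from a fixed seed and return the per-batch loss trace."""
    set_seed(seed)
    model = build_model()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    losses = []
    for _, (x, y) in zip(range(steps), batches):
        loss = torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

# Usage (placeholder model/data builders): with identical seeds and data order,
# loss spikes land on the same batches, so early traces give a pseudo-estimate
# of which variant converges faster.
# baseline = early_loss_trace(BaselineModel, make_batches(seed=42))
# tweaked  = early_loss_trace(TweakedModel,  make_batches(seed=42))
```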
Sep 1, 2019
I'm incredibly proud that the low compute / low resource AWD-LSTM and QRNN that I helped develop at @SFResearch live on as first-class architectures in the @fastdotai community :)
I think the community has become blind in the BERT / Attention Is All You Need era. If you think a singular architecture is the best, for whatever metric you're focused on, remind yourself of the recent history of model architecture evolution.
Whilst pretrained weights can be an advantage, they also tie you to someone else's whims. Did they train on a dataset that fits your task? Was your task ever intended? Did their setup have idiosyncrasies that might bite you? Will you hit a finetuning progress dead end?
Jul 22, 2019
What is OpenAI? I don't know anymore.
A non-profit that leveraged goodwill whilst silently giving out equity for years, prepping a shift to for-profit, and that is now seeking to license closed tech through a third party by segmenting it under a banner of pre/post-"AGI" technology?
The non-profit/for-profit/investor partnership is held together by a set of legal documents that are entirely novel (a bad thing in legal docs), are non-public and unclear, have no case precedent, yet promise to wed operations to a vague (and already re-interpreted) OpenAI Charter.
The claim is that AGI needs to be carefully and collaboratively guided into existence, yet the output of almost every other existing commercial lab is more open. OpenAI runs a closed ecosystem where they largely don't or won't trust anyone outside of a small bubble.
