If ChatGPT had played the Jeopardy! grand finale in November 2022, it would have destroyed its opponents.

58 correct answers out of 61.
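
For scale, that tally works out to roughly 95% accuracy. A minimal sketch of the arithmetic, where the helper name `accuracy` is only illustrative and not something from the thread:

```python
def accuracy(correct: int, total: int) -> float:
    """Return the fraction of clues answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Tally reported in the thread: 58 correct responses out of 61 clues.
score = accuracy(58, 61)
print(f"{score:.1%}")  # prints 95.1%
```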

Not only that, there are some emerging behaviors while it plays. 🧵⤵️
Before we jump in, here is a quote from the first human battle against IBM Watson: "People don't realize how tough it is to write a program that can read in a natural language and understand all the double meanings, the puns, the red herrings, unpack the meaning of the clue."
Despite these challenges, ChatGPT understands natural language almost perfectly. Here are a few examples:
It fails on a few occasions, perhaps because the questions ask about information from after its training cutoff date:
However, at some point ChatGPT may decide that it is its turn to ask questions, and it comes up with entirely legitimate clues for a given category and price:
This behavior occurs across various categories and is not related to memorization of the training dataset. So the editors of Jeopardy could adopt ChatGPT for upcoming seasons.
Another interesting observation: you can override ChatGPT's restrictions if you ask questions as if they were coming from Jeopardy. Adding a category and price seems to help.

