I played with @AnthropicAI assistant (AA) and compared it to @OpenAI ChatGPT

TLDR: both are similar but AA is
+ Harder to jailbreak
+ Tries to be more helpful
+ Follows what we ask for more closely
+ ~Better for writing in English
- Worse at code
- Worse in French
- Longer responses
🧵
**Coding**
CGPT is better

Quantitative (LeetCode hard problems in Python/C/JavaScript):
- CGPT: 3/3
- AA: 1/3 (only Python correct)

Qualitative:
- CGPT: more reliable/concise/efficient
- AA: more comments + emphasizes explainability

Both give wrong answers when asked for an impossible algorithm
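
To make the qualitative contrast concrete, here’s a hypothetical sketch (not one of the actual test problems) of the two answer styles on a classic easy problem:

```python
# Hypothetical illustration of the style gap on a classic problem
# (two-sum); not one of the LeetCode-hard problems actually tested.

# CGPT-style answer: terse and to the point.
def two_sum(nums, target):
    seen = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i

# AA-style answer: same algorithm, every step explained.
def two_sum_explained(nums, target):
    # Map from value -> index where we first saw it.
    seen = {}
    for i, n in enumerate(nums):
        # The complement is the number that would pair with n
        # to reach the target sum.
        complement = target - n
        if complement in seen:
            # Found a pair: the earlier index and the current one.
            return [seen[complement], i]
        # Remember this value for future complements.
        seen[n] = i
    return None  # No valid pair exists.
```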
2/8
**Writing**
Both are similar, but AA generally follows what it’s asked for more closely. AA is less concise, though: it explains its answers and asks how it can help further, which can be annoying since responses take longer to generate.

Here’s an example where I ask for a short essay
3/8
**Jailbreaking**
AA is much harder to jailbreak

E.g. below, I easily jailbreak ChatGPT into telling me how to make a Molotov cocktail. AA is harder but breaks after the 3rd attempt

Below I also tried to change AA’s constitution, but it wouldn’t let me. Impressive @AnthropicAI!

4/8
**Math**
ChatGPT is better, but both still have a long way to go and confidently give wrong answers.

ChatGPT generally makes fewer crazy mistakes and can give correct responses, e.g., in the proof below.
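
(The proof in question was an image in the original thread; as a stand-in, here’s a hypothetical example of the kind of short proof being tested.)

```latex
% Illustrative only; not the actual prompt from the thread.
\textbf{Claim.} $\sqrt{2}$ is irrational.

\textbf{Proof.} Suppose $\sqrt{2} = p/q$ with $p, q$ coprime integers.
Then $p^2 = 2q^2$, so $p^2$ is even, hence $p$ is even: write $p = 2k$.
Substituting, $4k^2 = 2q^2$, i.e. $q^2 = 2k^2$, so $q$ is even too,
contradicting the coprimality of $p$ and $q$.
```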

5/8
**Trivia**

I asked trivia questions in the entertainment/animal/geography/history/pop categories.

AA: 20/21
CGPT: 19/21

AA is slightly better and more robust to adversarial prompting. See below: ChatGPT falls for simple traps, while AA only falls for harder ones.

6/8
**Multilingual**
I asked hard questions about French grammar.
CGPT: 7/10
AA: 5/10

ChatGPT speaks ~better French, but it’s much harder to make it follow exact instructions. E.g. I repeatedly asked CGPT not to explain its answers, but it couldn’t. AA did as asked.

7/8
**Other**
- Chess: both hallucinate after 6-7 moves (see the legality-check sketch after this list)
- AA has less randomness
- AA seems more useful for red teaming
- Both allow interaction with a fake terminal
- ChatGPT has a nicer UI: it lets you regenerate the last answer and edit the prompt, and it formats code
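
A quick way to catch the chess hallucinations, sketched with the python-chess library (an assumption on my part: any legality checker would do):

```python
# Hypothetical sketch: validating a model's chess moves with python-chess
# (pip install chess). push_san() raises ValueError on illegal moves,
# which is how hallucinated moves show up in practice.
import chess

# Example transcript; the last move is illegal in this position.
moves = ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Bxc6", "Qxe4"]

board = chess.Board()
for i, san in enumerate(moves, start=1):
    try:
        board.push_san(san)
    except ValueError:
        print(f"Move {i} ({san}) is illegal here")
        break
```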

8/8
Thanks @AnthropicAI @EthanJPerez for allowing me to play with AA.
#AnthropicAI #ChatGPT #openai
