Shayne Longpre
Mar 28, 2023
What dates📅 can @OpenAI, @AnthropicAI, @CohereAI models reliably answer questions for?🔭

I binary-search through "future" Wiki events to find out. Results ❌🟰❌documentation:

#GPT4 ➡️~Dec 19 ('21)
#ChatGPT ➡️~Oct 24
Claude v1.2➡️~Oct 10
Cohere XL Nightly➡️~Apr 24 ('22)

1/🧵
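
A minimal sketch of the binary search described above, assuming a date-sorted list of events and a yes/no check of whether the model answered correctly. `Event`, `latest_known_event`, and `make_fake_model` are hypothetical names, the questions are placeholders, and the fake model stands in for a real API call. It also assumes knowledge is monotonic in time (known before the cutoff, unknown after), which real models only approximate.

```python
from datetime import date
from typing import Callable, NamedTuple, Optional, Sequence


class Event(NamedTuple):
    day: date        # when the event happened
    question: str    # what we ask the model
    answer: str      # ground truth, e.g. from a Wikipedia year page


def latest_known_event(
    events: Sequence[Event],
    knows: Callable[[Event], bool],
) -> Optional[Event]:
    """Binary-search date-sorted events for the latest one the model knows.

    Returns None if the model knows none of them.
    """
    lo, hi = 0, len(events) - 1
    best: Optional[Event] = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if knows(events[mid]):
            best = events[mid]   # known: the cutoff is at or after this event
            lo = mid + 1
        else:
            hi = mid - 1         # unknown: the cutoff is before this event
    return best


def make_fake_model(cutoff: date) -> Callable[[Event], bool]:
    """Stand-in for a real model query: 'knows' everything up to `cutoff`."""
    return lambda event: event.day <= cutoff


# Illustrative dates only; questions and answers are placeholders.
days = [
    date(2021, 7, 11), date(2021, 8, 12), date(2021, 9, 26), date(2021, 10, 10),
    date(2021, 11, 2), date(2021, 12, 19), date(2022, 3, 9), date(2022, 4, 24),
]
events = [Event(d, f"placeholder question {i}", f"placeholder answer {i}")
          for i, d in enumerate(days)]

found = latest_known_event(events, make_fake_model(date(2021, 10, 24)))
print(found.day if found else "no known events")
```

In practice `knows` would send `event.question` to the model and compare the reply with `event.answer`; probing a few extra events around the result helps when answers are not cleanly monotonic.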
GPT4 says it is trained up to Sept 2021.

I found it correctly answers questions about events it shouldn't know, in Oct, Nov, and even Dec 11th & 19th.

In late Dec it begins to abstain.

2/
Interestingly, GPT 3.5 "Default" answers correctly only until ~Oct 24, 2021, while GPT 3.5 "Legacy" answers correctly until ~Oct 31, 2021, then begins hallucinating false answers or abstaining in Nov.

Perhaps this is due to finetuning rather than pretraining data?

3/
@AnthropicAI's Claude v1.2 model correctly answers questions about events on July 11, Aug 12, Sept 26, and Oct 10, but abstains on Oct 9 & Nov 2.

➡️The trick with Claude is to ask it about an event without telling it the date (see examples).

4/
@CohereAI's Command XL Nightly provides the most recent correct answers of the 3 models! 🌟

✅It correctly answers Qs about March 9 & April 24, 2022, but hallucinates from May onwards.

❌Unlike the others, it does not seem to abstain when asked about future info it doesn't know.

5/
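
The three outcomes seen across these models (correct, abstain, hallucinate) can be told apart with a rough heuristic. This is only a sketch: `classify_response` and the `ABSTAIN_MARKERS` phrases are hypothetical, and simple substring matching will misjudge some replies, so spot-check by hand.

```python
# Hypothetical phrases that suggest the model declined to answer.
ABSTAIN_MARKERS = (
    "i don't know",
    "i do not know",
    "i'm not sure",
    "cannot predict",
    "my training data",
    "knowledge cutoff",
)


def classify_response(response: str, expected: str) -> str:
    """Label a reply as 'correct', 'abstain', or 'hallucinated'.

    Heuristic: the reply is correct if it contains the expected answer,
    an abstention if it contains a known refusal phrase, and otherwise
    it is treated as a confident wrong answer.
    """
    text = response.lower()
    if expected.lower() in text:
        return "correct"
    if any(marker in text for marker in ABSTAIN_MARKERS):
        return "abstain"
    return "hallucinated"


# Placeholder replies, not real model output.
print(classify_response("Team A won the final.", "Team A"))
print(classify_response("I don't know; that is after my training data.", "Team A"))
print(classify_response("Team B won the final.", "Team A"))
```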
#Wikipedia yearly event pages are an awesome resource for this: e.g. en.wikipedia.org/wiki/2022

I found national election results and sports tournaments the most reliable: they are sufficiently high profile, and (usually) unpredictable.

6/
Thanks to @natfriedman’s nat.dev tool for making this analysis possible!

Please feel free to leave thoughts/comments!

/🧵
