Percy Liang · Dec 15 · 7 tweets · 4 min read
📣 CRFM announces PubMedGPT, a new 2.7B language model that achieves a new SOTA on the US medical licensing exam. The recipe is simple: a standard Transformer trained from scratch on PubMed (from The Pile) using @mosaicml on the MosaicML Cloud, then fine-tuned for the QA task.
Details: We took Hugging Face’s Transformers implementation, added FlashAttention, built our own tokenizer, and trained on 300B tokens (110 GB of text) using 128 A100 GPUs for ~6.25 days. We did full fine-tuning on downstream tasks (e.g., MedQA-USMLE) for evaluation.
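For concreteness, here is a minimal sketch of what the downstream fine-tuning step could look like with Hugging Face Transformers. The hub id "stanford-crfm/pubmedgpt" and the inline toy example are assumptions for illustration, not details from the thread; the actual training run also used FlashAttention and a custom tokenizer, which this sketch omits.

```python
# Minimal fine-tuning sketch, assuming a released PubMedGPT checkpoint on the
# Hugging Face Hub. The hub id and the toy QA text are illustrative assumptions.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

model_name = "stanford-crfm/pubmedgpt"  # hypothetical hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-style tokenizers have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Full fine-tuning casts each QA item as plain text and trains with the usual
# causal-LM loss over the concatenated question + answer.
texts = ["Question: <USMLE-style question> Answer: <choice>"]  # placeholder data
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class QADataset(torch.utils.data.Dataset):
    def __init__(self, enc):
        self.enc = enc
    def __len__(self):
        return self.enc["input_ids"].size(0)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = item["input_ids"].clone()  # LM loss targets
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pubmedgpt-medqa", num_train_epochs=1),
    train_dataset=QADataset(enc),
)
trainer.train()
```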
PubMedGPT is also capable of generation, but like most LMs, it will fabricate content (so don’t trust it!). This is a pressing area for LM research, and we hope that the release of this model can help researchers evaluate and improve the reliability of generation.
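For completeness, a minimal generation sketch (same assumed hub id as above); treat the output as a research artifact to evaluate, not as medical advice.

```python
# Generation sketch; expect fluent but potentially fabricated biomedical text.
from transformers import pipeline

generate = pipeline("text-generation", model="stanford-crfm/pubmedgpt")  # hypothetical id
out = generate("Metformin is a first-line therapy for", max_new_tokens=40)
print(out[0]["generated_text"])
```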
We hope that PubMedGPT can serve as a foundation model for biomedical researchers; can it be adapted fruitfully for tasks such as medical text simplification, information retrieval, and knowledge completion? There's a lot more to do!
There are many large, interesting datasets across different sectors - e.g., medicine, law, finance. Rather than relying on a single 100B+ parameter foundation model, we think there’s a lot of value that can be captured by <10B parameter models trained on domain-specific datasets.
Thanks to Elliot Bolton, @dlwh, @michiyasunaga, @tonyh_lee, and @chrmanning at @StanfordHAI’s Center for Research on Foundation Models (CRFM), and to @abhi_venigalla, @jefrankle, and @mcarbin at @mosaicml on the MosaicML Cloud.


More from @percyliang

Nov 17
Language models are becoming the foundation of language technologies, but when do they work, and when do they fail? In a new CRFM paper, we propose Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of LMs. Holistic evaluation includes three elements:
1. Broad coverage and recognition of incompleteness: We taxonomize a set of scenarios (e.g., question answering) and metrics (e.g., robustness) and select 42 scenarios and 7 metrics in an attempt to cover the design space. Importantly, the taxonomy makes explicit what’s missing.
2. Multi-metric: Benchmarks often focus on a single metric (usually accuracy). HELM instead reports 7 metrics (accuracy, calibration, robustness, fairness, bias, toxicity, efficiency) for each scenario. Tradeoffs are important, and let’s not forget about metrics beyond accuracy.
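To make the multi-metric idea concrete, here is a toy sketch of the scenario × metric matrix that HELM-style reporting produces. The scenario names and all numbers below are invented placeholders; only the 7 metric names come from the thread.

```python
# Toy sketch of multi-metric reporting: every scenario gets all 7 metrics,
# not a single accuracy number. All values are invented placeholders.
METRICS = ["accuracy", "calibration", "robustness", "fairness",
           "bias", "toxicity", "efficiency"]

results = {  # hypothetical scenarios with placeholder scores
    "question_answering": [0.71, 0.62, 0.55, 0.68, 0.12, 0.03, 0.90],
    "summarization":      [0.44, 0.51, 0.39, 0.57, 0.08, 0.02, 0.70],
}

# An explicit scenario-by-metric matrix also makes incompleteness visible:
# an uncovered (scenario, metric) cell is a hole you can see, not a silent gap.
for scenario, scores in results.items():
    row = "  ".join(f"{m}={s:.2f}" for m, s in zip(METRICS, scores))
    print(f"{scenario:20s} {row}")
```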
Oct 23
Writing on a whiteboard can be easier for students to follow than slides (especially for math). During the pandemic, I added a feature to sfig (my JavaScript slides library) that lets me reveal parts of a slide with the mouse, as if I were writing on a whiteboard:
Compared with normal slide builds, I don't need to specify the granularity or build order in advance, which gives me the flexibility to show (and erase) parts of the slide in any order. And for math, I just write the LaTeX.
You can try it out here yourself: cs.stanford.edu/~pliang/sfig/e…
Jun 30
The term "foundation model" and its motivation unfortunately continue to be misunderstood. We wrote a blog post last year (see the "Naming" section of crfm.stanford.edu/2021/10/18/ref…) that aims to explain our thought process. Some selected quotes from the post:
"We define foundation models as models trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks...based on standard ideas in transfer learning..."
"...we emphasize that foundation models present clear and significant societal risks, both in their current implementation and their fundamental premise"
Jun 21
There are legitimate and scientifically valuable reasons to train a language model on toxic text, but the deployment of GPT-4chan lacks them. AI researchers: please look at this statement and see what you think: forms.gle/ikiYE6ArLpWYz7…
How this fits into the broader context: foundation models carry a potential risk of significant harm, so it is imperative to develop community norms for their responsible development and deployment. How do we develop such norms? There are multiple approaches:
1. Principles: Describe values & best practices.
2. Tools: Develop benchmarks & software to make it easier to do the right thing.
3. Behavior: Take actions exemplifying responsible AI.
4. Regulation: Pass legislation that deters bad behavior.
5. Sanctions: Call out bad behavior.
May 3
Meta's release of OPT is an exciting step toward opening new opportunities for research. In general, we can think of stronger releases as enabling researchers to tackle deeper questions. There are different levels of strength:
Level 1 (paper): provides an existence proof that certain capabilities are possible and reveals general ideas that can be built on
Level 2 (API access): allows researchers to probe and evaluate the capabilities (e.g., reasoning) and limitations (e.g., bias) of existing foundation models
Jan 28, 2021
Executable papers on CodaLab Worksheets are now linked from paperswithcode.com pages thanks to a collaboration with @paperswithcode! For example:
paperswithcode.com/paper/noise-in…
By transitivity, the links are also available from @arxiv:
arxiv.org/abs/1911.09876…
Executable papers contain not just the code and data, but also the experiments that produced the results of a paper. Releasing code is great, but CodaLab goes a step further toward full #reproducibility by providing certifiable provenance for an empirical result.
