📣 CRFM announces PubMedGPT, a new 2.7B language model that achieves a new SOTA on the US medical licensing exam. The recipe is simple: a standard Transformer trained from scratch on PubMed (from The Pile) using @mosaicml on the MosaicML Cloud, then fine-tuned for the QA task.
Details: We took Hugging Face’s Transformer implementation, added FlashAttention, built our own tokenizer, and trained for 300B tokens (110 GB of text) on 128 A100 GPUs for ~6.25 days. We did full fine-tuning on downstream tasks (e.g., MedQA-USMLE) for evaluation.
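To make the setup concrete, here is a minimal sketch (not the actual training code) of what a 2.7B GPT-2-style model looks like in Hugging Face; the layer/head counts, context length, and vocabulary size below are illustrative assumptions, not the released configuration, and the FlashAttention patch and training loop are omitted:

```python
# Minimal sketch (assumed hyperparameters, NOT the released PubMedGPT config):
# a ~2.7B-parameter GPT-2-style Transformer built with Hugging Face.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    n_embd=2560,        # hidden size (assumed; 2560 x 32 layers gives ~2.7B params)
    n_layer=32,         # number of Transformer blocks (assumed)
    n_head=20,          # must divide n_embd; 2560/20 = 128-dim heads (assumed)
    n_positions=1024,   # context length (assumed)
    vocab_size=28896,   # placeholder for the custom PubMed tokenizer's vocab size
)

# Note: instantiating a model this size needs ~10 GB of RAM in fp32.
model = GPT2LMHeadModel(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")  # roughly 2.7B with these settings
```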
PubMedGPT is also capable of generation, but like most LMs, it will fabricate content (so don’t trust it!). This is a pressing area for LM research, and we hope that the release of this model can help researchers evaluate and improve the reliability of generation.
We hope that PubMedGPT can serve as a foundation model for biomedical researchers; can it be adapted fruitfully for tasks such as medical text simplification, information retrieval, and knowledge completion? There's a lot more to do!
There are many large, interesting datasets across different sectors - e.g., medicine, law, finance. Rather than relying on a single 100B+ parameter foundation model, we think there’s a lot of value that can be captured by <10B parameter models trained on domain-specific datasets.
Language models are becoming the foundation of language technologies, but when do they work, and when do they fail? In a new CRFM paper, we propose Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of LMs. Holistic evaluation includes three elements:
1. Broad coverage and recognition of incompleteness: We taxonomize a set of scenarios (e.g., question answering) and metrics (e.g., robustness) and select 42 scenarios and 7 metrics in an attempt to cover the design space. Importantly, the taxonomy makes explicit what’s missing.
2. Multi-metric: benchmarks often focus on a single metric (usually accuracy). HELM instead reports 7 metrics (accuracy, calibration, robustness, fairness, bias, toxicity, efficiency) for each scenario. Tradeoffs are important, and let’s not forget about metrics beyond accuracy (see the sketch below).
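To illustrate the multi-metric idea, here is a toy sketch (not the actual HELM codebase or API) of what reporting all 7 metrics per scenario could look like:

```python
# Toy sketch of multi-metric reporting (NOT the actual HELM code or API).
from dataclasses import dataclass

METRICS = ["accuracy", "calibration", "robustness", "fairness",
           "bias", "toxicity", "efficiency"]

@dataclass
class ScenarioResult:
    scenario: str               # e.g., "question answering"
    scores: dict[str, float]    # one score per metric, not accuracy alone

def report(results: list[ScenarioResult]) -> None:
    # Print a scenario x metric matrix instead of a single-number leaderboard,
    # so tradeoffs between metrics stay visible.
    print(f"{'scenario':<24}" + "".join(f"{m:>12}" for m in METRICS))
    for r in results:
        print(f"{r.scenario:<24}" +
              "".join(f"{r.scores.get(m, float('nan')):>12.3f}" for m in METRICS))

# Dummy scores just to show the report shape (0.0 is a placeholder, not data).
report([ScenarioResult("question answering", {m: 0.0 for m in METRICS})])
```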
Writing on a whiteboard can be easier for students to follow than slides (especially for math). During the pandemic, I added a feature to sfig (my JavaScript slides library) that lets me reveal parts of a slide using the mouse as if I were writing on a whiteboard:
Compared to normal slide builds, I don't need to specify the granularity or build order in advance, which gives me the flexibility to show (and erase) parts of the slide in any order. And for math, I just write the LaTeX.
The term "foundation model" and its motivation unfortunately continue to be misunderstood. We wrote a blog post last year (see "Naming" section of crfm.stanford.edu/2021/10/18/ref…) which aims to explain our thought process. Some selected quotes from the post:
"We define foundation models as models trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks...based on standard ideas in transfer learning..."
"...we emphasize that foundation models present clear and significant societal risks, both in their current implementation and their fundamental premise"
There are legitimate and scientifically valuable reasons to train a language model on toxic text, but the deployment of GPT-4chan lacks them. AI researchers: please look at this statement and see what you think: forms.gle/ikiYE6ArLpWYz7…
How this fits into the broader context: foundation models carry a potential risk of significant harm, so it is imperative to develop community norms for their responsible development and deployment. How do we develop such norms? There are multiple approaches:
1. Principles: Describe values & best practices.
2. Tools: Develop benchmarks & software to make it easier to do the right thing.
3. Behavior: Take actions exemplifying responsible AI.
4. Regulation: Pass legislation that deters bad behavior.
5. Sanctions: Call out bad behavior.
Meta's release of OPT is an exciting step towards opening new opportunities for research. In general, we can think of stronger releases as enabling researchers to tackle deeper questions. There are different levels of strength:
Level 1 (paper): provides an existence proof that certain capabilities are possible and reveals general ideas that can be built on
Level 2 (API access): allows researchers to probe and evaluate the capabilities (e.g., reasoning) and limitations (e.g., bias) of existing foundation models
Executable papers contain not just the code and data, but also the experiments that produced a paper's results. Releasing code is great, but CodaLab goes one step further for full #reproducibility, providing certifiable provenance for an empirical result.
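A toy sketch of the provenance idea (this is not CodaLab's actual data model): think of every result as a bundle whose hash commits to its command and to the hashes of its inputs, so any number can be traced back to the exact code and data that produced it:

```python
# Toy provenance sketch (NOT CodaLab's real implementation): a result bundle's
# digest depends on its payload, its command, and all upstream bundles.
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Bundle:
    name: str
    command: str = ""                  # empty for raw code/data uploads
    inputs: tuple["Bundle", ...] = ()
    payload: bytes = b""

    def digest(self) -> str:
        # Hash the payload, the command, and (recursively) every dependency,
        # so a result's digest changes if any upstream code or data changes.
        h = hashlib.sha256(self.payload + self.command.encode())
        for dep in self.inputs:
            h.update(dep.digest().encode())
        return h.hexdigest()

data = Bundle("corpus.txt", payload=b"...raw data...")
code = Bundle("train.py", payload=b"...training script...")
result = Bundle("eval.json", command="python train.py corpus.txt",
                inputs=(code, data))
print(result.digest()[:16])  # traces back to the exact code + data
```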