I led the team that studied mask efficacy in early 2020 and published our results in the Proceedings of the National Academy of Sciences.
I spent three months earlier this year revisiting this topic, and today I'm publishing my notes and links here: fast.ai/2022/07/04/upd…
An admission: these notes were meant to be the basis of another academic paper, and I gave up on it. In Jan 2022, when I finished this research, I looked around, and it seemed like no-one much cared about avoiding COVID any more.
So I figured it wasn't worth spending time on.
It seems like in the last couple of weeks there are signs that folks might be more open to protecting themselves and others by wearing a mask.
But the vast majority of public health advice I see on mask use is scientifically inaccurate. So I'm digging out this research for you.
Masks work.
An observational study of Beijing households analyzed the impact of mask use in the community on COVID-19 transmission, finding that masks were 79% effective in preventing transmission, if used by all household members prior to symptoms. gh.bmj.com/content/5/5/e0…
However, with Omicron, “surgical masks are no longer sufficient in most public settings, while correctly fitted FFP2 respirators still provide sufficient protection, except in high aerosol producing situations such as singing or shouting” smw.ch/article/doi/sm…
Surgical masks are much less effective than N95s, because they are made to stop liquid splashes during surgery, rather than made to stop airborne transmission.
But you can improve them with a 30 sec trick.
But there's really no need to wear anything but an N95 nowadays. They're widely available, inexpensive, and the best ones are very comfortable and breathable.
You can re-use an N95 until the straps wear out. I find that's about 30 times with my usage.
Mask maker 3M says "There is no time limit to wearing an FFR [filtering facepiece respirator]. Respirators can be worn until they are dirty, damaged or difficult to breathe through." (I find the straps wear out first.)
If you use a good mask like the Aura and you're not a healthcare worker, you don't need a fit test.
Non-experts get an average fit factor of 88, well over the recommended goal of 10. (In healthcare the goal is 100, to provide a 10x safety margin.) tandfonline.com/doi/abs/10.108…
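To make those numbers concrete: in quantitative fit testing, the fit factor is the ratio of the particle concentration outside the mask to the concentration inside it, so its reciprocal is roughly the fraction of outside particles that leak in. The sketch below assumes that simple definition and ignores real-world factors such as how consistently the mask is worn; the function names are illustrative only.

```python
def inward_leakage(fit_factor: float) -> float:
    """Fraction of outside particles that end up inside the mask.

    Assumes fit factor = outside concentration / inside concentration,
    so the leaking fraction is simply its reciprocal.
    """
    if fit_factor <= 0:
        raise ValueError("fit factor must be positive")
    return 1.0 / fit_factor


def safety_margin(achieved: float, goal: float) -> float:
    """How many times better the achieved fit factor is than the goal."""
    return achieved / goal


GOAL = 10  # recommended goal for non-healthcare use, as stated in the thread

for label, ff in [("recommended goal", 10), ("non-expert average", 88), ("healthcare goal", 100)]:
    print(f"{label}: fit factor {ff}, {inward_leakage(ff):.1%} leaks in, "
          f"{safety_margin(ff, GOAL):.1f}x the goal of {GOAL}")
```

Under this definition, the non-expert average of 88 means about 1.1% of outside particles get in, roughly 8.8 times better than the goal of 10, while the healthcare goal of 100 corresponds to 1%, which is where the 10x margin comes from.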
There isn't a shortage of N95s so you don't need to reserve them for healthcare workers. In fact, not enough people are buying them, so factories are closing down bloomberg.com/news/articles/…
Masks needn't be a substantial burden. A study found that “in healthy healthcare workers, [N95s] did not impose any important physiological burden during 1 hour of use, at realistic clinical work rates” rc.rcjournal.com/content/55/5/5…
For dozens more links to academic studies on masks, see the links in my full research notes: fast.ai/2022/07/04/upd…
BTW, here's our original 2020 paper, written in April 2020, but not published in PNAS until Jan 2021 (although it was available on preprints.org throughout that time): pnas.org/doi/10.1073/pn…
• • •
I'm glad @levelsio checked this, but sad our contribution has been erased by later big tech companies. Alec Radford said ULMFiT inspired GPT. ULMFiT's first demo predated BERT.
Today's 3-stage LLM approach of general corpus pretraining and 2 stages of fine-tuning was pioneered by ULMFiT.
There have been many other important contributions, including attention (Bahdanau et al.), transformers, RLHF, etc.
But before all this, basically everyone in NLP assumed that each new domain needed a new model. ULMFiT showed that a large pretrained model was actually the key.
I got pushback from pretty much everyone about this. My claim that fine-tuning that model was the critical step to achieving success in NLP was not something people were ready to hear at that time.
I gave many talks trying to convince academics to pursue this direction.
Announcing fasttransform: a Python lib that makes data transformations reversible/extensible. No more writing inverse functions to see what your model sees. Debug pipelines by actually looking at your data.
We took the `Transform` class out of fastcore, replaced the custom type dispatch system with @ikwess's plum-dispatch, mixed it all together, and voila: fasttransform! :D
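The thread doesn't show fasttransform's own API, so here is a hypothetical pure-Python sketch of the idea it describes: each transform pairs an `encodes` with a `decodes`, a pipeline undoes its steps in reverse order, and the rule to apply is picked by the input's type. It uses the standard library's `functools.singledispatchmethod` as a stand-in for plum-dispatch; `Normalize`, `Tokenize`, and `Pipeline` are illustrative names, not the library's classes.

```python
from functools import singledispatchmethod


class Normalize:
    """Hypothetical reversible transform: rescales floats, passes other types through."""

    def __init__(self, mean: float, std: float):
        self.mean, self.std = mean, std

    @singledispatchmethod
    def encodes(self, x):
        return x  # fallback: types with no registered rule are left unchanged

    @encodes.register
    def _(self, x: float):
        return (x - self.mean) / self.std

    @singledispatchmethod
    def decodes(self, x):
        return x

    @decodes.register
    def _(self, x: float):
        return x * self.std + self.mean  # exact inverse of encodes


class Tokenize:
    """Hypothetical reversible transform: str -> list of words, and back again."""

    @singledispatchmethod
    def encodes(self, x):
        return x

    @encodes.register
    def _(self, x: str):
        return x.split()

    @singledispatchmethod
    def decodes(self, x):
        return x

    @decodes.register
    def _(self, x: list):
        return " ".join(x)


class Pipeline:
    """Applies transforms in order; decodes by undoing them in reverse order."""

    def __init__(self, *tfms):
        self.tfms = tfms

    def __call__(self, x):
        for t in self.tfms:
            x = t.encodes(x)
        return x

    def decode(self, x):
        for t in reversed(self.tfms):
            x = t.decodes(x)
        return x


pipe = Pipeline(Tokenize(), Normalize(mean=5.0, std=2.0))
print(pipe("hello world"))             # -> ['hello', 'world']
print(pipe.decode(["hello", "world"]))  # -> hello world
print(pipe(9.0))                       # -> 2.0
print(pipe.decode(2.0))                # -> 9.0
```

Because each transform only acts on the types it registers, one pipeline can carry text and numbers side by side, and the inverse comes from the paired `decodes` rather than a hand-written inverse function.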
Wow, actual grown men are still doing the "I asked the LLM about itself and it said" thing.
In 2025.
Folks, LLMs don't know anything about how they themselves are built or deployed, unless they've been explicitly programmed with that information (which they almost never are).
I've recently been surprised to discover that a few of my friends are choosing to use nicotine to help them with focus, even though they are not ex-smokers.
I decided to look into it, and it turns out that there are documented health benefits of nicotine for some people. 🧵
I specifically looked into nicotine for ADHD, since, at least among children, ADHD and giftedness go hand in hand statistically (which would apply in adulthood too), and because focus was mentioned as an area where nicotine can be helpful.
There is a great overview below. But "Very surprisingly, there are… no further… studies.
Research into active ingredients… is expensive.
In addition, nicotine has a very poor image… which impairs its marketability" adxs.org/en/page/192/ni…
We trained 2 new models. Like BERT, but modern. ModernBERT.
Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff.
It's much faster, more accurate, longer context, and more useful. 🧵
ModernBERT is available as a slot-in replacement for any BERT-like model, with both 139M param and 395M param sizes.
It has an 8,192-token sequence length, is extremely efficient, is uniquely great at analyzing code, and much more. Read this for details: huggingface.co/blog/modernbert
Seven months ago, @bclavie kicked things off, and soon @benjamin_warner & @antoine_chaffin joined him as project co-leads. I don't think anyone quite knew what we were getting into…
It turns out that training a new, SoTA model from scratch is actually pretty hard. Who knew? 🤷
I wonder if the @PyTorch analysis behind this is mistaken. I suspect most of the PyPI installs they’re seeing are from CI and similar. Conda installs are the standard for end-user installation of PyTorch, afaik.
@PyTorch Conda aggressively caches installs so looking at relative download numbers won’t give a great sense of real usage.