This account has become mostly crypto Twitter (CT), but I still care (deeply) about deep learning and large language models.

While models have gotten bigger and better, it seems this is having surprisingly little effect on downstream applications...

🧵 (1/n)
The growth in parameter counts has been extraordinary. I had a tiny part to play; my friends and teammates have been at the forefront, in the lab, taking state-of-the-art language models from 300M params to 8B-11B (when I was there), and on to 1/2 T params
(2/n)
developer.nvidia.com/blog/using-dee…
Work from Nvidia, MSFT, OpenAI, Google, and FB research has transformed NLP into a large-scale deep learning field. It's amazing that you can encode that many params and go through that many documents, in 100+ languages. Even handle everything as bytes...

(3/n)

@huggingface in particular has made this work accessible to everyone else. Maybe not the biggest models, but thousands of organizations are using large NNs for NLP as part of their process.

(4/n)

huggingface.co/organizations
Instead of one-off models and clever solutions to each sub-problem, each with its own code base, most large- to medium-scale NLP work -- sentiment analysis, routing customer comments, text completion suggestions... -- uses deep NLP, and often starts with a model on HuggingFace.

(5/n)
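To make that concrete: "starts with a model on HuggingFace" is often literally a few lines of the transformers pipeline API. A minimal sketch (the customer comments here are made up):

```python
# Minimal sketch: pull a pretrained sentiment model from the Hugging Face Hub
# and classify a couple of (made-up) customer comments.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # uses the library's default checkpoint

comments = [
    "The checkout flow keeps timing out, please fix it.",
    "Love the new dashboard, huge improvement!",
]

# Each result is a dict like {"label": "NEGATIVE", "score": 0.99}
for comment, result in zip(comments, classifier(comments)):
    print(f"{result['label']:>8}  {result['score']:.3f}  {comment}")
```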
So why the skepticism?

I'm not skeptical, exactly. It's just surprising that we are still using this new tech to solve *old* problems better. That, or creating great "wow" demos like some of the GPT-3 generative demos -- but beyond the demos, mostly hammers looking for a nail.

(6/n)
Transformer-based deep NNs are clearly better for language tasks like translation, sentiment analysis, similarity, and topic classification. They probably help with ads and are certainly useful for information retrieval (last-stage Google scoring).

But these are old problems

(7/n)
Previous breakthrough improvements in NLP led to new categories. Hate the automatic phone agent all you like (and it's going away now), but that was zero to one. Same with Google search, in a sense. Good-enough translation was a huge deal... even good-enough spelling & grammar checking

(8/n)
Deep learning transformed aspects of computer vision. The ability to tag your friends in photos went from NGMI to almost perfect.

You wouldn't have self driving cars without DL.

GANs are still a bit more experimental...

(9/n)
Even reinforcement learning (RL) has transformed computers' ability to play certain games at an expert level. Even if it's been a disappointment so far in every other domain... (possibly apart from chip layout/design, but that's super secret, so hard to follow)

(10/n)
So I ask you: what are some NLP applications that currently don't work well, but would be transformed by a 10x improvement in accuracy, quality, or understanding?

That's what I'd like to see. Not more 2% improvements on ads + recall. Inspiring demos are fun, but...

(11/n)
Here are some that come to mind:
* summarization -- like actually rewrite this whole article (or a few articles) in concise form, with a link to the fuller text (a baseline sketch of today's tooling follows after this tweet)
* text re-writing -- for style, brevity, etc [students will love this]
* make a general model for importance -- urgent? why?

(12/n)
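For the summarization item above, here is a baseline sketch of what today's tooling already does -- the model name is just an illustrative public checkpoint, and the article path is hypothetical. The ask is for something 10x beyond this:

```python
# Baseline sketch: condense one article with a pretrained seq2seq summarizer.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = open("article.txt").read()  # hypothetical path: any long-ish English article

# max_length / min_length are rough token budgets for the generated summary.
# Note: anything past the model's context window (~1024 tokens) gets truncated,
# which is part of why "actually rewrite this whole article" is still hard.
summary = summarizer(article, max_length=130, min_length=30, do_sample=False)
print(summary[0]["summary_text"])
```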
These goals are a bit vague, but you know them when you see them. If this works at all, it will start focused on specific corners of language, probably in English, etc.

There are issues of eval, training and test data.

But if you narrow the problem, it can be done...

(13/n)
It's hard for outsiders to appreciate how powerful the new huge NN models have become. They have the horses! But I don't know that enough progress has been made on putting those horses to work on hard but valuable sub-problems. Instead of chatbots & boiling o̶c̶e̶a̶n̶ Reddit.

(14/n)
I suspect this focus will lead to work in local, online optimization -- paraphrasing @polynoamial (poker AI): it doesn't make sense to pre-solve the whole game instead of searching locally from a specific position.

These giant LMs are good pre-training but seem a bit static.

(15/n)
I like work like continuous prompting... but this is not about research. The research is good! The models are great and the people are motivated and talented.

I'm just a bit surprised it's not having a bigger impact in terms of new NLP problems that weren't possible before.

(16/n)
And I don't think these new amazing solutions will emerge from bigger models, better datasets, more safety concerns, etc. All fine, but they seem like +1, not transformative.

(17/n)
So far, I don't think you could argue that this breed of large NLP models has made a bigger impact on how machines use language than the pre-pre-DL NLP work done at IBM in the 1980s.

But maybe that's how it always goes...

(18/n)
We are in the adoption stage. Everyone who learns NLP gets into DL, usually via @huggingface. They earn their stripes training sentiment models, topic models, or translation. Maybe they finetune on a domain-specific dataset or build a corpus for a long-tail language...

(19/n)
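A rough sketch of that "finetune on a domain-specific dataset" step, using the Hugging Face Trainer -- the checkpoint and dataset (distilbert, IMDB) are just stand-ins for whatever domain corpus you actually have:

```python
# Rough sketch: finetune a small pretrained model on a labeled text dataset.
# IMDB stands in here for a domain-specific corpus.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("imdb")

def tokenize(batch):
    # Truncate long reviews; padding is handled per-batch by the Trainer's collator.
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=16),
    # Small subsets so the sketch runs quickly.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
    tokenizer=tokenizer,  # lets the Trainer pad batches dynamically
)
trainer.train()
```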
As these existing problems improve by 5% a year... these same people will, years from now, try something new that hasn't been done before.

And they may find that something that was impractical before now works, because of a 10x better understanding.

(20/n)
And maybe that's how it always goes, and always should be.

Tech is built (for lols), tech is adopted to improve on existing problems, clever ppl find the tech also solves new problems...

(21/n)
So I guess I'm back to crypto!

(end!)

