Discover and read the best of Twitter Threads about #NLProc


We want to pretrain🤞
Instead we finetune🚮😔
Could we collaborate?🤗

ColD Fusion:
🔄Recycle finetuning to multitask
➡️evolve pretrained models forever

On 35 datasets
+2% improvement over RoBERTa
+7% in few-shot settings
🧵

#NLProc #MachineLearning #NLP #ML #modelRecycling
We all wish to improve pretraining
If only we had unlimited compute and data...
Together we have!

We propose a way to recycle finetuning
and transform it into multitask learning!

arxiv.org/abs/2212.01378

@Shachar_Don @VenezianElad @colinraffel @noamslonim @YoavKatz73 me
How do you get multitask learning by simply uploading models?

Collaborative Descent (ColD) Fusion is simple:
Start from a pretrained model
Let contributors finetune on it, and share their models
Fuse the models to get a new better model
Take the improved model as the new best model, and repeat (a minimal sketch follows below)
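For concreteness, here is a minimal sketch of one ColD Fusion round, assuming fusion is done by simple parameter averaging; the helper names are illustrative and the exact fusion operator may differ from the paper.

```python
# Sketch of one ColD Fusion round (illustrative names; assumes fusion = parameter averaging).
import copy
import torch

def fuse(models):
    """Average the parameters of several finetuned copies of the same architecture."""
    fused = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, param in fused.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
            param.copy_(stacked.mean(dim=0))
    return fused

def cold_fusion_round(base_model, contributor_finetune_fns):
    # Each contributor finetunes a copy of the shared base model on their own data...
    finetuned = [fn(copy.deepcopy(base_model)) for fn in contributor_finetune_fns]
    # ...and the finetuned models are fused into the next shared base model.
    return fuse(finetuned)
```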
Read 9 tweets
Despite the amazing results I’ve experienced with ChatGPT, this is not a correct way to look at LLM vs. Google search. Since several other tweets have made this equivalence and have been eager to spell doom for Google, let’s examine the details:
1. Google has more LLMs deployed internally than any place I know. If private communication is to be believed, that number is on the order of a "few dozen". And we're not talking about BERT/T5-sized models here.
2. Google also has more compute than anyone. The joke is that probably only the NSA has an estimate of Google's compute. So they are not compute-limited: they can build as big a model as they want.
Read 22 tweets
Really enjoyed giving a talk today on #fakenews #linguistics for the Fakespeak project at the University of Oslo. A quick 🧵...
My basic argument is that there are lots of definitions of Fake News out there, and a taxonomy of fake news is therefore very useful for making sense of this situation and for theoretically grounding our research... But this means researchers ...
The taxonomy I propose is based on the concepts of veracity (true/false news) and honesty (honest/dishonest news), which I argue are independent concepts. Falsehoods are not necessar...
Read 12 tweets
Sentence embeddings (e.g., SBERT) are powerful -- but we just don't know what is crammed into a %&!$# vector 😵‍💫.

💥So in our new paper, we use Abstract Meaning Representation (AMR) to make sentence embeddings more explainable! #AACL2022 #nlproc #MachineLearning (1/3)
Interesting: Yes, we use AMR -- but we don't need an AMR parser🤯. Therefore, we don't lose efficiency 🚀. The accuracy 🎯 is also preserved, and sometimes even improved (for argument similarity, we achieve a new state-of-the-art). (2/3)
Read 4 tweets
We suffered through curating and analysing thousands of benchmarks -- to better understand the (mis)measurement of AI! 📏🤖🔬

We cover all of #NLProc and #ComputerVision.

Now live at @NatureComms! nature.com/articles/s4146…

1/
Benchmarks are crucial to measuring and steering AI progress.

Their number has become astounding.

Each has unique patterns of activity, improvement and eventual stagnation/saturation. Together they form the intricate story of global progress in AI. 🌐
2/
We found a sizable portion of benchmarks have kind of reached saturation ("can't get better than this") or stagnation ("could get better, but we don't know how / nobody tries"). But still a lot of dynamic benchmarks as well!
3/
Read 9 tweets
Happy to release GOAL ⚽️, a multimodal dataset based on football highlights that includes 1) videos, 2) human transcriptions, and 3) Wikidata-based KB with statistics about players and teams for every match.

arxiv.org/abs/2211.04534

Interested? Read the thread below! #NLProc
[1/7] Previous video benchmarks consider movies or TV series that typically involve scripted interaction between characters instead of visually grounded language. In GOAL, on the other hand, we focus on football commentaries because they involve visually grounded language.
[2/7]: GOAL pushes the boundaries of current multimodal models because it requires the encoding of 1) videos; 2) commentary; 3) KB information. All these elements are essential when generating a sound and coherent commentary for a football video. Image
Read 8 tweets
Can instruction tuning improve zero and few-shot performance on dialogue tasks? We introduce InstructDial, a framework that consists of 48 dialogue tasks created from 59 openly available dialogue datasets
#EMNLP2022🚀
Paper 👉 arxiv.org/abs/2205.12673
Work done at @LTIatCMU
🧵👇
Instruction tuning involves fine-tuning a model on a collection of tasks specified through natural language instructions (T0, Flan models). We systematically studied instruction tuning for dialogue tasks and show it works a lot better than you might expect!
The InstructDial framework consists of 48 diverse dialogue tasks, ranging from classification, grounded and controlled generation, safety, QA, pretraining, and summarization to NLI and other miscellaneous tasks. All tasks are specified through instructions in a seq2seq format.
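For illustration, a dialogue task cast as instruction plus input with a plain-text target might look roughly like this; the wording is hypothetical, not the exact InstructDial template.

```python
# Hypothetical example of the instruction-in-seq2seq-format idea for a dialogue task.
source = (
    "Instruction: Given the dialogue context, generate the next response. "
    "Context: A: Hi, how are you? B: Great, thanks! A: Any plans for the weekend?"
)
target = "I'm thinking of going hiking if the weather holds."
```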
Read 9 tweets
ITT: an OAI employee admits that the text-davinci API models are not from their papers.

Until @OpenAI actually documents the connection between the models in their papers and the models released via APIs, #NLProc researchers need to stop using them to do research.
@OpenAI This is not a minor point either. Apparently the text-davinci-002 API “is an instruct model. It doesn't uses a similar but slightly different [sic] training technique but it's not derived from davinci. Hence it's not a fair comparison.”
@OpenAI Note that the text-davinciplus-002 model that he mentions isn’t publicly available AFAIK. So external researchers trying to study the InstructGPT models not only are running the wrong models, but they can’t study the correct ones.

Read 9 tweets
This article in the Atlantic by Stephen Marche is so full of #AIhype it almost reads like a self-parody. So, for your entertainment/education in spotting #AIhype, I present a brief annotated reading:

theatlantic.com/technology/arc…

/1
Straight out of the gate, he's not just comparing "AI" to "miracles" but flat out calling it one and quoting Google & Tesla (ex-)execs making comparisons to "God" and "demons".

/2 [Screencap from the linked article]
This is not the writing of someone who actually knows what #NLProc is. If you use grammar checkers, autocorrect, online translation services, web search, autocaptions, a voice assistant, etc., you use NLP technology in everyday life. But guess what? NLP isn't a subfield of "AI".
/3 [Screencap, same article]
Read 25 tweets
Language Models have taken #NLProc by storm. Even if you don’t directly work in NLP, you have likely heard and possibly, used language models. But ever wonder who came up with the term “Language Model”? Recently I went on that quest, and I want to take you along with me. 🧶
I am teaching a graduate-level course on language models and transformers at @ucsc this Winter, and out of curiosity, I wanted to find out who coined the term “Language Model”.
First, I was a bit ashamed I did not know this fact after all these years in NLP. Surely, this should be in any NLP textbook, right? Wrong! I checked every NLP textbook I could get my hands on, and all of them define what an LM is, without giving any provenance to the term.
Read 25 tweets
I finally read @boazbaraktcs’s blog on DL vs Stats.
A great mind-clearing read! 👍
“Yes, that was how we thought about NNs losing out due to bias/variance in ~2000”
“Yes, pre-trained models really are different to classical stats, even if math is the same”
windowsontheory.org/2022/06/20/the…
A bit more nuance could be added to this 2nd para on Supervised Learning. Initial breakthroughs _were_ made in #NLProc via unsupervised learning prior to AlexNet: the word vectors of Collobert & Weston (2008/2011), alongside related work on RBMs, the Google cat, etc. jmlr.org/papers/volume1…
But, for a few years, the siren song of the effectiveness of end-to-end deep learning on large supervised datasets was irresistible and very successful, probably partly because of how, in the over-parameterized regime, it does do representation learning as this post argues.
Read 3 tweets
📢 Today we officially launch TweetNLP, an all-round NLP platform for social media. From sentiment analysis to emoji prediction and more 🔥🔥

✔️ TweetNLP includes a Python API, a demo and tutorials. Useful for developers and researchers alike. Want to know more? 🧵
Everything you need is at ➡️tweetnlp.org⬅️

TweetNLP is powered by relatively light-weight transformer-based language models, which can be run on most computers or free cloud services.

So, what’s in it for you? 👇
1⃣ <Python API>

Step 1: pip install tweetnlp

Step 2: import tweetnlp

Step 3: model = tweetnlp.load('sentiment')

Step 4: model.sentiment("We love NLP!🥰")

github.com/cardiffnlp/twe…
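Putting the four steps together as one runnable snippet, using exactly the calls shown above:

```python
# pip install tweetnlp
import tweetnlp

model = tweetnlp.load('sentiment')          # load the sentiment model (as in the thread)
print(model.sentiment("We love NLP!🥰"))     # prints the predicted sentiment label
```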
Read 9 tweets
Can your encoder-decoder model generate a database-like table? We intended to do it efficiently and ended up with the STable🐴 framework, applicable to problems such as the extraction of line items or joint entity and relation extraction.

See arxiv.org/abs/2206.04045 and 🧵
#NLProc
From receipts and invoices, through paycheck stubs and insurance loss run reports, to scientific articles, real-world documents contain tabular data, explicit or implicit, to be extracted. These are not necessarily represented as a table per se within the input document.
At the same time, encoder-decoder models unify a variety of NLP problems by casting them as QA with a plain-text answer. We argue that the restriction of output type to raw text is sometimes suboptimal and propose a framework able to infer a list of ordered tuples or a table.
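As a toy illustration of the output type, the target for a receipt could be a list of ordered tuples rather than a plain-text answer; the field layout here is hypothetical, not the exact STable schema.

```python
# Hypothetical target: one ordered tuple per line item (name, quantity, unit price).
line_items = [
    ("USB-C cable", 2, 9.99),
    ("Wireless mouse", 1, 24.50),
]
```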
Read 8 tweets
Excited to share RankGen, a 1.2B param contrastive encoder mapping prefixes & model generations to vectors.

✅ large improvements over nucleus/typical sampling
✅ score & rank generations from any LM
✅ human eval with writers
✅ HuggingFace ckpts, code👇
arxiv.org/abs/2205.09726
Despite great progress, text generation continues to underperform. Even large LMs generate text with hallucinations, poor continuity, etc.

Part of the issue is that LMs are trained to predict just the next token given the ground-truth prefix, encouraging reliance on local context.
To tackle this we build RankGen, which maps prefixes close to their gold continuation, but away from other continuations in the same document as well as from generations produced by a large LM.

We train RankGen using large-scale contrastive learning with a minibatch size of 3K.
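A rough sketch of how such an encoder can then be used to score and rank candidate generations at inference time; the encoder functions here are placeholders, not the released RankGen API.

```python
# Illustrative reranking with a prefix/suffix encoder pair (placeholder functions).
import torch

def rerank(encode_prefix, encode_suffix, prefix: str, candidates: list[str]) -> str:
    """Return the candidate continuation whose vector scores highest against the prefix."""
    p = encode_prefix(prefix)                                  # shape: (d,)
    c = torch.stack([encode_suffix(x) for x in candidates])    # shape: (n, d)
    scores = c @ p                                             # dot-product similarity
    return candidates[int(scores.argmax())]
```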
Read 8 tweets
Ever wondered what it takes to build an intelligent Q/A assistant? 🤔

@OpenAI #gpt3 and a few hours is all you need!

🤯 Yes, you heard it right!

🕹 Built a #javascript wizard using @streamlit to answer all your queries like a human expert!

A thread 🧵

#nlproc #AI #lowcode
@OpenAI @streamlit Gone are the days when you had to spend hours on @StackOverflow for resolving code-related queries!

🪄 Javascript wizard gives you precise answers to all your #JS related questions by leveraging #gpt3's latest code-davinci model that understands your queries just like humans!
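A minimal sketch of the kind of call such a wizard makes under the hood; the model name, prompt, and parameters are illustrative, and it uses the legacy openai Completion endpoint that #gpt3 apps of this era relied on.

```python
# Hedged sketch of a code-davinci Q/A call (illustrative prompt and settings).
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: supplied by the user

def ask_js_wizard(question: str) -> str:
    """Send a JavaScript question to a code-davinci model and return the answer text."""
    response = openai.Completion.create(
        engine="code-davinci-002",  # illustrative model choice
        prompt=f"Answer this JavaScript question like a human expert:\n{question}\nAnswer:",
        max_tokens=256,
        temperature=0.2,
    )
    return response.choices[0].text.strip()
```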
@OpenAI @streamlit @StackOverflow I have made the application code #OpenSource so you can just clone the repo and build #gpt3 powered #AI applications for your usecase!

👨‍💻 GitHub Repo - github.com/Shubhamsaboo/j…

#NLP #lowcode #nlproc
Read 8 tweets
How to make your weekend productive? 🤔

🕹 Build an end-to-end neural search engine in #Python using @JinaAI_

🤯 Yes, you heard it right!

@JinaAI_'s DocArray library and a few hours is all you need to build a complete search solution!

A thread 🧵

#nlproc #OpenSource #AI
@JinaAI_ Neural search lets you improve search relevance by understanding intent and going beyond conventional keyword-based search! (A bare-bones sketch follows after the list below.)

Some common applications are:

👉 Q/A chatbots
👉 Voice assistants like #alexa #siri
👉 Recommendation system by @netflix
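A bare-bones illustration of the idea behind neural search, i.e. ranking by embedding similarity instead of keyword overlap; embed() is an assumed sentence encoder, and this is not the DocArray API itself.

```python
# Generic embedding-based retrieval sketch (embed() is an assumed sentence encoder).
import numpy as np

def search(query: str, documents: list[str], embed, top_k: int = 3) -> list[str]:
    """Rank documents by cosine similarity between their embeddings and the query embedding."""
    doc_vecs = np.stack([embed(d) for d in documents])
    q = embed(query)
    scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(-scores)[:top_k]]
```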
Check out this blog to build a semantic search engine for textual data.

It will walk you through the steps to build your first neural search application in no time!

#python #artificialintelligence #nlproc #OpenSource #datascience #lowcode

shubhamsaboo111.medium.com/build-neural-t…
Read 6 tweets
Annotation guidelines for ML 101 (basic concepts) 🧵

✨ The results of supervised learning approaches are only as good as the annotations they are based on.

✨ Annotations are only as good as the guidelines that annotators rely upon to direct their efforts.

⬇️
What is data annotation? 🪄

✨It is the task of associating entries in your data with additional information

✨It is also known as "coding" or "labeling"

✨It is crucial for both qualitative and quantitative analyses

⬇️
Supervised learning basics 🤖

✨ In AI, data annotation is the basis for the supervised learning approach

✨ Supervised learning uses annotated datasets, i.e., collections of data points with associated labels

✨ The model learns to predict labels from these annotated datasets

⬇️
Read 8 tweets
Over the past several months, I’ve been doing a deep-dive into transformer language models #NLProc and their applications in #psychology for the last part of my #PhD thesis. 👩🏻‍🎓 Here are a few resources that I’ve found invaluable and really launched me forward on this project 🧵:
🏃🏻‍♀️💨 If you already have some data and you want a jump start to classify it all using #transformers, check out @maria_antoniak’s #BERT for Humanists/Computational Social Scientists Talk: Colab notebook and other info: bertforhumanists.org/tutorials/#cla…
🤓 I’m an absolute nerd for stats and experiment design in psych, but doing #ML experiments is very different. A clear, detailed (and reasonable length) course to get up to speed on train-test splits, hyperparameter tuning, etc. and do the very best work: coursera.org/learn/deep-neu…
Read 7 tweets
#NLProc folks: Beam search with controlled patience improves text generation.
Just change a 𝐬𝐢𝐧𝐠𝐥𝐞 𝐥𝐢𝐧𝐞 in your codebase!
Paper: arxiv.org/abs/2204.05424
Code: github.com/jungokasai/bea…
1/4
The widely-used beam search implementation uses a first come, first served (FCFS) heuristic: keep completed sentences in a set F and stop when |F| reaches the beam size. We introduce a patience factor that generalizes this stopping criterion and adds flexibility to the search depth. 2/4
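A hedged sketch of the generalized stopping rule; variable names are illustrative, and the paper and repo have the exact line.

```python
# FCFS stops when |F| >= beam_size; a patience factor p relaxes this to |F| >= p * beam_size,
# so p > 1 lets the search run deeper before terminating.
def should_stop(num_finished: int, beam_size: int, patience: float = 1.0) -> bool:
    return num_finished >= patience * beam_size
```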
The patience factor improves decoding performance of strong pretrained models on news text summarization and machine translation over diverse language pairs, with a negligible inference slowdown. The performance gain is consistent over varying beam sizes. 3/4
Read 4 tweets
I'm very excited and proud to announce that my team at Meta AI, with our collaborators, will have a strong presence at NAACL 2022 with 8 accepted papers on summarization, question answering and retrieval technologies. #nlproc #ai #NAACL2022 (see papers in follow-up tweets)
[Summarization and QA] Simple Local Attentions Remain Competitive for Long-Context Tasks: lnkd.in/gEPz2Ytz
[QA and Retrieval] CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training: lnkd.in/gbBV8_Rd
Read 9 tweets
About generalization of different networks

Main finding: Generalization in pretraining follows a single dimension
Different networks, architectures, seeds, sizes but:
Similar performance → similar linguistic capabilities

@aclmeeting accepted (#NLProc)

Summary & story 🧵
It all began in a discussion of
C. Zhang, S. Bengio, @mrtz, @beenwrekt, @OriolVinyalsML
Fascinating work.
arxiv.org/abs/1611.03530

More about their work
We wondered:
Why do networks learn but hardly overfit?
Why doesn't overfitting the training set hurt test performance?

It means VC dimension is a really bad way to think about learning
Read 27 tweets
During training, your loss goes up and down up and down up and down.

But how would it go if you magically went in a straight line
from init to learnt position?

Apparently smoothly down!

On the surprising Linear Interpolation:
#scientivism #deepRead #MachineLearning
It all started at ICLR 2015(!)
@goodfellow_ian @OriolVinyalsML @SaxeLab
They checked points between the converged model and the random initialization.
They found that the loss between them is monotonically decreasing.
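A minimal sketch of that check, assuming you have both checkpoints and a user-supplied evaluate_loss function:

```python
# Evaluate the loss at evenly spaced points on the straight line between two checkpoints.
import copy
import torch

def interpolation_losses(init_model, trained_model, evaluate_loss, steps=11):
    losses = []
    for t in torch.linspace(0.0, 1.0, steps):
        mid = copy.deepcopy(init_model)
        with torch.no_grad():
            for p_mid, p0, p1 in zip(mid.parameters(),
                                     init_model.parameters(),
                                     trained_model.parameters()):
                p_mid.copy_((1 - t) * p0 + t * p1)  # theta(t) = (1 - t) * theta_init + t * theta_trained
        losses.append(evaluate_loss(mid))
    return losses
```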
Why shouldn't it?
Well... The real question is why should it.

If the loss terrain is anything but a slope, we would expect bumps. Maybe there are different sinks (local minima), or you need to get a bad model before you reach the best model (topologically, you are in a ditch)
Read 22 tweets
I have just come across a new phenomenon:
Linear mode connectivity

What is the loss of the mid-model?
A model somewhere between converged models with different seeds?

#MachineLearning
Take two models, put them in the loss space
The points between them are the mode connectivity.

If the models converge to different solutions / loss pits (blue), then there is a barrier between them, called the "energy barrier" (yellow).
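As a small illustration, the height of that barrier along a chosen path can be read off from the losses evaluated at points on the path (an interpolation routine like the one in the previous thread would supply them):

```python
# Barrier height = max loss along the path minus the worse of the two endpoint losses.
# A value near zero means the two modes are (linearly) connected.
def energy_barrier(losses_on_path: list[float]) -> float:
    endpoints = max(losses_on_path[0], losses_on_path[-1])
    return max(losses_on_path) - endpoints
```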
Apparently, there is 𝘴𝘰𝘮𝘦 path (connectivity) where the loss stays almost the same. It is also relatively simple.
arxiv.org/pdf/1802.10026…
@tim_garipov @Pavel_Izmailov @FullStackML @andrewgwils
arxiv.org/pdf/1803.00885…
@FelixDrRelax @FredHamprecht
Read 15 tweets
📣UnifiedSKG: Lots of #NLProc researchers separately study tasks that link text to structured knowledge (Table/DB/KB..). We unify 21 such tasks into a Seq2Seq format with T5 to foster idea sharing & multitasking, performing very competitively!

Paper&Code: github.com/hkunlp/unified… 👇
Structured Knowledge Grounding (SKG) tasks were studied by different communities, leading to divergent architectures and implementations. Unification decreases barriers for newcomers and encourages methods that generalize across tasks. UnifiedSKG unifies 21 tasks into Seq2Seq.
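As a toy illustration of the Seq2Seq casting: structured context is linearized into the input text and the answer is generated as plain text. The serialization shown here is hypothetical, not the exact UnifiedSKG format.

```python
# Hypothetical linearization of a table-QA example for a text-to-text model (e.g., T5).
source = (
    "question: which country hosted the event in 2018? "
    "table: col: year | country row 1: 2014 | Brazil row 2: 2018 | Russia"
)
target = "Russia"
```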
We benchmark all tasks in UnifiedSKG using T5 with very little task-specific modification. To our surprise, it achieves SOTA on almost all tasks! Larger models are better, and we expect the trend to continue.
Read 8 tweets
