Copilot, from @github and @OpenAI, feels quite magical to use. It can auto-generate multiline functions, tests, and more, using all the context of your current code file. But part of me wonders: are the downsides too great in practice? 1/🧵…
OpenAI warns:
“As with other large language models trained on a next-token prediction objective, Codex will generate code that is as similar as possible to its training distribution. One consequence of this is that such models may do things that are unhelpful for the user”
The fact that Copilot (and Codex) writes reasonable-looking code is an amazing achievement. From a machine learning and language synthesis research point of view, it’s a big step forward...
...But we also need to be clear that reasonable-looking code that doesn't work, skips edge cases, uses obsolete methods, is verbose, and creates technical debt can be a big problem.
Most time coding is not taken up in writing code, but with designing, debugging, and maintaining code. When code is automatically generated, it’s easy to end up with a lot more of it.

Especially since Copilot code tends to be verbose.
Copilot managed to auto-generate nearly all of the 89 lines of code needed to finetune a @PyTorch model.

But the code it creates finetunes the model really badly. Is this actually useful to anyone?…
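To see what "finetuning badly" can mean in practice, here is a minimal, hypothetical sketch of the kind of care a finetuning loop needs; auto-generated code often skips steps like these. The tiny stand-in model and all hyperparameters are illustrative assumptions, not taken from the thread (a real run would use a pretrained torchvision model):

```python
# Hypothetical sketch: careful finetuning freezes the pretrained backbone
# and trains only a new head at first. Model and numbers are illustrative.
import torch
from torch import nn

# stand-in for a pretrained backbone (a real run would load torchvision weights)
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 16))
head = nn.Linear(16, 2)  # new task-specific head
model = nn.Sequential(backbone, head)

# freeze the pretrained layers so their weights aren't destroyed early on
for p in backbone.parameters():
    p.requires_grad = False

# optimize only the head, with a modest learning rate
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

# one illustrative training step on random data
x, y = torch.randn(4, 8), torch.tensor([0, 1, 0, 1])
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # only the head's parameters remain trainable
```

Generated code that instead trains every layer from the start, at a single high learning rate, will often look just as plausible while giving much worse results.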
When we're typing into vscode, Copilot jumps in and suggests code completions entirely automatically, without any interaction on our part. That often means it suggests code before we've really had a chance to think about where we're heading.

That can be a problem, due to anchoring bias.
Generally if a programmer doesn’t know how to do something, and isn’t using Copilot, they’ll Google it. A couple of minutes of Googling can result in learning far more about the problem and the possible space of solutions.
In addition to Copilot, @Microsoft, the owners of GitHub, have created a different but related product called "API Usage Examples". It shows examples of API use with links to their source.

Sometimes, this might be what you really need.…
I still don't know the answer to the question: "Is GitHub Copilot a blessing, or a curse?" It could be a blessing to some, and a curse to others. Those for whom it's a curse may not find that out for years.

To learn why, read the article:…
