Martin Görner

@martin_gorner

Sep 13, 2018 • 7 tweets • 3 min read

Google Cloud Platform now has preconfigured deep learning images with TensorFlow, PyTorch, Jupyter, CUDA and cuDNN already installed. It took me some time to figure out how to start Jupyter on such an instance. Turns out it's a one-liner.

Detailed instructions:
1) Go to cloud.google.com/console and create an instance (pick the TensorFlow deep learning image and a powerful GPU).

2) SSH into your instance using the "gcloud compute ssh" command given below. On the first connection there will be additional install prompts to accept, followed by a reboot; relaunch the command afterwards to reconnect. Replace PROJECT_NAME and INSTANCE_NAME with your own values.

3) You are now SSH'ed into your instance. Type "jupyter notebook". Jupyter starts and gives you a URL. Copy-paste it into your browser. That's it. The -L parameter in the SSH command sets up SSH tunnelling from localhost:8888 on your laptop to localhost:8888 on your instance.

Once again in copy-paste friendly text:
gcloud compute --project "PROJECT_NAME" ssh "INSTANCE_NAME" -- -L 8888:localhost:8888

Oh, and JupyterLab is already running on port 8080 whenever a deep learning instance boots, so you don't even need to start it. If you are into JupyterLab, start the instance and SSH right in:
gcloud compute ssh "INSTANCE_NAME" -- -L 8080:localhost:8080

It also works with multiple ports. You want Jupyter notebooks (8888) and TensorBoard (6006)? No problem:
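The multi-port command was shown as an image that did not survive in this copy. The sketch below assumes standard OpenSSH behaviour, where `-L` is repeated once per forwarded port and everything after `--` is passed to ssh. The helper `tunnel_command` is hypothetical and is not part of gcloud. It only builds the argument list, and the printed line is the shell command you would type: gcloud compute ssh INSTANCE_NAME -- -L 8888:localhost:8888 -L 6006:localhost:6006

```python
import shlex


def tunnel_command(instance, ports, project=None):
    """Build a `gcloud compute ssh` call that forwards each local port to the
    same port on the instance. Hypothetical helper, for illustration only."""
    cmd = ["gcloud", "compute", "ssh", instance]
    if project is not None:
        cmd += ["--project", project]
    cmd.append("--")  # everything after "--" is handed to ssh itself
    for port in ports:
        # One -L per port: local_port:host_as_seen_from_instance:remote_port
        cmd += ["-L", f"{port}:localhost:{port}"]
    return cmd


if __name__ == "__main__":
    cmd = tunnel_command("INSTANCE_NAME", [8888, 6006])
    print(shlex.join(cmd))
    # To open the tunnel for real (needs gcloud installed and authenticated):
    # import subprocess; subprocess.run(cmd, check=True)
```

TensorBoard still has to be started on the instance itself, for example with `tensorboard --logdir <your log dir>`. After that, localhost:6006 on your laptop reaches it through the tunnel.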


More from @martin_gorner

Martin Görner

@martin_gorner

Jan 16, 2024

The "Self-Extend" paper promises magic for your LLMs: extending the context window beyond what they were trained on. You can take an LLM trained on 2,000-token sequences, feed it 5,000 tokens and expect it to work. Thread 🧵
(SWA below = sliding window attention) arxiv.org/abs/2401.01325

To be fair, some LLMs can already do that if they are trained with a specific positional encoding like ALiBi (arxiv.org/abs/2108.12409). And before LLMs, Recurrent Neural Networks (RNNs) could do this trick as well, but the ability was lost in Transformers.

So how does it work? It turns out that if you understand the self-attention mechanism, a bit of high-school math goes a long way. Here is how self-attention is computed in Transformers:
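The thread's own figure for this step is not captured here. As a stand-in, this is a minimal NumPy sketch of standard single-head scaled dot-product self-attention, softmax(QKᵀ/√d_k)·V, with a causal mask as used in decoder LLMs. Positional encoding is deliberately left out. In RoPE-based models, positions enter by rotating Q and K before the dot product, and that is the part Self-Extend manipulates. All names (`self_attention`, `Wq`, `Wk`, `Wv`) are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtracting the row max is for numerical stability; the result is unchanged.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, Wq, Wk, Wv, causal=True):
    """Single-head scaled dot-product self-attention.

    X: (n, d_model) token embeddings (no positional information here).
    Wq, Wk, Wv: (d_model, d_k) projection matrices.
    Returns the (n, d_k) outputs and the (n, n) attention weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (n, n) query-key similarities
    if causal:
        # Token i may only attend to tokens 0..i (no peeking at the future).
        n = X.shape[0]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
print(out.shape, w.shape)
```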

Read 19 tweets

Martin Görner

@martin_gorner

Dec 29, 2022

Large Language Models are getting good at formal logic:
arxiv.org/abs/2212.13894 LAMBADA: Backward Chaining for Automated Reasoning.

This paper is, in part, a traditional algorithm, a "depth-first search algorithm over the facts and the rules", starting from the desired conclusion and trying to logically reach the premises (facts and rules).

And in part a sophisticated LLM prompting technique, since the "Fact Check", "Decompose" and "Rule Selection" modules are specific prompts against a Large Language Model (PaLM 540B in this case).
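The traditional-algorithm half can be sketched as a depth-first backward-chaining search in Python. This is a sketch under strong simplifying assumptions, not LAMBADA itself. Each LLM module ("Fact Check", "Rule Selection", "Decompose") is replaced here by exact string matching over propositional atoms. Negation and the paper's handling of "unknown" answers are omitted, so an unprovable goal simply returns False. The names `Rule` and `prove` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    premises: tuple  # all of these must hold...
    conclusion: str  # ...for this to hold


def prove(goal, facts, rules, depth=0, max_depth=8):
    """Depth-first backward chaining from `goal` toward the known facts."""
    # "Fact Check": is the goal directly stated as a fact?
    if goal in facts:
        return True
    if depth >= max_depth:  # guards against cyclic rule sets
        return False
    # "Rule Selection": only rules whose conclusion matches the goal.
    for rule in (r for r in rules if r.conclusion == goal):
        # "Decompose": the goal reduces to proving every premise of the rule.
        if all(prove(p, facts, rules, depth + 1, max_depth) for p in rule.premises):
            return True
    return False


facts = {"nice(fiona)", "young(fiona)"}
rules = [
    Rule(("nice(fiona)",), "kind(fiona)"),
    Rule(("kind(fiona)", "young(fiona)"), "happy(fiona)"),
    Rule(("big(fiona)",), "rough(fiona)"),
]
print(prove("happy(fiona)", facts, rules))  # True
print(prove("rough(fiona)", facts, rules))  # False: big(fiona) is never established
```

Starting from the conclusion means the search only ever touches rules relevant to the goal. That is the main argument for backward over forward chaining when the rule base is large.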

Read 11 tweets

Martin Görner

@martin_gorner

Dec 8, 2022

How can you probe what a language model knows? If you ask it directly, it might lie (for example, because you prefixed your question with untruths, among many other reasons).
Contrast-Consistent Search (CCS) gives a way:
openreview.net/pdf?id=ETKGuby…

It takes advantage of a nice property of True/False statements: they cannot be True and False at the same time.
Take any statement, e.g. "The Eiffel tower is a crab", append True or False to the end, and you have two mutually exclusive statements.

You can then feed these into a language model and train a small neural network to classify them as True/False from just some of the internal activations of your model. The language model stays frozen.
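This NumPy sketch shows the CCS objective as I understand it from the paper. A linear probe with a sigmoid maps a hidden state to a probability. A consistency term pushes p(x+) toward 1 − p(x−), and a confidence term, min(p(x+), p(x−))², rules out the degenerate answer p = 0.5 for both. Each set of activations is normalized separately so the probe cannot just detect the appended "True"/"False" token. The random arrays stand in for real model activations, which this sketch does not extract. Training the probe's `w` and `b` with an optimizer is not shown.

```python
import numpy as np


def normalize(h):
    # Center and scale each set of activations separately, so the probe
    # cannot simply pick up on the appended "True"/"False" token.
    return (h - h.mean(axis=0)) / (h.std(axis=0) + 1e-8)


def probe(h, w, b):
    # Linear probe followed by a sigmoid: probability the statement is true.
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))


def ccs_loss(p_pos, p_neg):
    # Consistency: p(x+) should equal 1 - p(x-).
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: penalizes the trivial p(x+) = p(x-) = 0.5 solution.
    confidence = np.minimum(p_pos, p_neg) ** 2
    return float(np.mean(consistency + confidence))


def predict(p_pos, p_neg):
    # Average the two views of the same statement into one truth estimate.
    return 0.5 * (p_pos + (1.0 - p_neg))


rng = np.random.default_rng(0)
h_pos = normalize(rng.normal(size=(32, 64)))  # stand-in for activations on "... True"
h_neg = normalize(rng.normal(size=(32, 64)))  # stand-in for activations on "... False"
w, b = rng.normal(size=64) * 0.01, 0.0
print(ccs_loss(probe(h_pos, w, b), probe(h_neg, w, b)))
```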

Read 14 tweets

Martin Görner

@martin_gorner

Dec 7, 2022

Here is @luke_wood_ml explaining Stable Diffusion at #Devoxx

Stable Diffusion is the first end-to-end model in the new KerasCV library.

The code to generate an image:
Here is the presentation as well:
lukewood.github.io/devoxx/
and a Colab notebook to try it out: colab.research.google.com/github/lukewoo…
#KerasCV

A generated pic just for fun: "Photorealistic representation of a muscular barbarian wearing bunny ears wielding the sword Excalibur"

Read 4 tweets

Martin Görner

@martin_gorner

Dec 5, 2022

Thought-provoking new paper from @geoffreyhinton: what if we could replace backpropagation with something better?

It seems very unlikely that the human brain uses backpropagation to learn. There is little evidence of backprop mechanics in biological brains (no error derivatives propagating backwards, no storage of neuron activities for use in a backprop pass, ...).

Also, the brain can learn from a continuous stream of incoming data and does not need to stop to run a backprop pass. Yes, sleep is somehow beneficial for learning, but we can learn while awake too.

Read 14 tweets

Martin Görner

@martin_gorner

Nov 9, 2022

https://twitter.com/yixuan_su/status/1590034008758312960

Contrastive Search is the new kid on the block for text generation from language models: better than greedy or beam search, top-k, nucleus sampling, etc.

It can continue text from a prefix with quality indistinguishable from human writing, as judged by human raters.
paper: arxiv.org/abs/2210.14140

In the experiment results above, the model continues a given text and human raters evaluate the result.
The raters preferred text generated by contrastive search 60-70% of the time (green box). When comparing to human output, they were undecided (red ellipses).

Intuitively, contrastive search encourages the model to generate the most likely sequence of words without repeating itself. At each step, decoding maximizes likelihood while minimizing the cosine similarity with the already generated tokens (the "degeneration penalty").
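One selection step can be sketched in NumPy as follows. Among the top-k candidates V^(k), pick the token v that maximizes (1 − α)·p(v | x<t) − α·max_j cos(h_v, h_xj), where h are the model's hidden states. The sketch assumes the candidates' hidden states were already computed, which in practice takes a forward pass per candidate (typically batched). The names `top_k` and `contrastive_pick` are hypothetical, and α = 0.6 is only an illustrative default.

```python
import numpy as np


def top_k(probs, k):
    # Candidate set V^(k): the k most probable next tokens and their probabilities.
    ids = np.argsort(probs)[::-1][:k]
    return ids, probs[ids]


def contrastive_pick(cand_probs, cand_hidden, prev_hidden, alpha=0.6):
    """Choose among top-k candidates using the contrastive search score.

    cand_probs:  (k,) model probabilities of the k candidate tokens
    cand_hidden: (k, d) hidden state each candidate would produce
    prev_hidden: (t, d) hidden states of the tokens generated so far
    Returns the index (0..k-1) of the selected candidate.
    """
    def unit(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    # Degeneration penalty: max cosine similarity to any previous token.
    sims = unit(cand_hidden) @ unit(prev_hidden).T  # (k, t)
    penalty = sims.max(axis=1)
    # alpha = 0 reduces to greedy decoding over the candidates.
    scores = (1.0 - alpha) * cand_probs - alpha * penalty
    return int(np.argmax(scores))


# The most likely candidate repeats a previous representation, so it loses.
cand_probs = np.array([0.5, 0.3])
cand_hidden = np.array([[1.0, 0.0], [0.0, 1.0]])
prev_hidden = np.array([[1.0, 0.0]])
print(contrastive_pick(cand_probs, cand_hidden, prev_hidden))  # 1
```

If you use Hugging Face transformers, contrastive search is available through `model.generate(..., penalty_alpha=0.6, top_k=4)`, where `penalty_alpha` plays the role of α.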

Read 6 tweets
