AK

@ak92501

11 Oct, 4 tweets, 2 min read

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B
blog: microsoft.com/en-us/research…

We selected the subset of datasets (the top 11 rows in Figure 2 of the blog post) from The Pile that we found to be of the highest relative quality. Then, following an approach similar to the one used to generate Pile-CC, we downloaded and filtered two recent Common Crawl (CC) snapshots.

We based the evaluation setting on the open-source project lm-evaluation-harness and made task-specific changes as appropriate to align the settings more closely with prior work. We evaluated MT-NLG in zero-, one-, and few-shot settings without searching for the optimal number of shots.
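To make the zero-, one-, and few-shot distinction concrete, here is a minimal sketch of how a k-shot prompt might be assembled. It is not the lm-evaluation-harness API; the function name `build_prompt` and the `Q:`/`A:` layout are assumptions made purely for illustration.

```python
from typing import List, Tuple


def build_prompt(examples: List[Tuple[str, str]], query: str, k: int) -> str:
    """Prepend the first k solved examples to the query.

    k=0 gives a zero-shot prompt, k=1 one-shot, k>1 few-shot.
    The "Q:/A:" layout is a hypothetical format, not the one used for MT-NLG.
    """
    if k < 0 or k > len(examples):
        raise ValueError("k must be between 0 and len(examples)")
    # Each shot is a fully solved question/answer pair.
    shots = [f"Q: {q}\nA: {a}" for q, a in examples[:k]]
    # The final block leaves the answer open for the model to complete.
    return "\n\n".join(shots + [f"Q: {query}\nA:"])


if __name__ == "__main__":
    demo = [("What is the capital of France?", "Paris"),
            ("What is the capital of Japan?", "Tokyo")]
    for k in (0, 1, 2):
        print(f"--- {k}-shot ---")
        print(build_prompt(demo, "What is the capital of Italy?", k))
```

Fixing k in advance, rather than tuning it per task, corresponds to evaluating without a search for the optimal number of shots.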

We observed that the model can infer basic mathematical operations from context (sample 1), even when the symbols are badly obfuscated (sample 2). While far from claiming numeracy, the model seems to go beyond only memorization for arithmetic.
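As a rough illustration of what such a probe could look like, the sketch below builds an in-context arithmetic prompt in which the operator symbol can be swapped for an arbitrary token. The actual samples are in the blog post and are not reproduced here; `arithmetic_prompt`, its parameters, and the prompt layout are hypothetical.

```python
import operator
from typing import Callable, List, Tuple


def arithmetic_prompt(
    operands: List[Tuple[int, int]],
    query: Tuple[int, int],
    op: Callable[[int, int], int] = operator.add,
    symbol: str = "+",
) -> Tuple[str, int]:
    """Return (prompt, expected_answer) for an in-context arithmetic probe.

    `symbol` is what the model sees; `op` is the operation it stands for.
    Passing an unusual symbol (e.g. "@") obfuscates the operation, so the
    model has to infer it from the solved examples alone.
    """
    # Solved examples that reveal the operation only through context.
    lines = [f"{a} {symbol} {b} = {op(a, b)}" for a, b in operands]
    qa, qb = query
    # The last line is left incomplete for the model to fill in.
    lines.append(f"{qa} {symbol} {qb} =")
    return "\n".join(lines), op(qa, qb)


if __name__ == "__main__":
    prompt, answer = arithmetic_prompt([(1, 2), (3, 4)], (5, 6), symbol="@")
    print(prompt)
    print("expected completion:", answer)
```

Comparing the model's completion with the returned expected answer on operand pairs unlikely to appear verbatim in training data is one way to tell inference from memorization.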

More from @ak92501

AK

@ak92501

11 Oct

stylegan3 is out
github: github.com/NVlabs/stylega…

AK

@ak92501

3 Jul

VQGAN + CLIP "matte painting of a city built on top of a giant turtle walking slowly towards the viewer with clear blue skies and a lush green landscape | trending on artstation" + 3D photo inpainting

AK

@ak92501

24 Jun

Fine-Tuning StyleGAN2 For Cartoon Face Generation
pdf: arxiv.org/pdf/2106.12445…
abs: arxiv.org/abs/2106.12445
github: github.com/happy-jihye/Ca…

AK

@ak92501

23 Jun

Alias-Free GAN
pdf: nvlabs-fi-cdn.nvidia.com/alias-free-gan…
project page: nvlabs.github.io/alias-free-gan/

The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales.

AK

@ak92501

8 Jun

Hierarchical Video Generation for Complex Data
pdf: arxiv.org/pdf/2106.02719…
abs: arxiv.org/abs/2106.02719

The model first generates a low-resolution video, establishing the global scene structure, which is then refined by subsequent levels in the hierarchy.