Graph neural networks are driving lots of progress in machine learning by extending deep learning approaches to complex graph data and applications.
Let’s take a look at a few methods ↓
1) A Graph Convolutional Network, or GCN, is an approach for semi-supervised learning on graph-structured data. It’s based on an efficient variant of CNNs which operates directly on graphs and is useful for semi-supervised node classification.
2) Diffusion-convolutional neural networks (DCNN) introduce a diffusion-convolution operation to extend CNNs to graph data. This enables learning of diffusion-based representations. It's used as an effective basis for node classification.
3) Graph Attention Network (GAT) is a graph neural network that leverages masked self-attentional layers. Hidden representations of nodes are computed by attending to neighbours using self-attention. It achieved SOTA results on node classification benchmarks at the time of publication.
4) GraphSAGE is a general inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate useful node embeddings for previously unseen data.
5) The Graph Transformer is a generalization of transformer neural networks for arbitrary graphs. The architecture introduces new properties that leverage the graph connectivity inductive bias to perform well on problems where graph topology is important.
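For intuition on how a GCN (method 1 above) operates directly on a graph, here is a minimal NumPy sketch of a single propagation step, following the rule from Kipf & Welling: add self-loops, symmetrically normalize the adjacency matrix, then mix neighbour features through a weight matrix. The toy graph, the random weights, and the function names are illustrative assumptions, not code from any of the papers.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the renormalized adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)              # node degrees, self-loop included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


# Hypothetical toy graph: a path 0 - 1 - 2 with one-hot node features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.eye(3)

# Random weights stand in for parameters that would normally be learned.
rng = np.random.default_rng(0)
weight = rng.normal(size=(3, 2))

adj_norm = normalize_adjacency(adj)
hidden = gcn_layer(adj_norm, features, weight)
print(hidden.shape)
```

Each row of `hidden` is a new 2-dimensional representation of a node, computed from the node itself and its direct neighbours. Stacking more layers widens the neighbourhood each node can see.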
Last but not least... here is an extended list of graph neural networks and their associated papers, benchmark datasets, trends, and open-source code.
We release our initial paper below. We train on a large scientific corpus of papers, reference material, knowledge bases, and many other sources. The corpus includes scientific text as well as scientific modalities such as proteins, compounds, and more.
Here is a thread to catch up on the top 10 trending papers of August on @paperswithcode.
1) An Image is Worth One Word - a new approach that allows for more creative freedom with image generation; proposes "textual inversions" to find pseudo-words that compose new sentences that guide personalized creations.
2) Cold Diffusion - proposes diffusion models built around arbitrary image transformations without Gaussian noise; discusses the potential for generalized diffusion models that invert arbitrary processes.
Check out these trending papers to catch up on the latest developments in language models. ↓
1) N-Grammer (Roy et al.) - takes inspiration from statistical language modeling and augments Transformers with latent n-grams; it matches strong baseline models like Transformer and Primer while being faster at inference.
2) Language Models (Mostly) Know What They Know (Kadavath et al.) - investigates whether an LM can be trained to perform well at predicting which questions it will be able to answer correctly; this enables self-evaluation on open-ended sampling tasks.
Here is a thread to catch up on the top 10 trending papers of June on @paperswithcode. ↓
1️⃣ Mask DINO (Li et al.) - extends DINO (DETR with Improved Denoising Anchor Boxes) with a mask prediction branch to support image segmentation tasks (instance, panoptic, and semantic).
2️⃣ Hopular (Schäfl et al.) - proposes a deep learning architecture based on continuous Hopfield networks for competitive results on small-sized tabular datasets.
Here is a thread to catch up on the top 10 trending papers of May on @paperswithcode. 1/11
1⃣ OPT (Zhang et al.) - releases open pre-trained transformer language models ranging from 125M to 175B parameters. The release includes a logbook detailing infrastructure challenges and code to experiment with the released models. 2/11
2⃣ CoCa (Yu et al.) - a new foundation model that achieves a new state-of-the-art on ImageNet (90.6%); proposes a minimal strategy to jointly pre-train an image-text encoder-decoder with contrastive loss and captioning loss. 3/11
In this thread, we summarize ten recent trends and insights in language models. ↓
1) Scaling Laws
Kaplan et al. report that language model (LM) performance improves smoothly with increases in model size, dataset size, and compute. Recent works provide empirical evidence that LMs are underexplored and can be improved in other ways.
Hoffmann et al. find that large LMs are undertrained and that, for compute-optimal training, model size and number of training tokens should be scaled equally. Their compute-optimal model, Chinchilla (70B), outperforms Gopher (280B) on several tasks.
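To make "scaled equally" concrete, here is a rough sketch of the arithmetic. It assumes the common rule of thumb that training compute is about 6 × parameters × tokens (an approximation not stated in this thread), so multiplying compute by k means multiplying both model size and token count by √k. The 1.4T-token starting point is the reported Chinchilla training set size and is also not from this thread. The function names are made up for illustration.

```python
import math


def approx_train_flops(params, tokens):
    """Rule-of-thumb training cost: roughly 6 FLOPs per parameter per token (assumption)."""
    return 6.0 * params * tokens


def scale_compute_optimal(params, tokens, compute_multiplier):
    """Split extra compute evenly: params and tokens each grow by sqrt(multiplier)."""
    factor = math.sqrt(compute_multiplier)
    return params * factor, tokens * factor


# Starting point: a 70B-parameter model trained on 1.4T tokens (Chinchilla-sized).
base_params, base_tokens = 70e9, 1.4e12

# With 4x the compute, both quantities double under equal scaling.
new_params, new_tokens = scale_compute_optimal(base_params, base_tokens, 4.0)

ratio = approx_train_flops(new_params, new_tokens) / approx_train_flops(base_params, base_tokens)
print(f"params: {new_params:.3g}, tokens: {new_tokens:.3g}, compute ratio: {ratio:.1f}")
```

Under these assumptions, putting all of the extra compute into model size alone would leave the tokens-per-parameter ratio shrinking, which is the "undertrained" regime the paper describes.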