A BART model built from scratch in @TensorFlow Keras (@fchollet), in under 100 lines.
BART uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a left-to-right decoder (like GPT).
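The encoder/decoder split above can be sketched in Keras. This is a minimal single-layer illustration, not the full implementation: the hyperparameters, layer counts, and variable names here are illustrative, and positional embeddings and dropout are omitted for brevity.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sizes, not the paper's.
VOCAB, D_MODEL, HEADS, FF = 1000, 64, 4, 128

# Encoder: self-attention with no causal mask (bidirectional, like BERT).
enc_in = keras.Input(shape=(None,), dtype="int32")
x = layers.Embedding(VOCAB, D_MODEL)(enc_in)
attn = layers.MultiHeadAttention(HEADS, D_MODEL // HEADS)(x, x)
x = layers.LayerNormalization()(x + attn)
ff = layers.Dense(D_MODEL)(layers.Dense(FF, activation="gelu")(x))
enc_out = layers.LayerNormalization()(x + ff)

# Decoder: causal self-attention (left-to-right, like GPT),
# then cross-attention over the encoder output.
dec_in = keras.Input(shape=(None,), dtype="int32")
y = layers.Embedding(VOCAB, D_MODEL)(dec_in)
self_attn = layers.MultiHeadAttention(HEADS, D_MODEL // HEADS)(
    y, y, use_causal_mask=True)
y = layers.LayerNormalization()(y + self_attn)
cross = layers.MultiHeadAttention(HEADS, D_MODEL // HEADS)(y, enc_out)
y = layers.LayerNormalization()(y + cross)
logits = layers.Dense(VOCAB)(y)  # per-position vocabulary logits

bart = keras.Model([enc_in, dec_in], logits)
```

Stacking several such encoder and decoder blocks gives the full model; the only structural difference from the single block is repetition.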
Pretraining corrupts the input text in two ways: randomly shuffling the order of the original sentences, and a novel text-infilling scheme in which spans of text are replaced with a single mask token.
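Those two corruptions can be sketched as a small noising function. This is a simplified, hypothetical helper: BART actually samples span lengths from a Poisson(3) distribution and may mask multiple spans, whereas here a single fixed-length span is masked for clarity.

```python
import random

MASK = "<mask>"

def noise(sentences, rng, span_len=3):
    """Sentence permutation + text infilling (simplified sketch).

    Shuffles the order of the sentences, then replaces one token
    span with the single MASK token. The real scheme samples span
    lengths from Poisson(3); here span_len is fixed.
    """
    sents = sentences[:]
    rng.shuffle(sents)                       # sentence permutation
    tokens = [t for s in sents for t in s.split()]
    start = rng.randrange(max(1, len(tokens) - span_len))
    # Text infilling: the whole span collapses to one mask token,
    # so the model must also predict how many tokens are missing.
    return tokens[:start] + [MASK] + tokens[start + span_len:]

rng = random.Random(0)
noised = noise(["the cat sat", "on the mat", "it purred"], rng)
```

The model is then trained to reconstruct the original uncorrupted text from this noised input, using the seq2seq architecture described above.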
NLP benchmark tasks: BART performs comparably to RoBERTa and XLNet on discriminative benchmarks such as GLUE and SQuAD.