4/ As opposed to directional models, which read text input sequentially (left-to-right or right-to-left), the Transformer encoder reads the entire sequence of words at once.
Therefore it is considered bidirectional, though it may be more accurate to say it is non-directional.
6/ In the case of Machine Translation, such as English-to-French translation, the Encoder processes the English sentence as input, applies the attention mechanism, and encodes it into an intermediate representation.
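For instance, a minimal sketch of encoder-decoder translation using the Hugging Face transformers library (the library and the t5-small checkpoint are my illustrative choices, not named in the thread):

```python
from transformers import pipeline

# t5-small is an encoder-decoder model: the encoder reads the full
# English sentence, then the decoder generates the French translation.
translator = pipeline("translation_en_to_fr", model="t5-small")

result = translator("The house is wonderful.")
print(result[0]["translation_text"])
```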
BERT leverages the Encoder architecture of the transformer. BERT takes a text sequence as input, and the BERT Encoder produces BERT embeddings, which can be used for downstream tasks like Text Classification or Named Entity Recognition.
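As a quick illustration, here is one way to extract BERT embeddings with Hugging Face transformers (a sketch; the library and the bert-base-uncased checkpoint are assumptions, not specified in the thread):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT encodes the whole sentence at once.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One embedding per token; the [CLS] vector is a common sentence-level feature.
token_embeddings = outputs.last_hidden_state  # shape: (1, seq_len, 768)
sentence_embedding = token_embeddings[:, 0]   # the [CLS] position
print(sentence_embedding.shape)               # torch.Size([1, 768])
```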
10/ 👉 Decoder Only
GPT models leverage the decoder architecture of the transformer. Given an input sequence as the prompt, GPT generates the response token by token. Therefore, GPT models are best suited for text or sequence generation.
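A minimal generation sketch, again with Hugging Face transformers (gpt2 is my illustrative checkpoint, not one the thread names):

```python
from transformers import pipeline

# A decoder-only model continues the prompt one token at a time.
generator = pipeline("text-generation", model="gpt2")

prompt = "Transformers are"
print(generator(prompt, max_new_tokens=20)[0]["generated_text"])
```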
3/ If the p-value from the test is less than some significance level (e.g., α = 0.05), then we can reject the null hypothesis and conclude that the time series is stationary.
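Assuming the test in question is the Augmented Dickey-Fuller test (whose null hypothesis is that the series contains a unit root, i.e., is non-stationary), a sketch with statsmodels:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
series = rng.normal(size=500).cumsum()  # a random walk, so non-stationary

adf_stat, p_value, *_ = adfuller(series)
if p_value < 0.05:
    print(f"p = {p_value:.3f}: reject H0, the series looks stationary")
else:
    print(f"p = {p_value:.3f}: fail to reject H0, the series may be non-stationary")
```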
2/ It is important to standardize variables before running Cluster Analysis, because clustering techniques rely on measuring the distance between the observations we're trying to cluster, and variables on larger scales would otherwise dominate those distances.
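A sketch of the idea with scikit-learn (StandardScaler and KMeans are my illustrative choices; the thread doesn't prescribe a specific scaler or algorithm):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Toy data: income (tens of thousands) vs. age (tens).
# Without scaling, income alone would drive the Euclidean distances.
X = np.array([[30_000, 25], [90_000, 55], [45_000, 30], [80_000, 50]], dtype=float)

X_scaled = StandardScaler().fit_transform(X)  # each column: mean 0, std 1
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print(labels)
```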
"roc_auc_score" is defined as the area under the ROC curve, which is the curve having False Positive Rate on the x-axis and True Positive Rate on the y-axis at all classification thresholds.