Remember these three... that's about all I could take away from all these WORDS
It was a bit more than I was ready to try and digest.
🧵
@AtmanAcademy "Explain this like I'm 15"... not like I'm 5, but not like I'm a college graduate either.
THAT's more like it - I can actually follow these!
(Can you?)
Again - say it with me:
𝑳𝑨𝒀𝑬𝑹𝑺
𝑯𝑬𝑨𝑫𝑺
𝑻𝑶𝑲𝑬𝑵𝑺
🧵
@AtmanAcademy So it's all starting to come together (or come apart?):
More (and wider) Layers,
More Attention Heads, and a
Bigger Context Window (more input tokens)
give the newer models their superior performance (rough numbers in the sketch below).
🧵
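To put rough numbers on those three dials, here's a little comparison sketch of my own - the figures come from the published GPT-2 (small) and GPT-3 (175B) papers, so treat them as illustrative examples, not specs for any current product:

```python
# The three "dials" for two well-known models.
# Figures are from the published GPT-2 (small) and GPT-3 (175B) papers;
# purely illustrative, not a spec for any particular product.

configs = {
    "GPT-2 small": {"layers": 12, "attention_heads": 12, "context_window_tokens": 1024},
    "GPT-3 175B":  {"layers": 96, "attention_heads": 96, "context_window_tokens": 2048},
}

for name, cfg in configs.items():
    print(f"{name}: {cfg['layers']} layers, "
          f"{cfg['attention_heads']} heads, "
          f"{cfg['context_window_tokens']}-token context window")
```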
@AtmanAcademy That's probably enough to chew on for tonight.
To (try and) summarise:
Layers - stacked processing steps that build up more and more abstract views of the context
Heads - give each word/token its "weighting" (attention) relative to the others
Tokens - words broken into sub-word pieces for processing
(toy sketch of the last two below)
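If it helps, here's a tiny toy sketch of those last two bullets - splitting a word into sub-word tokens and turning relevance scores into attention "weightings" with a softmax. The token split and the numbers are made up for illustration; real models learn their vocabulary and weights from data.

```python
import math

# Toy "tokenization": a made-up sub-word split (real tokenizers such as BPE
# learn their vocabulary from data; this split is just for illustration).
tokens = ["un", "believ", "able", "!"]

# Toy attention for one head, looking from the last token back at every token:
# invented similarity scores -> softmax turns them into weights that sum to 1.
scores = [0.5, 2.0, 1.5, 0.1]            # higher score = "pay more attention here"
exps = [math.exp(s) for s in scores]
weights = [e / sum(exps) for e in exps]

for tok, w in zip(tokens, weights):
    print(f"{tok:>7s}: attention weight {w:.2f}")
```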