How do language models (like BERT or GPT) "see" words?
TLDR: whereas we see "welcome to the 🤗 tokenizers", language models see [101, 6160, 2000, 1996, 100, 19204, 17629, 2015, 102] (reproduction sketch below)
𧡠on Tokenization by examples
1/
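A minimal sketch of how to reproduce those IDs, assuming they come from the bert-base-uncased WordPiece tokenizer (the exact numbers depend on the vocabulary used):

```python
# Minimal sketch: reproducing the TLDR example with the Hugging Face Transformers library.
# Assumes the IDs above come from the bert-base-uncased WordPiece vocabulary.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "welcome to the 🤗 tokenizers"

# Subword pieces: the emoji is out-of-vocabulary, so it maps to [UNK],
# and "tokenizers" is split into "token", "##izer", "##s".
print(tokenizer.tokenize(text))
# ['welcome', 'to', 'the', '[UNK]', 'token', '##izer', '##s']

# Token IDs, with [CLS] (101) and [SEP] (102) added around the sentence
# and [UNK] showing up as 100.
print(tokenizer(text)["input_ids"])
# expected: [101, 6160, 2000, 1996, 100, 19204, 17629, 2015, 102]
```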