yes, the neural LMs learn a model of the world as projected through documents (and soon also images) found on the internet. the big question is to what extent this projection---even in the best case---provides an accurate or complete representation of the real world.
here the @emilymbender camp says "nothing whatsoever", and i and others say "to some extent, and certainly enough to be useful", but i don't think there is any doubt that it's very far from being complete or accurate.
@OmerAlali7 (and actually, i think a vector representation weakens the argument rather than strengthening it, because a mere vector space and its geometry is a very weak representation of knowledge. i think the full models have much more than that.)
for me there are two, both from my CS undergrad, both were the first assignment in a course, both involve programming, and both beautifully captured the essence of the course in a single, simple to explain and somewhat open-ended assignment, that you had to figure out on your own
the first is in the Compilation course by Prof. Mayer Goldberg (no family relation): we had to write an interpreter for a simple-but-turing-complete language (easy), and then we had to write a compiler from high level code to this language.
the second is in an NLP/NLG course by Prof @melhadad (who then became my msc and phd adviser), where the assignment was basically "Write a program where the input is a sequence of numbers, such as 1,2,3,7,9,10 and the output is a succinct description of the sequence in English".
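a minimal sketch of one naive take on that assignment, assuming the input is a sorted list of distinct integers and that "succinct" just means collapsing consecutive runs into ranges. the function names (`group_runs`, `describe`) and the output wording are made up for illustration; the actual assignment was open-ended and this is surely not the intended solution.

```python
def group_runs(numbers):
    """Group a sorted list of distinct integers into (start, end) runs of consecutive values."""
    runs = []
    for n in numbers:
        if runs and n == runs[-1][1] + 1:
            # n extends the current run by one
            runs[-1] = (runs[-1][0], n)
        else:
            # n starts a new run
            runs.append((n, n))
    return runs


def describe(numbers):
    """Return a short English description of the sequence (hypothetical wording)."""
    parts = []
    for start, end in group_runs(numbers):
        parts.append(str(start) if start == end else f"{start} to {end}")
    if not parts:
        return "nothing"
    if len(parts) == 1:
        return parts[0]
    if len(parts) == 2:
        return f"{parts[0]} and {parts[1]}"
    return ", ".join(parts[:-1]) + ", and " + parts[-1]


print(describe([1, 2, 3, 7, 9, 10]))  # 1 to 3, 7, and 9 to 10
```

the interesting part of the assignment starts where this sketch stops: "the even numbers up to 20" or "1 to 10, except 4" are shorter descriptions that run-grouping alone will never find.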
some of you may have noticed about a month ago that ynet added an option to have articles read aloud in Hebrew, and it even worked pretty well.
the read-aloud relied heavily on the Nakdan (diacritization) technology developed at Dicta, in a very impressive project led by Avi Shmidman, with the main development done by Avi and Shaltiel Shmidman (and some involvement from me, and involvement from Moshe Koppel)
but why am i telling you about this? is it to tell you about the challenges of undiacritized text, and to brag about our nice achievement? that too, but it's not the main point.
the main point is this lovely story:
the company that provided ynet's read-aloud solution simply used the Nakdan api without asking and without talking to anyone, integrated it into a central product of theirs, and didn't even send an email to try to reach a more orderly agreement. amazing.
The task definition is very simple: for every pair of base-NPs in the text (in our dataset a text is ~3 paragraphs long), decide if they can be related by a preposition, and if so, which.
Why is this task interesting? We argue that it's a core component of reading comprehension.
When reading text, we identify noun-phrases (NPs), and integrate each new one in a network of NPs, which we maintain.
This network is essential for "understanding".
One famous edge type is "coreference": indicating two NPs refer to the same entity.
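A minimal sketch of the shape of the task as defined above, not the actual dataset format or any real model. It assumes NP pairs are ordered (anchor, complement), which the definition above does not state, and that some predictor returns a preposition or `None`. The names `label_pairs` and `predict`, and the toy lookup table, are hypothetical.

```python
from itertools import permutations


def label_pairs(noun_phrases, predict):
    """Return {(anchor, complement): preposition} for every ordered NP pair
    that the hypothetical `predict` function decides can be related."""
    links = {}
    for anchor, complement in permutations(noun_phrases, 2):
        preposition = predict(anchor, complement)
        if preposition is not None:
            links[(anchor, complement)] = preposition
    return links


# Toy stand-in for a model: a hand-written lookup table (an assumption for illustration).
toy_table = {("the door", "the house"): "of"}


def toy_predict(anchor, complement):
    return toy_table.get((anchor, complement))


nps = ["the door", "the house", "the garden"]
links = label_pairs(nps, toy_predict)
print(links)  # {('the door', 'the house'): 'of'}
```

Even in this toy form the number of decisions grows quadratically: 3 NPs give 6 ordered pairs, and a text of ~3 paragraphs has far more NPs than that.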
but, it is also the really bare minimum of an eval. and it is far from being a good one (for starters, hardly any details are given re eval guidelines, what the evaluators were instructed to eval). this is sort-of excusable here, since models are so far from human level,
but this most def won't be a good eval when trying to claim good performance. there has been a lot of prev work in the summarization community on how to properly eval (albeit not for this task of book-length summarization). i rec looking at what they did. start with Pyramid.
a bit more on this: "oh the new large DL models in NLP are so soul-less, they only consider form and don't truly understand meaning, they are black-boxes, they expose and amplify societal biases in the data, etc etc etc":
well, all true, but at least they work. like, previous-gen models *also* didn't understand meaning, and *also* considered only form. they were just much worse at this. so much worse that no one could ever imagine that they capture any kind of meaning whatsoever. they didn't work.
(not that the current ones "work". but they do "work" much better than the previous gen models. much, much better.)