In our lab’s newest preprint (arxiv.org/abs/2106.05426), we used transfer learning between 100 different language representations to show that the SPACE OF REPRESENTATIONAL SPACES seems to be fundamentally low-dimensional.
To analyze language, it’s common to use representations that express different types of information. These representations are often grouped into categories—like “syntactic” or “semantic”—implying there is low-dimensional structure in the space of language representations.
We compared word embeddings, hand-designed syntactic and semantic features, contextual embeddings derived from language models (LMs), and contextual embeddings from machine translation (MT) models. Across these 100 representations, the primary dimension of variation seems to be something like “abstraction”.
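A minimal sketch of the general idea, under stated assumptions. The thread doesn't describe the preprint's actual transfer procedure, so here each “transfer” is a hypothetical closed-form ridge regression from one representation to another, scored by held-out R². The low-dimensional structure is probed with a plain SVD (PCA) of the resulting transfer matrix. The data are synthetic: six toy representations of the same 500 stimuli, built as noisy mixtures of two latent factors. All function names are illustrative.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge weights mapping X (n x p) to Y (n x q)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ Y)

def transfer_score(X_src, Y_tgt, alpha=1.0, train_frac=0.8):
    """Mean held-out R^2 when predicting the target representation from the source."""
    split = int(X_src.shape[0] * train_frac)
    W = ridge_fit(X_src[:split], Y_tgt[:split], alpha)
    pred = X_src[split:] @ W
    resid = ((Y_tgt[split:] - pred) ** 2).sum(axis=0)
    total = ((Y_tgt[split:] - Y_tgt[split:].mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - resid / total))

def transfer_matrix(reps, alpha=1.0):
    """K x K matrix; entry [i, j] scores how well representation i predicts j."""
    k = len(reps)
    T = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            T[i, j] = transfer_score(reps[i], reps[j], alpha)
    return T

def principal_dimension(T):
    """Score each representation (row of T) on the first principal component
    of the row profiles, and return the variance fraction of every component."""
    centered = T - T.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return centered @ vt[0], explained

# Hypothetical toy data: each representation weights two latent factors
# differently (from all-factor-0 to all-factor-1), plus a little noise.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=(n, 2))
reps = []
for level in np.linspace(0.0, 1.0, 6):
    weights = rng.normal(size=(2, 10)) * np.array([[1.0 - level], [level]])
    reps.append(latent @ weights + 0.1 * rng.normal(size=(n, 10)))

T = transfer_matrix(reps)
scores, explained = principal_dimension(T)
print(np.round(scores, 2))
print(f"variance explained by first component: {explained[0]:.2f}")
```

With real data, `reps` would hold the feature matrices for the same stimuli. A large `explained[0]` would suggest that one dimension captures most of the variation in transferability. The ridge penalty, the split, and even the choice of a linear transfer model are assumptions here and may differ from what the preprint does.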
We also used the 100 representations to build encoding models for natural-language fMRI data. This let us test whether the low-dimensional structure is recapitulated in the brain (it is). It also let us visualize the primary dimension on the cortex, where it again seems to reflect “abstraction”.
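A minimal voxelwise encoding-model sketch, assuming a standard setup that the thread does not spell out. Features are assumed to be already resampled to one row per fMRI volume (TR) and shifted by a few hypothetical delays to absorb the slow hemodynamic response. They are then mapped to every voxel with one shared ridge penalty and scored by per-voxel correlation on held-out time points. Everything here, including the synthetic “signal” and “noise” voxels, is illustrative rather than the preprint's pipeline.

```python
import numpy as np

def make_delayed(features, delays=(1, 2, 3, 4)):
    """Stack copies of the features shifted by each delay (in TRs), zero-padded
    at the start, so a linear model can absorb the slow hemodynamic response."""
    n_tr, n_feat = features.shape
    out = np.zeros((n_tr, n_feat * len(delays)))
    for i, d in enumerate(delays):
        out[d:, i * n_feat:(i + 1) * n_feat] = features[:n_tr - d]
    return out

def fit_ridge(X, Y, alpha=10.0):
    """Closed-form ridge weights, one column per voxel, sharing one penalty."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ Y)

def voxel_correlations(Y_true, Y_pred):
    """Pearson correlation between actual and predicted responses, per voxel."""
    yt = (Y_true - Y_true.mean(axis=0)) / Y_true.std(axis=0)
    yp = (Y_pred - Y_pred.mean(axis=0)) / Y_pred.std(axis=0)
    return (yt * yp).mean(axis=0)

# Synthetic stand-in for one representation's stimulus features (hypothetical).
rng = np.random.default_rng(1)
n_tr, n_feat, n_vox = 800, 20, 50
features = rng.normal(size=(n_tr, n_feat))
X = make_delayed(features)

# The first 25 voxels depend on the features; the remaining 25 are pure noise.
W_true = rng.normal(size=(X.shape[1], n_vox)) * 0.2
W_true[:, 25:] = 0.0
Y = X @ W_true + rng.normal(size=(n_tr, n_vox))

# Chronological split: fit on the first 600 TRs, evaluate on the last 200.
train, test = slice(0, 600), slice(600, n_tr)
W = fit_ridge(X[train], Y[train])
corr = voxel_correlations(Y[test], X[test] @ W)
print(f"mean r, signal voxels: {corr[:25].mean():.2f}")
print(f"mean r, noise voxels:  {corr[25:].mean():.2f}")
```

To compare representations, one would refit with each representation's features and compare held-out correlations voxel by voxel. In practice the ridge penalty is usually chosen by cross-validation rather than fixed as it is here.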
Our goal here was to begin mapping and surveying the space of language representations. We hope that better understanding and intuition for this space will help to disentangle the language representations that our brains actually use for comprehension. Comments are welcome!