vakibs
18 Oct, 47 tweets, 11 min read
In this thread, I will discuss the amazing power of Sanskrit Samāsas (compound nouns) and why they offer the ideal terminology for discussing complex ideas. All Indian languages borrow Sanskrit Samāsas, but this expressive power needs to be rekindled afresh for the modern world.
The picture that you see above is a "Parse Tree", which shows the conceptual dependencies between the different constituents of a word. The word's meaning is derived from the individual constituents, as well as from the relationship between them, given by the shape of the tree.
There are many types of parse trees, based on various theories of linguistics. The most common representation in European languages like English, which have a relatively fixed word order, is called a phrase structure tree. This is the tree for an example sentence in English.
In the above example, "the cat which is lying on the mat" is a complex concept. There is no single word for that in English. So you express it through a relative clause (here, an adjective clause "which is lying on the mat").

In Sanskrit, you can create a single word for it. 😀
Here is another type of parse tree, called a "dependency tree", used for free word order languages, such as those of India. Instead of following the order of words in the sentence, we depict the relationship in meaning between different words. The head-node is elaborated by child-nodes.
Dependency grammars are a fantastic tool in AI. Why? When we grasp the conceptual relationships between the words, we can apply that to many tasks: language translation, question answering, information extraction etc. Here is an example from Google's dependency parser software.
In the above example, "Alice, who had been reading about Syntaxnet" is a complex concept. You can see it has a full sentence within. In natural human languages, sentences can be embedded within other sentences using such tricks, generating parse-trees that can go many levels deep.
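To make the idea of a head-node elaborated by child-nodes concrete, here is a minimal sketch in Python of the phrase "Alice, who had been reading about Syntaxnet" as a dependency tree. The `Node` class and the relation labels are my own illustrative assumptions; this is not the output or the label set of Google's parser.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One word in a dependency tree (hypothetical helper class)."""
    word: str
    relation: str = "root"  # label of the arc from the head to this word
    children: List["Node"] = field(default_factory=list)

    def add(self, word, relation):
        """Attach a child word that elaborates this head; return the child."""
        child = Node(word, relation)
        self.children.append(child)
        return child

    def depth(self):
        """Number of levels in the subtree rooted at this word."""
        return 1 + max((c.depth() for c in self.children), default=0)

    def words(self):
        """All words in the subtree, head first."""
        out = [self.word]
        for c in self.children:
            out.extend(c.words())
        return out


# "Alice" is the head; the embedded clause hangs below it.
# Relation names are assumed labels chosen for readability.
alice = Node("Alice")
reading = alice.add("reading", "relative-clause")
reading.add("who", "subject")
reading.add("had", "auxiliary")
reading.add("been", "auxiliary")
about = reading.add("about", "preposition")
about.add("Syntaxnet", "object-of-preposition")

print(alice.depth())   # levels of embedding below and including "Alice"
print(alice.words())
```

The point of the sketch is only that the whole clause lives under a single head node, which is the unit a compound would have to absorb.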
Now, here is the interesting thing. In Sanskrit, if needed, the relative clause "who had been reading about Syntaxnet" can be compressed into a single word.

A full dependency tree is snapped into a word.

Why would you want to do that? 😀

I will explain that in the following.
But we can already see that something strange is going on with Sanskrit. When we are able to produce a single word equivalent to a complex sentence, we can see there is an uncountable number of words in Sanskrit. So any dictionary for Sanskrit, unlike one for English, would be inadequate.
Other languages possess such capacity to a smaller degree. For example, if you are reading a German text and encounter a long word, you can try looking it up in a dictionary, but you will often not find it. In order to understand the word, you must break it at the right places.
In German, the compound is often a noun preceded by nouns or adjectives. If you try to build a dependency tree for these relationships, you will usually get a chain, each link qualifying the one that comes after it. Here is an example, "Fingerspitzengefühl" (finger-tip-feeling).
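Breaking a compound "at the right places" can be sketched as a small dictionary-driven split. The three-entry lexicon below is a hypothetical stand-in for a real dictionary, and the function is a toy illustration under that assumption, not a real German morphological analyser.

```python
def split_compound(word, lexicon):
    """Return one way to cut `word` into lexicon entries, or None if impossible."""
    word = word.lower()
    if not word:
        return []
    # Try the longest possible first part, then recurse on the remainder.
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = split_compound(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


# Hypothetical mini-lexicon; a real one would hold many thousands of entries.
LEXICON = {"finger", "spitzen", "gefühl"}

parts = split_compound("Fingerspitzengefühl", LEXICON)
# Each part qualifies the one that comes after it, which gives a chain.
chain = list(zip(parts, parts[1:]))

print(parts)
print(chain)
```

The resulting list of (qualifier, qualified) pairs is the chain-shaped dependency structure: no part has more than one dependent.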
In contrast, the power of Sanskrit grammar is that you can have a very complex dependency tree for a single word, snapping almost entire sentences into one word.

Even a fairly complex algebraic expression can be compressed into a single word, like here.

Before we see how such a thing is even possible, consider the title of any famous work of Sanskrit literature. Almost always a single word.

"Rāmāyana" (The journey of Rāma)
"Kirātārjunīyam" (The story of the hunter and Arjuna)
"Ābhijñānaśākuntalam" (The remembrance of Śakuntala)
The most powerful dependency grammar that exists for any human language, even today, is one of the oldest: the Ashṭādhyāyi of Pāṇini. This Sanskrit grammatical tradition is the reason behind its extraordinary power of word formation.

All Indian languages borrow that power.
For a treatise on the gigantic creative power of a spoken human language, Ashṭādhyāyi is also extraordinarily short. Just 4,000-odd Sūtras (computational rules) generate the entire language.

And the vast majority of these rules are about *word-formation*, not sentence formation.
The creative power of Sanskrit comes not just from Pāṇini’s work, but also from the millennia-old grammatical tradition preceding it. It is already incipient in Vedic verses, arguably the oldest world literature. The Vedic chants are the reason why Indian languages have a free word order.
In order to memorize the verses precisely, the verses are sung in varying permutations, such as Ghanāpātha. The verse’s meaning should not change as the words are being shuffled around. This is ensured by case inflections, which produce a free word order.

Sanskrit is a highly inflected language with seven cases (eight, when we count the Sambōdhanā vāchaka: salutation). Word morphology changes based on the case, gender, tense etc. This helps in disambiguating the parse tree, since the qualifier and the qualified must agree in case.
In the dependency tree, the head of any full sentence is a verb. All other words must explain their relationship to the verb. The 7 cases which specify the so-called “lexical relationship” are known in Sanskrit as Vibhakti. The actual semantic role they perform is called Kāraka.
The 7 cases of Sanskrit are preserved in Slavic languages. So in these languages, it is possible to shuffle the words around without losing meaning. German has only 4 cases. So shuffling is possible to a limited degree. English has very little case, so it has a fixed word order.
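Here is a toy sketch of why case marking permits shuffling. Each word carries an explicit case tag, and the reading is computed from the tags alone, never from position. The three-word sentence (roughly "Rāma kills Rāvaṇa"), the tag names, and the case-to-role table are illustrative assumptions; Sandhi between neighbouring words is ignored.

```python
from itertools import permutations

# Assumed toy mapping from case to role, valid for an active-voice sentence.
CASE_TO_ROLE = {"nominative": "agent", "accusative": "object"}


def roles(sentence):
    """Map each role to its word using only the case tags, not word order."""
    parsed = {}
    for word, tag in sentence:
        if tag == "verb":
            parsed["head"] = word  # the verb is the head of the sentence
        else:
            parsed[CASE_TO_ROLE[tag]] = word
    return parsed


sentence = [("rāmaḥ", "nominative"), ("rāvaṇaṃ", "accusative"), ("hanti", "verb")]

# Read every possible ordering of the three words.
readings = [roles(p) for p in permutations(sentence)]
all_same = all(r == readings[0] for r in readings)

print(len(readings), all_same)
print(readings[0])
```

A fixed word order language would have to consult the position of each word instead, so the same shuffle would change or destroy the reading.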
We should ask: why are there just 7 cases in Sanskrit? My theory is that this is because of the capacity of human working memory, where we can hold roughly 7 items at any time without forgetting or confusing them. A larger working memory comes only by training.
So when we have a sentence in a human language, at the level of meaning, it should give roughly 7 concepts, explain the relationship between them and close the package. Then the brain processes this and stores it in the memory, ready to now accept a new sentence for processing.
The genius of Sanskrit comes from assigning “Kāraka roles” to all the seven cases: Karta (Doer/Agent), Karma (Object/Theme) etc. These elevate the syntactic analysis from sentence structure to meaning-structure. Then the whole tree can be made into a single word, if wanted.
The way it is done is through the so-called Tatpurusha Samāsa. Certain words are grouped as having affinities (Ākānksha) to a specific Kāraka role, and when a word compound occurs with this word, we know how to imagine the equivalent dependency tree, even if the case is missing!
For example, the 2nd Tatpurusha Samāsa is made by words like Śrita (sought-er), Atīta (freed-er), Gata (going towards-er), Prāpta (obtained-er) etc.

There are many resources online to learn about Samāsas. I found these videos to be helpful (in Hindi).
Similarly, the 3rd Tatpurusha Samāsa is made of words like Samam (equal), Ūnam (lack), Miśram (mix) etc.

Although languages like English have a few such words, Sanskrit has a giant compendium of such useful words. There is a rich tradition of making such compounds on the fly.
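The affinity idea can be sketched as a simple lookup: the final member of the compound tells us which case to imagine on the member before it. The word lists are the ones given above for the 2nd and 3rd Tatpurusha. The function names and the example pairing "kṛṣṇa" + "śrita" are my own illustrative assumptions, and real Pāṇinian rules carry many more conditions than this.

```python
# Final members and the case they imply for the preceding member.
# 2 = second case (accusative), 3 = third case (instrumental).
AFFINITY = {
    "śrita": 2, "atīta": 2, "gata": 2, "prāpta": 2,
    "samam": 3, "ūnam": 3, "miśram": 3,
}


def implied_case(final_member):
    """Return the case number the final member calls for, or None if unknown."""
    return AFFINITY.get(final_member.lower())


def expand(first_member, final_member):
    """Rebuild the two-node dependency: (dependent, implied case, head)."""
    case = implied_case(final_member)
    if case is None:
        raise ValueError(f"no known affinity for {final_member!r}")
    return (first_member, case, final_member)


print(expand("kṛṣṇa", "Śrita"))
print(implied_case("Ūnam"))
```

Even though the compound itself shows no case ending on the first member, the lookup restores the missing arc label of the dependency tree.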
Tatpurusha Samāsa is one of the many possible Samāsa (compounds). There is the Karmadhāraya Samāsa (qualifier -> noun) which occurs also in other languages. This has the tendency to produce chain-like trees. But its power is brilliantly enhanced in conjunction with other Samāsas.
Perhaps the most delightful of Sanskrit Samāsas is the Bahuvrīhi, which is an exocentric compound that means something apart from both the words in the compound. It appears rarely in other languages like English: e.g., a "sabretooth" is a tiger. But Sanskrit has loads of them.
But the critical power of Sanskrit comes from the so-called Avyayībhāva Samāsa. It is comparable to the Greek and Latin prefixes and suffixes (para-, syn-, peri-, -cule), which are the workhorses of scientific terminology in English. Sanskrit has a giant treasure of such Avyayās.
Here is a nice introduction to this Samāsa, from the online resource mentioned earlier.

Examples of Avyayās: Adhi- (in), Upa- (near), Nira- (not present), Anu- (suitable) and so on. These should be mastered by all scientists writing in Indian languages.
If Sanskrit didn't have anything except Avyayībhāva Samāsa, it would already be a superior language to Greek/Latin in creating scientific terminology. However, in combination with the other Samāsas, it is simply matchless. It can condense any complex concept into a single word.
Finally, we have Dvandva Samāsa that replaces the conjunction in a sentence (e.g. "and"). It can transform a list of things into a single word, putting them under an exocentric node. This completes the arsenal of tools available to transform any dependency tree into a single word.
An expert speaker in Sanskrit would know how exactly to use a Dvandva Samāsa to condense a parse tree unambiguously into a single word.

In the example I referred to earlier, "YōgaViyōga" is the Dvandva Compound embedded within to mean "sum & difference".
There are natural pairs/lists of things in the world. For example, mother and father, wife and husband, body parts, seasons, dance forms and so on. When we use these words together in a compound, it will naturally lead to a Dvandva compound. The speakers will know automatically.
When we strip all the grammatical markers and smash the words together into a single compound word, we are naturally increasing the ambiguity. The total number of possible dependency trees that can be obtained from a Samāsa of "n" parts is a Catalan number, exponential in "n".
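To get a feel for this growth, here is a small sketch that counts the readings under one simplifying assumption of mine: every reading is a binary grouping of adjacent parts. Under that assumption a compound of n parts has C(n-1) groupings, where C(k) is the k-th Catalan number. The function names are hypothetical.

```python
from math import comb


def catalan(k):
    """k-th Catalan number: C(k) = (2k choose k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def count_groupings(n):
    """Number of binary groupings of a compound with n parts (n >= 1)."""
    return catalan(n - 1)


def groupings(parts):
    """Enumerate every binary grouping of `parts` as nested tuples."""
    if len(parts) == 1:
        return [parts[0]]
    out = []
    for i in range(1, len(parts)):          # choose where the top-level split falls
        for left in groupings(parts[:i]):
            for right in groupings(parts[i:]):
                out.append((left, right))
    return out


for n in range(1, 11):
    print(n, count_groupings(n))

# Cross-check the formula against brute-force enumeration for 4 parts.
print(len(groupings(["a", "b", "c", "d"])))
```

Already at ten parts there are thousands of candidate groupings, which is why the listener's knowledge of the language has to do the disambiguation.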
But the beauty of language is that speakers will use their language awareness (statistical frequency of words, poetic style etc.) to unambiguously pick the right dependency tree. This made Indian poets delight in a literary style where they created extraordinarily long words.
The longest word ever recorded in world literature is in a Telugu poem "Varadāmbikā Pariṇayam" by the queen Tirumalāmba. It is a complete Sanskrit compound with 195 Sanskrit syllables. We can see the English translation and try to imagine its complex dependency tree. 😄
It is only recently that western linguists understood the extraordinary strength of Sanskrit compounds. In this 2015 paper, the Oxford linguist John Lowe argued why Sanskrit compounds must be considered "syntactic structures" and not "lexical structures".…
Lowe says that a word is processed as an "anaphoric island". It means the other words in a passage can refer to the word as a whole, but not to individual constituents. But not so for Sanskrit compounds.😀

Words with prefixes like "tad-", "svīya-" etc. can peek into other words.
"tad-" means "corresponding to that thing".
"svīya-" means "mine"

When such pronoun-related prefixes appear in a word, they help comment on what is happening within the dependency tree of a compound. Here is an example mentioned by Lowe.
At this point, I have to mention that all these tools for Sanskrit Samāsa are routinely used within Telugu, and perhaps also in other Indian languages.

It is not just about Sanskrit or ancient texts in Sanskrit. This is a fully alive expressive power in many Indian languages.
When I write a technical essay in Telugu, I often find myself compressing several words in a phrase into a single word.

E.g: "Bādhyatāyuta-kṛtimamēdhānirmāṇaṃ" బాధ్యతాయుతకృతిమమేధానిర్మాణం बाध्यतायुतकृतिममेधानिर्माणं (the development of responsible artificial intelligence)
What is the advantage of having such a long compound word? It can then be combined with other words to express a complex thought as a single sentence. It can take all types of case inflections (signifying kāraka roles). Such precision cannot be obtained when we break the sentence.
There is an extra power, which most people don't notice. Sanskrit doesn't need any punctuation marks whatsoever. They are all superfluous. We don't need commas, question marks, exclamation marks... Nothing. A correctly formulated sentence will be 100% precise without any of that!
This exact same fecundity and precision of Sanskrit is available to other Indian languages, whenever they borrow the grammatical machinery of Sanskrit. But most people have forgotten this, after centuries of colonial rule which destroyed native traditions of scholarship.
There will be a time when full professional education, research scholarship and creative expression will return to Indian languages. When that happens, people will see how *inadequate* the current expression in English is. I hope this time occurs soon, in our own lifetimes. (END)
I forgot to mention: I once translated the titles of my scientific papers into Telugu.

All these titles are actually single words in Sanskrit, when Sandhi is applied! Just like the titles of Sanskrit literature.

In Telugu, the Sandhi is optional.
