1/ In this thread I’m going to talk about a highly unusual syllable gap in Standard Spoken Chinese, aka Modern Standard Mandarin, which is based on (but not identical to) the pronunciation of the Beijing variety of Mandarin.
2/ Sure, you can quibble with what’s been included in and excluded from this chart. Should “kei” be there? Should “den” (cf. dèn 㩐/扽 ‘to yank’) be left out?
We won’t worry about these details today, though they are interesting questions.
3/ The gap that I'm going to talk about is one that I suspect you have never noticed, let alone thought about. The reason I say it’s unusual is that gaps like it are extremely rare in languages around the world. To understand how it got there, we’re going to need a historical perspective.
4/ “Ah-ha!” you say, “I bet I know which syllable gap they’re talking about!”
I assure you, you’re wrong.
5/ It’s not this one, the well-known gap due to the co-occurrence restriction on initial dentals z c s, retroflexes zh ch sh, and velars g k h combining with finals beginning with high front vowel sounds i and ü.
6/ That restriction is the reason why we don’t have syllables like zhiang, cü, or hin. It’s a well-described feature of Mandarin phonology and it’s not what I’m going to talk about today.
7/ Also it’s not this related one, the gap due to the co-occurrence restriction on initial palatals j q x combining with finals that *don’t* begin with high front vowel sounds i or ü. It’s the reason there are no syllables like jai, qeng, or xui.
8/ The complementary distribution pattern created by these gaps raises challenging questions about how to phonemicize Mandarin initials. It’s an old problem going back nearly 100 years and plenty of ink has already been spilled on it. I won't spill any more ink here.
9/ Also, it’s not this gap, which is a lack of syllables beginning with initial labials b p m f followed by high rounded vowels (excepting the final -u). (attn: @Tao_Collective)
10/ And no, it’s not this gap, which has arguably become occupied recently by the biang of biang-biang noodles. (attn: @JPRidgeway)
11/ Nor is it the gap that’s been filled by the Jackie Chan-inspired sound of shiny hair. (attn: @likethemagician)
12/ Or the gaps that the renowned Chinese linguist Y.R. Chao filled, in what might seem a slightly cruel linguistic joke, in naming one of his daughters. (attn: @iwsfutcmd and @Bad_Linguistics)
13/ Don’t get me wrong, those are all great syllable gaps, and I have nothing against them. Each is worth pursuing down a long rabbit hole. But that’s not why I’m here today.
Nope, the gap I’m going to talk about today is this one:
14/ Simple consonant-vowel syllables consisting of a velar initial like k or g combined with the low vowel /a/ are pretty much the most basic, common syllables you can find in languages.
15/ Even languages with really limited sound inventories, like Hawaiian and Japanese, have ka-type syllables.
(Think of the names Kauaʻi and Kawasaki, which start with /ka/.)
16/ And it's not like these syllables only occur in weird or rare words. They are in basic vocabulary. ka- is the root of the verb ‘to go’ in Korean. In …
What?
What’s that, you say?
You don’t see a gap there?
17/ Okay, okay, I hear you. I see you. I understand what you’re saying. Patience, please.
18/ Now, where was I? Oh yes, the verb ‘to go’ in Korean is ka-. The word for automobile in British English is /ka/. In Cantonese, ‘home/family’ is /ka/ (“gaa1”). Think about any language you know, and you’ll find it’s lousy with ka-type syllables.
가 | 🚗 | 家
19/ So the fact that Mandarin Chinese, a language spoken by hundreds of millions of people, lacks syllables like ga ka ha (IPA: [ka kʰa xa]) is really …
What?
20/ Okay, I understand that you think I’m not making any sense. Hallucinating, or worse. You can plainly see that those three cells are filled in with pinyin syllables “ga ka ha”.
To tell you the truth, I can see them too.
So who is and who isn't minding the gap here?
21/ But the thing is, the chart is kind of misleading. If you’ll indulge me a bit longer, I’ll explain.
(The chart, by the way, is from p. 48 of The Languages of China by S. Robert Ramsey, Princeton University Press, 1989.)
22/ The chart hides a lot. For example, “yi”, “jiang”, “ri”, and “lia” all look pretty much equivalent on the chart. Those four cells are filled; those four syllables are all indicated as existing.
But there’s a lot of important info that this chart format doesn’t make apparent.
23/ There are dozens of morphemes pronounced “yi” in common use, in all four tones. Pick up any dictionary (those of you old enough to own pick-upable dictionaries) and you’ll find page after page after page of them.
一伊衣醫 宜移夷疑 以已乙椅 亦易邑譯
24/ The syllable “jiang” is, like “yi”, really common. There are dozens of characters with that pronunciation. And yet, if you look at them carefully … you’ll notice something odd: None are pronounced with the second tone.
25/ No shortage of jiāng, jiǎng, and jiàng (e.g. 江獎降), but not a single second-tone jiáng.
Think about it! It's true.
That’s a gap the chart just doesn’t show you.
26/ You also can’t tell from the chart that spoken Mandarin syllables beginning with r- almost never have a first-tone pronunciation (rēng 扔 ‘to throw’ is a rare exception).
Nor can you tell that the syllable “ri” only occurs in the fourth tone.
27/ Nor can you tell that “lia” and “seng” and “zei” are unusual: each syllable is represented by only a single spoken morpheme in Standard Mandarin. That’s a big contrast with “yi” and “shi”.
倆/俩
僧
賊/贼
28/ Those are some differences in the behavior of syllables that occur. There are also differences in the syllables that don’t occur. Not all gaps have the same properties.
29/ A native Beijing speaker would find the non-occurring syllable “sei” perfectly easy to pronounce, if you convinced them there was a good reason to do so. (Consider again Y.R. Chao’s daughter Lensey—Lénsèi—趙萊痕思媚.) en.wikipedia.org/wiki/Lensey_Na…
30/ But that same Beijing speaker would probably deem other non-occurring syllables, like zhüan and shie, impossible, or at least extremely unnatural.
31/ Are these patterns of occurrence and non-occurrence, and of sub-types within each, of linguistic significance? Or are they random, accidental? Like I said, lots of rabbit holes here.
32/ But it’s this gap I want to talk about today, the total lack of ga ka ha, because it’s just so surprising given how common these kinds of syllables are cross-linguistically and … hmmm?
What? 👀
33/ Oh, okay.
Okay, okay, okay.
I can tell when you’re getting impatient.
34/ Who are you going to believe, me or your own lying eyes?
😳
35/ I’ll explain it this way. Take a look at nearby gai kai hai. Plenty of normal, ordinary, completely pedestrian and uninteresting words with these pronunciations. Like gǎi 改 ‘to change’, kāi 開/开 ‘to open’, and hǎi 海 ‘ocean’. Mundane syllables.
36/ But here is what we have for the syllables I keep insisting don’t exist:
ga: 尬嘎噶
ka: 卡咖
ha: 哈蛤
Not a 'change', 'open', or 'ocean' among them.
37/ These Mandarin ga, ka, and ha things aren’t really words. Not like Korean ‘to go’ or English ‘automobile’ or the Japanese subject-marking particle.
They are marginally lexical at best: onomatopoetic, used to transliterate foreign sounds, or ideophonic.
38/ Sure, Mandarin speakers can say them. With ease. Sure, there are characters for them. But ha 哈 isn’t exactly a word. It’s the sound of laughter, and the first syllable of the Mandarin pronunciation of Harvard.
39/ It’s also worth noting that a lot of the characters that pop up in my computer input method when I type pinyin ga, ka, or ha have alternate pronunciations jia, qia, and xia, respectively. Those alternate pronunciations are the more common ones, and they apply to actual words.
40/ So there really is a gap here, even though it doesn’t look like it when you just glance quickly over the chart. And the gap requires some explanation. I mean, this is a really weird situation.
Yeesh, 40 tweets already, and we've barely gotten started. And it's past my bedtime. So we'll pick this up again tomorrow. Or the next day at the latest. I promise.
Thanks for your patience, everyone. I'm back and I’m ready to pick this thread up again. I’m grateful for comments and questions some of you raised in response to the first part of the thread, which have shaped the way I’m going to proceed. Buckle up!
41/ I keep saying that Modern Standard Mandarin has a gap. It almost totally lacks lexically solid words pronounced ga, ka, or ha. And I've pointed out that cross-linguistically, it's a weird gap.
So the first question is: How and why did the gap come to be?
42/ We’ve also seen that these ga-ka-ha syllables are easy for native Mandarin speakers to pronounce, and that they occur with reasonably high frequency in the spoken language. (People talk about coffee—kāfēi 咖啡 ☕️—all the time!)
43/ So the second question is: Where did these syllables come from that are filling the gap? And what function do they play in the language? (I'm talking about syllables like the kā of kāfēi.)
44/ And the third question is: What’s the future of this sort-of-there-and-sort-of-not gap in Mandarin?
I mentioned earlier that we’re going to have to approach these questions from a historical perspective. Luckily, I'm a historical linguist!
45/ And we’ll keep this simple and focused, to avoid getting caught up in too many distracting details.
But—here's a shocker—do keep in mind that things are always more complex in reality than they appear to be on Twitter.
46/ 1️⃣ So let’s tackle the first question. Actually, let’s start with Middle Chinese, which approximates the pronunciation of varieties of northern Chinese spoken around the start of the Tang Dynasty.
47/ Like most languages that have ever been spoken by humans on earth, Middle Chinese (MC) had an abundance of syllables like ga-ka-ha. What happened to them? Why didn't they persist into Mandarin?
(I write reconstructed MC pronunciations preceded by an asterisk *.)
48/ Here are a few examples of words in Middle Chinese pronounced *Ca (where C stands for an unspecified consonant), with their pronunciations in Modern Standard Cantonese (C.) and Modern Standard Mandarin (M.). (The pattern is simpler in Cantonese.)
49/ 波 ‘wave’: MC *pa > C. bo1, M. bō
破 ‘to break’: MC *pʰà > C. po3, M. pò
多 ‘many’: MC *ta > C. do1, M. duō
拖 ‘to pull’: MC *tʰa > C. to1, M. tuō
左 ‘left’: MC *tsá > C. zo2, M. zuǒ
歌 ‘song’: MC *ka > C. go1, M. gē
可 ‘may’: MC *kʰá > C. ho2, M. kě
50/ What you see in all these examples is that over time, the /a/ vowel moved up and back, and rounded to -(u)o [wɔ]. (Later on in the history of Mandarin, it unrounded to -e [ɤ] after velars.)
51/ It’s a pretty consistent change, but it didn’t remove -a-type syllables from the language, because other syllables were changing at the same time. The whole vowel system was getting reoriented.
52/ New Ca syllables were created by the movement of the near-low front vowel *æ (as in English “cat”) down and back, occupying the space recently vacated by earlier *a.
So while *pa changed to bō, *pæ changed to bā.
53/ Here are some examples of that second change:
巴 [place name]: MC *pæ > C. baa1, M. bā
馬 ‘horse’: MC *mǽ > C. maa5, M. mǎ
茶 ‘tea’: MC *ɖæ > C. caa4, M. chá
沙 ‘sand’: MC *ʂæ > C. saa1, M. shā
家 ‘family’: MC *kæ > C. gaa1
夏 ‘summer’: MC *ɣæ̀ > C. haa6
54/ This is the origin of many simple Ca syllables in Mandarin and Cantonese.
But why did I hide the Mandarin pronunciations of 家 and 夏 from you?
55/ It’s because there’s a really interesting development that redirected these words along a different developmental pathway in Mandarin. The sound change is regular, but restricted to velar initials.
56/ Here’s what happened:
A y-type sound (notated [j]) arose between the velar consonant and the vowel. So MC *kæ first became *kjæ, which turned into Early Mandarin *kja. Later, the velar itself was palatalized before that [j] (*k > [tɕ]), giving Modern Mandarin jia [tɕja].
家 ‘family’: MC *kæ > C. gaa1, M. jiā
夏 ‘summer’: MC *ɣæ̀ > C. haa6, M. xià
57/ And thus:
No ka, ga, or ha syllables in Mandarin any more!
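The way the two changes interact to leave this hole can be sketched as a toy Python model. It is a deliberate simplification under stated assumptions: syllables are reduced to (initial, vowel) pairs, only MC *a and *æ are handled, tones are ignored, the velar set leaves out the nasal ŋ and the voiced stop, and only the finals are tracked (not what later happened to the initials, such as *k becoming pinyin j). The function name mandarin_final is hypothetical. Applied to the examples above, it shows the gap falling out on its own: velar + *a ends up as -e, velar + *æ ends up as -ia, so no path leaves a velar in front of plain -a.

```python
# Toy model of the MC -> Mandarin vowel changes described in this thread.
# Hypothetical simplifications: syllables are (initial, vowel) pairs, only the
# vowels *a and *æ are handled, tones are ignored, and the velar set below
# leaves out the nasal ŋ and the voiced stop.

VELARS = {"k", "kʰ", "x", "ɣ"}


def mandarin_final(initial: str, vowel: str) -> str:
    """Return the rough Modern Mandarin final for an MC *Ca or *Cæ syllable."""
    velar = initial in VELARS
    if vowel == "a":
        # *a moved up and back and rounded to -(u)o;
        # after velars it later unrounded to -e [ɤ].
        return "e" if velar else "(u)o"
    if vowel == "æ":
        # *æ moved down and back into the space vacated by *a,
        # but after velars a [j] glide arose first, giving -ia.
        return "ia" if velar else "a"
    raise ValueError(f"vowel {vowel!r} is outside this toy model")


# Examples from the thread: character -> (MC initial, MC vowel)
EXAMPLES = {
    "波": ("p", "a"), "多": ("t", "a"), "歌": ("k", "a"), "可": ("kʰ", "a"),
    "巴": ("p", "æ"), "茶": ("ɖ", "æ"), "家": ("k", "æ"), "夏": ("ɣ", "æ"),
}

for char, (initial, vowel) in EXAMPLES.items():
    print(char, f"*{initial}{vowel}", "->", mandarin_final(initial, vowel))

# The gap: no velar initial ever ends up in front of a plain -a final.
gap_holds = all(
    mandarin_final(i, v) != "a" for i in VELARS for v in ("a", "æ")
)
print("No velar + plain a:", gap_holds)  # prints: No velar + plain a: True
```

Non-velars still produce plenty of plain -a (巴 bā, 茶 chá), which is why only the velar cells of the chart go empty.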
58/ But of course that’s not the end of the story, because even though there were no more ordinary Mandarin words with these syllabic values, they are such natural and common sounds for humans that very quickly the gap started to get filled. Indeed, it was never fully empty.
59/ 2️⃣ With what? With imitations of sounds like laughing (onomatopoeia), of sounds in other languages (borrowing and transcription), and as the result of irregular sound developments due to the effects of bisyllabic words on their component syllables.
60/ In that latter category I would put háma 蛤蟆/蝦蟆 ‘frog, toad’ (which may well be onomatopoetic in origin). In Middle Chinese this word was pronounced *ɣæmæ.
We would expect xiáma in Mandarin.
61/ But remember, this is not a compound. It's a single bisyllabic morpheme; há 蛤 has no independent existence. So as the second syllable of this rhyming word *mæ changed to ma, the first syllable was pulled along in parallel, preserving the rhyming pattern of the word: háma.
62/ Likewise with gāngà 尷尬. Neither syllable means anything alone.* Always bound up together, the syllables in this word continued to be alliterative and assonant.
*gà 尬 has only recently lexicalized. See point 3️⃣ below!
63/ But here’s the funny thing that’s happening. As I’ve pointed out, there are hardly any *real* words with Mandarin ga ka ha syllables. But that is slowly changing.
64/ Maybe the best example of that change is kǎ 卡.
Originally a transcriptional character, it’s been used to represent the “ka” syllable of borrowed European words like “car”, “card”, “calorie”, “cassette”, etc.
65/ As a result, this syllable that was once just an imitation of the foreign sound /ka/ is now the pronunciation of a set of full-fledged words in Mandarin. I suspect most native speakers don’t recognize them as foreign borrowings anymore. They've nativized.
66/ Kǎ 卡 meaning ‘card’ is an ordinary Mandarin morpheme now. It’s versatile too, freely forming new compounds: kǎpiàn 卡片 ‘card’, xìnyòngkǎ 信用卡 ‘credit card’, jìyìkǎ 記憶卡/记忆卡 ‘memory card’, diànhuàkǎ 電話卡/电话卡 ‘phone card’, etc.
See what's happening here?
67/ The gap was created by the unusual development of Middle Chinese *kæ-type syllables into Mandarin jia-type syllables instead of ga-type syllables.
But that gap has already become hard to spot. Natural languages abhor a vacuum.
It takes some effort to mind this gap.
68/ It’s the lexical flimsiness of most of the words with these pronunciations that clues us in to the fact that they are new or irregular elements in Mandarin, not direct inheritances from earlier stages of Chinese.
69/ 3️⃣ Over the next several hundred years, I’d wager that more and more real words will crowd into this space, layering on top of the existing transcriptional and ideophonic semi-words, and then the gap will truly be invisible.
70/ I’ll leave you to contemplate that final image: an invisible gap.
/end