Last month I presented seven sentences in seven different languages, all written in a form of the Chinese-character script. The challenge was to identify the languages and, if possible, provide a translation.
Six of those seven sentences are historically attested. One is not: I invented #7. I’m going to dive into an exploration of that seventh sentence in today’s thread.
Sentence #7 is an English-language sentence written sinographically — that is, using graphs that originate in the Chinese script. I didn’t do this for fun (even though it is fun), or as a proposal for a new way to write English.
I did it as a thought experiment. Why? Because thinking about how the modern Chinese script might be adapted to write modern English can give us valuable insights into historical instances of script borrowing, like those that took place centuries ago in Japan, Korea, and Vietnam.
So here’s the amazing thing about Sentence #7
(今天愛晚特語兔吃二魚佛午飯):
It superficially looks like nonsense. I didn’t explain anything about it, I just put it out there. And yet quite a few people deciphered it without any hints, and pretty quickly too. How did they do it?
Understanding this written sentence requires (a) ability to read Standard Written Chinese and pronounce it aloud in Modern Mandarin, and (b) knowledge of modern English.
今天愛晚特語兔吃二魚佛午飯
"Today I want you to each two fish for lunch."
What’s amazing is that anyone with those two skills who spends a bit of time with the sentence is able to crack the code without too much difficulty. This tells us something about the way Chinese characters function and how they can be repurposed to write different languages.
Let’s first see how this works in the English sentence, and then we can talk about the implications for how Korean, Japanese, and Vietnamese speakers first figured out how to commit their spoken languages to writing using Chinese characters.
First, I’ll add some spaces to the written sentence to provide some structure for our discussion:
今天 愛 晚 特 語 兔 吃 二 魚 佛 午飯
The sentence is made up of eleven units, each composed of one or two characters.
If you can read Chinese and speak Mandarin, then each unit has two conventialized properties: a pronunciation and a meaning. We can schematize this relationship this way:
Character(s): {pronunciation, meaning}
I'll give pronunciations in pinyin, and meanings in English.
If for each unit you focus on just the pronunciation or just the meaning, ignoring the other property, then you can get a coherent English sentence. Like this:
TODAY ài wǎntè yǔ tù EAT TWO FISH fó LUNCH >
“Today I want you to eat two fish for lunch.”
I have to point out again how amazing it is that people figured this out. I did not say that this sentence was in English. I did not explain that each graph is used for either its sound or its meaning (but not both). I gave no guidance at all on the writing technique I was using.
And still people figured it out. HOW?
Well, it turns out that humans have a natural, intuitive ability to split apart those pronunciation-meaning pairs. Even little kids do it without prompting.
For literate Mandarin speakers, the meanings and pronunciations come easily. For English speakers, knowledge of the lexicon and grammar of English constrains the possible interpretations of the Chinese characters.
For someone who knows both languages, the result is an ability to read the sentence — EVEN WITHOUT BEING TOLD HOW TO DO IT.
You’ve probably guessed by now that this is how Chinese characters were adapted to write spoken languages that had never before been written. And this is indeed how ancient forms of spoken Japanese, Korean, and Vietnamese came to be written down for the first time in history.
But that’s only the first part of the story.
Because … reading this way may be simple, but it’s also hard. It’s slow. It’s tiring. You can almost feel the cognitive load weighing on your brain as you work through the sentence. Why?
Because you are testing multiple possibilities for each character, constructing partial English sentences in your mental workspace, then redoing and refiguring each time you hit a dead end.
It sure would be easier if there was some indication of which characters were being used for their sound value and which characters were being used for their meaning value, wouldn’t it? It would take a lot of the guessing and second-guessing out of the process.
Does “二 : {èr, ‘two’}” write the English word ‘are’ or the English word ‘two’?
Does “魚 : {yú, ‘fish’}” write the English word ‘you’ or the English word ‘fish’?
Does “兔 : {tù, ‘rabbit’}” write the English word ‘to’, ‘too’, ‘two’, or ‘rabbit’?
Each time you encounter one of these characters, you have to pause to figure out which possibility best fits the context of the partial English sentence you’ve deciphered so far.
This is the problem of ambiguity — and it was a major factor in motivating further adaptations of Chinese script to write down spoken languages other than Chinese.
You can probably think of many ways to visually indicate which characters should be read for their meaning and which should be read for their sound. Unfortunately, the formatting limitations of Twitter make it difficult to illustrate them in text. I’ll provide pictures.
You might imagine using some kind of visual marking, like (1) underlining or (2) enclosing, that a writer could use to indicate to readers which characters are used for their pronunciation. Or maybe (3) putting a special semantic indicator like 口 next to such characters.
There are still other conceivable techniques for graphic differentiation that perhaps don’t come to your mind as quickly.
For example, you might simplify or abbreviate those graphs used for their sound value so that they become visually distinctive from those used for their meaning value, perhaps changing their whole look and feel to be more angular, or (as in (4)) more curvy.
And then there are techniques that aren’t graphic at all. You might come to the conclusion that it’s really the grammatical bits of English (plural -s, verbal -ing, prepositions, pronouns) that are best suited to sound-based representation ...
and the more lexically solid bits (verbs, nouns) that are easier to represent using characters for their meaning. In this way, the very grammatical structure of English would help you to parse the correct interpretation of each character.
If you were to do that for Sentence #7, it would lead you to adapt the characters this way:
meaning-based adaptation: want, eat, two, fish, dinner
sound-based adaptation: I, you, to, for
Or … and now things get really interesting … you might even decide that the best way to avoid ambiguity entirely is to specify both the sound and meaning of the target English word *at the same time*! How could you do this?
Well, consider the English word ‘two’. Its meaning could be represented by the character 二 : {èr, ‘two’}. Its sound could be represented by the character 兔 : {tù, ‘rabbit’}. What if we stacked or combined the two characters into a single representation of the English word?
That is, ⿰二兔 or ⿱二兔. That’s a cool way to represent {/tu/, ‘two’}! Not only is it unambiguous (it can’t possibly write ‘rabbit’, ‘to’, ‘too’ or ‘are’), it also isn't a Chinese character, so it's clear that what is being written here is not a Chinese word but an English one.
Once again you can probably guess where this is going! Gotta stop here for now, so we’ll pick all this up next time and see exactly how these few simple ideas explain how written forms of spoken Japanese, Korean, and Vietnamese first developed.
• • •
Missing some Tweet in this thread? You can try to
force a refresh
It may seem obvious from the title — Chinese Characters across Asia: How the Chinese Script Came to Write Japanese, Korean, and Vietnamese — but that’s just part of the story. There’s a lot more happening in these pages.
2/
Before I got any further, let me just say that although this is a sophisticated book but it’s most definitely NOT a technical book. It’s meant to be readable and understandable by just about anybody, without being dumbed down or inaccurate.
3/
3/ Recall that in Part 1, we established that you are a linguist in the year 3022 (that's you in the image). And you are working on reconstructing Cantonese as spoken 1,000 years in your past! You’ve got an excellent textual source to help you, a dictionary.
At long last, Part 2 of this thread. We’re thinking about how much we could reconstruct of late 20th-century spoken Cantonese from a vantage point 1,000 years in the future ... if this dictionary were our only available source of information.
2/ Here’s the setup: The year is 3022, you’re a linguist, and you’ve stumbled across a precious document: a dictionary of Cantonese. The existence of the language was already known, but no direct documentary evidence was known to be extant: until now.
3/ You undertake a systematic analysis of the dictionary data. This is the book you eventually proudly publish: a reconstruction of the ancient language Cantonese from ten centuries ago!
In a thread I posted a few days ago, I explained that the Mandarin name Yālù and the Korean name Amnok not only refer to the same river, but are in fact historically the same name.
2/ One of the great things about sharing these ideas on Twitter is that more knowledgeable people point out mistakes or provide additional information.
I got some very informative feedback/pushback on the Manchu etymology: the “twist” in that thread.
1/ This is the river that divides the Korean peninsula from continental East Asia. It runs along the current border between North Korea and the People’s Republic of China.
What is its name? Depends on which side of the river you are on.
2/ When I first learned that the Yālù River and the Amnok River were the same river, I assumed that these Mandarin and Korean names must be different, unrelated names.
YALU ≟ AMNOK
3/ Later, after I’d become more sophisticated about Chinese and Korean language history, I realized that they are historically the same name: the Mandarin and Korean pronunciations of 鴨綠/鸭绿 meaning ‘duck green’.