多音字 – one of my favorite complications in learning Chinese. In another post, we have already dealt with 白讀 and 文讀 pairs like 白 [pai] / [puo] (from older [bak]), 色 [shai] / [se] (from older [sek]) etc. Today’s topic is another large group of 多音字: 去聲 derivatives. In other words, characters which can be read in two different tones, one of which is usually a 去聲, as in 教 or 要.


I am calling them derivatives because I assume that one of the readings is derived from the other in the same way that “to book” is derived from “a book” (more easily recognizable in German: Buch -> buchen), and that the 去聲 variant is usually the derived one.

Two questions suggest themselves: (1) how do I know and (2) why do I care?

As for question (1), I have observed that 多音字 come in three main types:

  • 白/文 pairs, recognizable by their typical vocalism (mostly old 入聲 words)
  • Same character, unrelated meanings (typically due to “simplification”, think 干)
  • Two readings with related meanings but different tones – our topic today

For the last set, one of the two readings is usually a 去聲, as in 王: king (P) and to be king (Q), where P, S and Q stand for 平聲, 上聲 and 去聲. Of course, I was not the first to notice that; see [1] for more on the topic. Given that comparative linguistics nowadays reconstructs the 去聲 as going back to a final -s (and perhaps also -t), the rule “Q = derived word” seems logically appealing. (Note: by rule I mean a rule for 多音字; there are also words that were 去聲 from the start, like, well, 去 itself.)

As for (2), why would I care? Well, I like to sort things into boxes. Rather than having to learn a bunch of unrelated phenomena, I would rather learn one general rule. E.g. in English, if I know that to widen comes from wide, I have no trouble understanding loosen, strengthen, heighten – rather than one word, I have just learned a whole class. And I am hoping to do the same in Chinese.

Now, the example of “-en” derivations in English is a very fitting one, because (unfortunately) there is more than one “-en” class. There is the adj./noun => verb pattern we just discussed. But the same ending can also be used to make adjectives from nouns: gold-en, old-en, flax-en, earth-en, wood-en, wool-en … basically a whole host of materials can be turned into adjectives this way (but by no means all: there is no coal-en or steel-en). And then, of course, “-(e)n” is also used to form many past participles: risen, slain, sunken, seen, been. A quick analysis of English shows that there are at least 3 different classes of -n derivatives. So, is it even worth thinking about? I believe it is, because at the very least I now have three boxes that I can sort new word pairs into: they are either participles, adjectives formed from the names of materials, or verbs made from nouns/adjectives by adding -en. Complicated, but better than complete randomness ;-).

So, on to find some new boxes in Chinese. Here is my working model (simplified for practical application from [1] and others):


(1) Noun => Q-verb

衣: P: cloth, Q: to wear

王: P: king, Q: to be king

家: P: family, Q: to be given in marriage (written: 嫁)


(2) Verb => Q-noun

教: P: to teach, Q: teaching, religion

知: P: to know, Q: wisdom (智), wise [2]

乘: P: to ride, Q: carriage [2]

傳: P: to transmit, Q: a record [2]

聞: P: to hear, Q: reputation

處: S: to be at, Q: place

數: S: to count, Q: number


(3a) Q-word with a “more restricted meaning”

差: P: difference, to differ; Q: to differ from, to lack

要: P: demand, Q: to want

好: S: to be good, Q: to be fond of, to like

少: S: to be few, Q: to be young


(3b) Causatives (make somebody do something):

飲: S: to drink, Q: to give to drink

買: S: to buy, 賣 Q: to sell (note: character change)

散: S: to be loose, Q: to scatter


(4) Mixed bag of Qs

空: P: empty, Q: to make/be empty/free, but also: vacant, spare time

間: P: between, Q: gap, to separate

Of course, there are still pairs left which have nothing to do with either 白/文 or 去聲 derivatives, like 將, 沒, or 長. But classifying away a big part makes it easier to deal with the rest ;-).



[1] Jacques, G. (2016). How many *-s suffixes in Old Chinese? Bulletin of Chinese Linguistics, 9(2), 205-217. Brill.

[2] Branner, D. (2002). Common Chinese and Early Chinese Morphology. Journal of the American Oriental Society, 122(4), 706-721. doi:10.2307/3217611
















Grand tour of the 音読み (2)

Moving on from the initials discussed in part 1, now it is time to look at the medials in Chinese and how they are reflected in the Japanese Onyomi.


What are medials?

When talking about Chinese phonetics, we normally distinguish initials (36 of them in the case of Early Middle Chinese, EMC) and finals. This is reflected in the Chinese fanqie (反切) system for indicating pronunciations: a character spelled 古電切 has the same initial as 古 (gu) and the same final as 電 (dian), giving gian. The number of rimes varies over time: the 切韻 of 601 CE knows 193 韻, while the 廣韻, compiled some 400 years later, distinguishes 206 韻. How do you get that many finals when you have only 5-6 vowel nuclei and about as many possible final consonants (p, t, k, m, n, ng)? Part of the answer is tone, since each 韻 only contains syllables of one tone. The main answer is medials: -i- and -w- glides between the initial and the main vowel, which significantly increase the number of possible syllables, multiplying /-an/ into /-an/, /-ian/, /-üan/ and /-uan/.
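The fanqie principle is mechanical enough to sketch in a few lines of Python. This is a toy under loudly stated assumptions: it works on modern pinyin-style strings with a trailing tone digit, whereas real fanqie operates on Middle Chinese sound categories, and the helpers `split_syllable` and `fanqie` are hypothetical names of my own. As in fanqie proper, the upper speller contributes the initial and the lower speller contributes the final and the tone.

```python
# Pinyin-style initials (an assumption for illustration), longest first so "zh" wins over "z".
INITIALS = sorted(
    ["b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
     "j", "q", "x", "zh", "ch", "sh", "r", "z", "c", "s"],
    key=len,
    reverse=True,
)


def split_syllable(syllable: str) -> tuple[str, str, str]:
    """Split e.g. 'dian4' into ('d', 'ian', '4'); the initial may be empty."""
    tone = ""
    if syllable and syllable[-1].isdigit():
        syllable, tone = syllable[:-1], syllable[-1]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    return "", syllable, tone  # zero initial, e.g. 'an'


def fanqie(upper: str, lower: str) -> str:
    """Combine the initial of the upper speller with the final and tone of the lower one."""
    initial, _, _ = split_syllable(upper)
    _, final, tone = split_syllable(lower)
    return initial + final + tone


print(fanqie("gu3", "dian4"))  # gian4
```

The result is the gian of 古電切; modern Mandarin has since palatalized this g- to j-.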

Rime tables lay all this out graphically. In the 韻鏡 (Yunjing), for example, which consists of 43 tables, the columns indicate the initials, the four major horizontal sections are the four tones, and within each tone you have four sub-sections, the so-called 等, called divisions or grades in English.


While the exact details of the grades are still in dispute, it is easy to see that they were related to glides (medials) and that they are important for understanding how a Chinese syllable developed in the different fangyan and in Japanese:

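Grade | Tone | Mandarin | Cantonese (粵) | Hokkien | 吳音 | 漢音
開 一 | 平 (去) | kan | kon | kan | kan | kan
合 一 | 上/去 | san | saan | soã/san | san | san
開 二 | | kjän | kaan | kan | ken | kan
合 二 | | shan | saan | soã/san | sen | san
開 三 | | kiän | kin | kian | kon | ken
合 三 | | siän | sin | sian | sen | sen
開 四 | | kiän | kin | kiⁿ/kian | ken | ken
合 四 | | siän | sin | sing/sin/sian | sen | sen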

(Note: The table above uses a quasi-IPA instead of Pinyin or Yale to achieve compatibility between the different languages; k is pinyin g etc.; j stands for a medial missing in 粵)

The first column shows the grade as well as the overall classification as kai (開) and he (合). He syllables often have a /u/ sound in Mandarin (as in guan), while kai syllables show no such feature.

Looking at the realization of -an in the four grades, you can see that grade clearly matters: grade I has a clear [an], which is also preserved in Japanese, and grades III and IV have shifted the [a] to [ɛ] (umlaut) in Mandarin, which is mostly reflected in Japanese as well. Grade II is somewhere in between: Go’on gets /e/ but Kan’on gets /a/. If you were just trying to build rules for Mandarin to Japanese without being aware of the grades, you could not explain your findings; the grades bring more clarity. So, you have to work at least at grade level if you want to understand correspondences, perhaps also including hekou (合口) and kaikou (開口).
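To make the grade effect concrete, here is a minimal Python sketch that reads the vowel tendencies for MC -an rhymes off the table above and turns them into a Go’on/Kan’on guess. It assumes a voiceless MC initial (so both readings share the same Japanese initial) and ignores kaikou/hekou; the names `AN_VOWELS_BY_GRADE` and `predict_an_readings` are hypothetical. Grade III is only a tendency: the table’s 開 三 row actually has Go’on kon.

```python
# Vowel of the Japanese reading for MC -an rhymes, by grade: (Go'on, Kan'on).
# Read off the table above; these are tendencies, not exceptionless rules.
AN_VOWELS_BY_GRADE = {
    1: ("a", "a"),
    2: ("e", "a"),  # the in-between grade: Go'on e, Kan'on a
    3: ("e", "e"),  # exception in the table: 開 三 has Go'on kon
    4: ("e", "e"),
}


def predict_an_readings(japanese_initial: str, grade: int) -> tuple[str, str]:
    """Return (Go'on, Kan'on) guesses for an -an rhyme of the given grade.

    Assumes a voiceless MC initial, so both readings share japanese_initial.
    """
    if grade not in AN_VOWELS_BY_GRADE:
        raise ValueError(f"grade must be 1-4, got {grade}")
    go_vowel, kan_vowel = AN_VOWELS_BY_GRADE[grade]
    return japanese_initial + go_vowel + "n", japanese_initial + kan_vowel + "n"


for grade in (1, 2, 3, 4):
    print(grade, predict_an_readings("k", grade))
```

For grade II this yields ken/kan, matching the 開 二 row of the table.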





[To be continued]








Grand tour of the 音読み (1)

Let’s say you already speak Chinese and are learning Japanese. All those familiar characters … but their pronunciation is often confusing, with the same Chinese initial producing different outcomes in the Japanese “Chinese” pronunciation (音読み):

  • Chinese initial sh-: is it /j-/ (時) or is it /sh-/ (仕)?
  • Chinese initial zh-: /sh-/ (主) or /ch-/ (中)?
  • Final -ik: /seki/ (石) or /shoku/ (食)?
  • Final -eng: is it /e:/ (生) or is it /o:/ (正)?
  • 日 – nichi? jitsu? 人 – jin? nin? 月 – getsu? gatsu?

Is there some order and predictability in this? Join me for the grand tour of the 音読み!


First things first – some helpful terminology:

  • The different types of Onyomi 音読み
    • 呉音 (Go’on) – the oldest Chinese readings, from the 飛鳥時代 (538-710)
    • 漢音 (Kan’on) – the newer Chinese readings, from the 奈良時代 (710-794)
    • 唐宋音 (To-so’on) – even later, not systematic, but incl. common words like 椅子
  • The major periods of Chinese:
    • Old Chinese (OC) – e.g. the language of the 詩經 (1st millennium BCE)
    • Middle Chinese (MC) – from the Sui 隋 and Tang 唐 (EMC, as in the Qieyun 切韻, written in 601) to the Song 宋 (LMC, as in the Yunjing 韻鏡, written in the mid-1100s)
    • Mandarin (官話) – begins with the 元, as recorded in Phags-pa documents

Let’s start with the initials, that is to say the beginning consonants of each syllable. Despite the names (Wu-reading Go’on 呉音, Han-reading Kan’on 漢音), most of the Chinese imports into Japanese happened in the Middle Chinese period, so we can build our model on the 36 initials of Middle Chinese.


[See https://en.wikipedia.org/wiki/Rime_table#36_initials for more on this]

So, without further ado, here is the fundamental correspondence table between MC and Japanese initials, showing Chinese initial, 呉音, and 漢音:

MC initial | 呉音 | 漢音
幫 [p]・非 [f] | [p] → [ɸ] → [h] | [p] → [ɸ] → [h]
滂 [pʰ]・敷 [fʰ] | [p] → [ɸ] → [h] | [p] → [ɸ] → [h]
並 [b]・奉 [v] | [b] | [p] → [ɸ] → [h]
明 [m]・微 [ṽ] | [m] | [b]; [m] before [ŋ]
端 [t]・知 [ʈ] | [t] | [t]
透 [tʰ]・徹 [ʈʰ] | [t] | [t]
定 [d]・澄 [ɖ] | [d] | [t]
泥 [n]・娘 [ɳ] | [n] | [d, z]; [n] before [ŋ]
精 [ts]・照 [tɕ, tʂ] | [s] | [s]
清 [tsʰ]・穿 [tɕʰ, tʂʰ] | [s] | [s]
従 [dz]・牀 [dʑ, dʐ] | [z] | [s]
来 [l] | [ɽ] | [ɽ]
心 [s]・審 [ɕ, ʂ] | [s] | [s]
邪 [z]・禅 [ʑ, ʐ] | [z] | [s]
日 [ɲ] | [n] | [z]
見 [k] | [k] | [k]
渓 [kʰ] | [k] | [k]
群 [g] | [g] | [k]
疑 [ŋ] | [g] | [g]
影 [ʔ] | -, [j] or [w] | -, [j] or [w]
暁 [x] | [k] | [k]
匣 [ɣ] | [ɡ] or [w] | [k]
喩 (-) | [j] or [w] | [j] or [w]

(Adapted from https://en.wikipedia.org/wiki/Sino-Japanese_vocabulary; note that 来 [l] has been placed with the dentals for symmetry reasons)
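For anyone who wants to play with these correspondences, the stop, nasal and sibilant rows of the table can be written down as a lookup table. This is a minimal Python sketch under my own simplifications: the names `Reflex`, `REFLEXES`, `PARTNERS` and `onyomi_initials` are hypothetical, the 影/暁/匣/喩 row is left out because its outcome depends on the following glide, and only the initial consonant is returned (later Japanese changes such as t before i > ch are not modelled).

```python
from typing import NamedTuple


class Reflex(NamedTuple):
    go_on: str
    kan_on: str


# Stops, nasals and sibilants from the table above, one entry per MC initial.
# "h" stands for the [p] → [ɸ] → [h] development; "r" stands for [ɽ].
REFLEXES = {
    "幫": Reflex("h", "h"), "滂": Reflex("h", "h"), "並": Reflex("b", "h"), "明": Reflex("m", "b"),
    "端": Reflex("t", "t"), "透": Reflex("t", "t"), "定": Reflex("d", "t"), "泥": Reflex("n", "d/z"),
    "精": Reflex("s", "s"), "清": Reflex("s", "s"), "従": Reflex("z", "s"), "来": Reflex("r", "r"),
    "心": Reflex("s", "s"), "邪": Reflex("z", "s"), "日": Reflex("n", "z"),
    "見": Reflex("k", "k"), "渓": Reflex("k", "k"), "群": Reflex("g", "k"), "疑": Reflex("g", "g"),
}

# Initials that share a table cell with a partner listed above.
PARTNERS = {
    "非": "幫", "敷": "滂", "奉": "並", "微": "明",
    "知": "端", "徹": "透", "澄": "定", "娘": "泥",
    "照": "精", "穿": "清", "牀": "従", "審": "心", "禅": "邪",
}


def onyomi_initials(mc_initial: str, ends_in_ng: bool = False) -> Reflex:
    """Look up the (Go'on, Kan'on) initial for a Middle Chinese initial."""
    key = PARTNERS.get(mc_initial, mc_initial)
    reflex = REFLEXES[key]
    # Kan'on keeps the nasal in syllables ending in [ŋ] (明 and 泥 rows).
    if ends_in_ng and key in ("明", "泥"):
        return Reflex(reflex.go_on, reflex.go_on)
    return reflex


print(onyomi_initials("定"))                  # Reflex(go_on='d', kan_on='t'), e.g. 大 dai/tai
print(onyomi_initials("明", ends_in_ng=True))  # Reflex(go_on='m', kan_on='m'), e.g. 明 myou/mei
```

Mapping the paired initials through `PARTNERS` keeps the dictionary as small as the table itself, since paired initials behave alike in Japanese.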

A few quick observations on the differences between 呉音 and 漢音:

  • In 呉音, the MC voicing distinction in stops is preserved: voiceless and voiceless aspirated stops map onto voiceless ones, while voiced stops stay voiced in Japanese; in 漢音, all three categories (清, 次清, 濁) map onto voiceless stops, reflecting the loss of voicing in LMC
  • In 呉音, the MC affricates merge into sibilants; less technically speaking: all ts-sounds become s-sounds; again voiceless stays voiceless and voiced stays voiced; in 漢音, the voicing distinction is lost and all ts-sounds now become voiceless [s]
  • In 呉音, the MC nasals stay nasals; in 漢音, they become voiced stops ([m] → [b], [n] → [d]) unless the syllable ends in [ŋ]; a similar development can be observed in Hokkien
  • In 呉音, 日 [ɲ] becomes [n]; in 漢音, it becomes [z]

The different treatment of voiced consonants explains a lot of the seemingly random variation of readings in Japanese:

Ex.: (fan6)
呉音: jin dai zi bun butsu
漢音: shin tai chi fun futsu

In other words, whenever a word has a voiced MC initial, it may have a voiced initial in Japanese. If the initial is voiceless in MC, it cannot be voiced in Japanese (apart from rendaku)! How do I know if the initial was voiced in MC? Check whether the word has a 陽 tone in Cantonese (or Min or 吳)!
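The Cantonese test is easy to mechanize. A minimal Python sketch, assuming Jyutping romanization with a trailing tone digit; the function name `was_voiced_in_mc` is hypothetical. The heuristic inherits the usual caveats: a yang tone also marks the MC nasals and l, which were voiced as well, and it cannot tell you which of the two readings Japanese actually chose.

```python
def was_voiced_in_mc(jyutping: str) -> bool:
    """Heuristic: a yang (陽) tone in Cantonese points to a voiced MC initial.

    Assumes Jyutping with a final tone digit: 1-3 are yin tones, 4-6 yang tones
    (entering-tone syllables are written with 1 and 3 for yin, 6 for yang).
    """
    tone = jyutping[-1:]
    if not tone.isdigit():
        raise ValueError(f"missing tone digit: {jyutping!r}")
    return tone in "456"


examples = {"分": "fan6", "時": "si4", "茶": "caa4", "間": "gaan1", "中": "zung1", "出": "ceot1"}
for char, reading in examples.items():
    print(char, reading, was_voiced_in_mc(reading))
```

With the examples from this post, 分, 時 and 茶 come out as voiced and 間, 中 and 出 as voiceless.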

Note: Unfortunately, if it is a voiced initial, you cannot tell whether the word is read in 呉音 or 漢音 just by looking at the characters, e.g. 神社 is Jinja and 神道 is Shinto. So, the first is Go’on and voiced, and the second Kan’on and voiceless. But at least now you know why :-). Also note that the second syllable of each word is consistent with the first: 社 and 道 were both voiced in MC, but come out as voiced /ja/ in jinja and devoiced /to:/ in shintou, matching the treatment of 神. At least you are getting the second half for free!

Before moving on, let’s do some worked examples:

  • 公園 gong1-yuan2 in Mandarin. But remember that the “g” is not really voiced; this is just the pinyin convention for unaspirated [k]. We have not discussed finals yet, but from the table above you should expect something like kou-wen > kou-en. And that is exactly what it is.
  • 時間 shi2-jian1 < shi2-gian1 in Mandarin, si4-gaan1 in Cantonese. The first character is voiced, so it could become shi or ji in Japanese. The second character is clearly voiceless, so it could be GO: ken or KAN: kan in Japanese (you will find out under medials how to tell). Actually, it ends up being a mix of both: ji-kan (Go’on + Kan’on) – mildly annoying, but luckily not the norm.
  • 中国 zhong1-guo2, zung1-gwok3. Voiceless + voiceless, so this can only be shuu-koku, right? Close, but the initial of 中 is 知, which was a t-sound at the time (as it still is in Hokkien Tiong-kok), not a ts-sound. As you can see from the table, t stays t, and it later became /ch/ in modern Japanese before /i/, so /tiyuu/ > /chuu/. Unfortunately, there is another little complication called rendaku (連濁, sequential voicing) which turns the second syllable into goku: Chuu-goku.
  • 出発 chu1-fa1, ceot1-faat3. 出 has a ts-initial, so it should become shutsu, while fa(at) comes from an older puat (Hokkien hoat), so it should be shuts-pwatsu > shuppatsu, which is exactly what it is.
  • 茶 cha2, caa4, Hokkien: te5. This one is clearly voiced, and it is an original t-initial, as the Hokkien reading shows. So, it should be dya > ja in Go’on, ta in Kan’on, and sa (喫茶店, kissaten) in 唐宋音; but, as luck would have it, today’s pronunciation is an alternative, non-standard reading (慣用音): cha.

So, I hope this gives you a first idea as to the usefulness (and limitations) of this method. You can, with near 100% certainty, explain why a pronunciation is the way it is. So, no more complete randomness. In terms of predictability it often works, but may also fail due to phenomena like rendaku, go/kan-mixing or other non-standard readings.

[To be continued in the next post]

文读 and 白读 – in European languages!

A little while ago, I wrote about the phenomenon of 文读 and 白读 in Chinese varieties, most notably Hokkien (Minnan). But actually, we have the same phenomenon in European languages, too!


For example, in Spanish we have:

Latin: pleno- (满): lleno & plenitud
Latin: filio- (子): hijo & afiliación
Latin: oculo- (目): ojo & ocular
Latin: pector- (胸): pecho & pectoral
Latin: audi- (闻): oír & audible

As you can see, the situation is exactly parallel to Chinese: the scientific, more cultured terms use a pronunciation much closer to the Latin original, while everyday words like son, eye, chest etc. show the much-changed pronunciation of the colloquial line.

And how about English? Actually, in English, the situation is even more complex, because you can have cognates from Indo-European times or Latin words imported more than once (e.g. from Norman French and later again directly from scholarly Latin), giving rise to two different groups of doublets:

(1) Original English word and 文读 import from Latin (Indo-European cognates)

(目): eye & ocular (cf. German: Auge to see the change from c → g → y)
(耳): ear & auricular
(满): full & plenary
(夜): night & nocturnal

(2) Multiple imports from Latin – directly and via Norman-French

Latin: magister (君,主人) → mister, master, magister
Latin: factu- (事) → feat, fact
Latin: senior (老) → sir, senior
Latin: regali- (王之) → real, royal, regal

As you can see, the example of magister is just like 呉音、漢音、唐音 in Japanese. The only difference is that we normally do not think about it: since we use an alphabet, the spelling already tells us to pronounce mister and magister differently.

文读 vs. 白读 in Hokkien


《白日依山尽,黄河入海流》   –   “Piék lít i san tsīn, Hông Hô dzíp hái lîu”。

“What language is that?”, my Hokkien friend asked me – “is it Hakka?” And not surprisingly so, because in everyday Hokkien, “白” would be peh8, not piek8, “山” is soaN1, not san1, “黄” would be ng5, not hong5, “流” is pronounced lau5 instead of liu5 and so forth … so, what happened here?


The answer is 文读 vs. 白读, literary vs. colloquial readings. All varieties of Chinese have this, even Mandarin, e.g. “白” being read as bó in 李白, or “色” being read as [shǎi] instead of [sè] in some cases. Bó and sè are literary readings, imitating the sounds of Nanjing Mandarin rather than the Beijing vernacular, which has bái and shǎi. In Mandarin, most literary/colloquial pairs occur for 入声 words, but there are others, too, like tuo for 他.

But while literary readings are a minor phenomenon in Mandarin or Cantonese, they are everywhere in Hokkien (闽南). Many characters have at least two pronunciations, and some have more:

“一”: 白 [tsit8]  vs. 文 [it4]
“大”: 白 [tua7]  vs. 文 [tai7]
“学”: 白 [oh8]  vs. 文 [hak8]

An example with 3 readings (from Lin’s grammar): “石头” [chioh8-tau5], “石榴” [siah8-liu6], “药石” [iok8-sek8]. That is, “石” can be read “chioh8”, “siah8” or “sek8”.

Supposedly, the colloquial readings came to the South in the Han Dynasty, while the literary readings reflect the language of the Tang (similar to 呉音、漢音 and 唐音 in Japanese). Not surprisingly, the literary readings sound much closer to Hakka or Cantonese.

From my observation, people switch between the two readings without ever noticing, no doubt because Hokkien is hardly ever taught as a written language:

大学生伫大学读册    –     [tai7-hak8-seng1 ti7 toa7-oh8 thak8-chheh4]

“tai7-hak8” is literary, toa7-oh8 is colloquial (note that some people use tai7-hak8 in both cases). In English we also switch between regal and royal, frail and fragile, mister, master and magister, etc. without ever thinking that these are the same Latin word in two different forms …

But since few people learn how to write Hokkien or read the classics in Hokkien anymore, a poem read aloud in all 文读 sounds like an entirely different language to them – even though 100 years ago this would likely have been the norm for any literary activities.



Perfect in Hokkien


Several European languages use “to have” as a past tense marker, the so-called “perfect” (e.g. “ich habe gesagt, I’ve said, he dicho, j’ai dit, tenho dito”). Is it the same in Hokkien, and if not, when do I use it?

Lin [1] calls 有 +verb the perfective aspect, an optional marker for completeness, roughly equivalent to sentences with resultative complements or with 了, e.g.:

伊下早有唱歌。 (Similar to: 伊下早歌唱好)
I1 e1-tsai2 u7 tshiuN3 koa1. (Similar to: I1 e1-tsai2 koa1 tshiuN3-ho2)

阮阿姐有生双生仔。(cf. 阮阿姐双生仔生了)
Gun6 a1-tsi2 u7 seN1 siang1-seN1-a2. (cf. Gun6 a1-tsi2 siang1-seN1-a2 seN1-liao6)

Bodman [2], on the other hand, describes 有 as always having an affirmative character, more like “I did do it” than “I have done it”:

汝有去无? – 有,我有去。
Li6 u7 khi3 bo? – U7, goa6 u7 khi3.

He contrasts it with 捌 (“having had an experience”, similar to verb + 过 or verb + ことがある), which has a slightly different nuance, e.g.:

Goa6 bat4-khi3 oh8-tng5 thak8-tsheh4.
= I have been to school to study.

Goa6 u7 khi3 oh8-tng5 thak8-tsheh4.
= (but) I DID go to school to study.

Lin discusses the existence of an emphatic use, too, e.g. when he compares:

Goa6 u7 lim1 ang5-tsiu2.
Possibly completeness, possibly emphatic.
“I drank red wine” or “I DID drink red wine”

Goa6 lim1-liau6 ang5-tsiu2.
Focus on completion only.

In the end, it probably does not matter too much in practice – in an “optional precision” language like Hokkien, the default position is not to use a past tense marker at all:

I1 tsa7-jit8 khi3 he2-tshia1-tsam7.
= (S)he went to the train station yesterday.

If a marker is used, it is to highlight a specific aspect, which could be completeness (有,了,完,好) or emphasis that something happened (again, 有). If you find the completeness of an action important enough to use a completeness marker then that is the same as asserting it occurred :-).

=> Conclusion: only use 有 when needed to emphasize that something was completed, not as a general past tense marker like in English or Spanish.


[1] Philip T. Lin: “Taiwanese Grammar – A Concise Reference”
[2] Nicholas C. Bodman: “Spoken Amoy Hokkien”