Channel: The Nanbanjin Nikki

Jōyō kanji variants: The curious case of 叱 and [censored]


I’m working on a reliable, machine-readable edition of the Jōyō kanji data, and this came up. Can you spot the difference between 𠮟 and 叱? Me neither. Let’s look at the reference image:

[Image: comparison between 𠮟 and 叱 (Joyo Kanji-hyo reference image)]

…Welp. The left one is a left-to-right stroke stopping at the end, on the model of 七; the right one is right-to-left, sweeping at the end, as in 匕. But, still. These government people are very thorough, to list these minor variant glyphs of the same character.

Except these are supposed to be different characters altogether.

Let’s recap: a character is an abstract entity, and a glyph is one particular way of drawing that character. The shapes ‘a’, ‘a’ and ‘a’ (picture each one in a different typeface) are different glyphs of the character LATIN SMALL LETTER A, and font designers can give us nearly infinitely many more. The text standard for computers, Unicode, assigns one number (“code point”) to each character, not to each glyph; glyph variations are decided by fonts.
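To make the character/code point idea concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It shows that what the computer stores is the number and the abstract identity; how the letter is drawn is not part of the data at all.

```python
import unicodedata

ch = "a"

# The code point is just an integer; the name identifies the abstract character.
code_point = ord(ch)
name = unicodedata.name(ch)

# Nothing here says anything about the glyph: that is left to the font.
print(f"U+{code_point:04X} {name}")  # U+0061 LATIN SMALL LETTER A
```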

However, in the case of Chinese characters, things get blurry. If a character had variants with significantly different shapes (such as 兑 vs. 兌), each variant was given its own code point. Only very minor variations were “unified” in the same code point. Unfortunately, these minor variations tend to be bound to locales – the Japanese cross the blade in 刃, the Koreans don’t – which means that even this timid unification was hugely controversial. One can, of course, use one’s own country’s version of the characters simply by choosing an appropriate font; but computers don’t always choose the appropriate font, which means that from time to time Taiwanese people would stumble upon Japanese-style glyphs which are obviously completely wrong and unacceptable (or the other way around).

A mechanism was designed to pacify this: variation selectors. These special, invisible characters can be added right after a character to tell the computer which graphical variant is intended. However, most software doesn’t support this mechanism yet.
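As a rough illustration of how this works at the data level, here is a Python sketch. The helper names (`is_variation_selector`, `make_sequence`, `strip_selectors`, `describe`) are made up for this example; the selector ranges U+FE00..U+FE0F and U+E0100..U+E01EF are the two Unicode variation selector blocks. The sample sequence U+990C,U+E0100 is one of the sequences listed later in this post.

```python
def is_variation_selector(ch: str) -> bool:
    """True if ch lies in one of the two Unicode variation selector blocks."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def make_sequence(base: int, selector: int) -> str:
    """Build 'base character + invisible selector' from two code points."""
    if not is_variation_selector(chr(selector)):
        raise ValueError(f"U+{selector:04X} is not a variation selector")
    return chr(base) + chr(selector)


def strip_selectors(text: str) -> str:
    """Drop the selectors, leaving the 'variant unspecified' text."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


def describe(text: str) -> str:
    """List the code points of a string, e.g. 'U+990C,U+E0100'."""
    return ",".join(f"U+{ord(ch):04X}" for ch in text)


seq = make_sequence(0x990C, 0xE0100)
print(describe(seq))                   # U+990C,U+E0100
print(describe(strip_selectors(seq)))  # U+990C
```

Whether the two forms actually look different on screen still depends on the font and the rendering software; the code only shows what is stored in the text.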

The Jōyō Kanji standard has a thing for telling people that the glyphs they’re using are wrong. There are two kinds of variants in the document. One kind is the “acceptable character forms” 許容字体. These are five characters (餌, 遡, 遜, 謎, and 餅) where the de facto glyphs in modern society differ from what the document says is the standard. So the popular glyphs are listed in the table (between brackets) as acceptable. These variants are unified in Unicode, and selectable only by variation selectors (I added the relevant sequences to JoyoDB, though, again, most computers won’t display them as of 2016). If you want to try, here they are:

Variant unspecified   Standard variant       Accepted variant
U+990C                U+990C,U+E0103 餌󠄃     U+990C,U+E0100 餌󠄀
U+9061                U+9061,U+E0101 遡󠄁     U+9061,U+E0100 遡󠄀
U+905C                U+905C,U+E0101 遜󠄁     U+905C,U+E0100 遜󠄀
U+8B0E                U+8B0E,U+E0101 謎󠄁     U+8B0E,U+E0100 謎󠄀
U+9905                U+9905,U+E0101 餅󠄁     U+9905,U+E0100 餅󠄀

If they look the same to you, that’s too bad. Come back to this post in 10 years. Meanwhile, here are the reference images of what they should look like:

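Standard variant                    Accepted variant
U+990C,U+E0103 [reference image]    U+990C,U+E0100 [reference image]
U+9061,U+E0101 [reference image]    U+9061,U+E0100 [reference image]
U+905C,U+E0101 [reference image]    U+905C,U+E0100 [reference image]
U+8B0E,U+E0101 [reference image]    U+8B0E,U+E0100 [reference image]
U+9905,U+E0101 [reference image]    U+9905,U+E0100 [reference image]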

The other kind of variant is the “popular-use character forms” 通用字体. These are non-unified characters; they got their own, distinct Unicode codepoints. Still, no one uses the recommended forms, so the Introduction gives a passing nod to the existence of the popular variants. This is related to the Japanese JIS character sets; the popular variants are the ones that were encoded in the first JIS releases, which is how they became well-established.

Standard   Popular
U+5861     U+586B
U+525D     U+5265
U+9830     U+982C

Since these are different Unicode codepoints, the difference will show up in all computers; however, they’re still graphical variations of the same fundamental Chinese character.
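A quick way to see this for yourself is the Python sketch below. The code points are the three pairs from the table above, while the `PAIRS` list and the loop are just an illustration. It also runs the strings through NFKC normalization, on the assumption (true for ordinary unified ideographs like these) that normalization won’t merge one form into the other.

```python
import unicodedata

# (standard, popular) code point pairs from the table above
PAIRS = [(0x5861, 0x586B), (0x525D, 0x5265), (0x9830, 0x982C)]

results = []
for standard, popular in PAIRS:
    s, p = chr(standard), chr(popular)
    # Plain string comparison: different code points are different strings.
    equal_raw = s == p
    # Even after compatibility normalization they remain distinct.
    equal_nfkc = unicodedata.normalize("NFKC", s) == unicodedata.normalize("NFKC", p)
    results.append((equal_raw, equal_nfkc))
    print(f"U+{standard:04X} {s} vs U+{popular:04X} {p}: "
          f"equal={equal_raw}, equal after NFKC={equal_nfkc}")
```

So a search for the standard form won’t find text typed with the popular form unless the software applies its own variant mapping on top.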

And then there’s 𠮟 vs. 叱: U+20B9F vs. U+53F1. At first sight it seems to be the same case as the three characters above. However, the Joyo document insists that U+53F1 is not the well-known Jōyō character with the readings shitsu and shi(karu) (“to scold”). You can see they’re distinct characters in the classic Kangxi dictionary, page 173. Here’s what they were supposed to be:

     Codepoint   On       Phonetic    Kun         Meaning
𠮟   U+20B9F     shitsu               shi(karu)   to scold
叱   U+53F1      ka       匕 (< 化)               to open the mouth

What happened was that early computer practice had the shitsu/shikaru character drawn like the ka character. Ka isn’t used in modern Japanese, so no one cared. By the time they codified the distinction, people had already become used to 叱 (with a diagonal-stroked 匕) in this role. What’s more, computers were used to it; U+20B9F is a newer kind of Unicode character, outside the Basic Multilingual Plane (BMP), and software support to this day is still icky (this very blog system gave me trouble preserving it, and adding it to the title broke everything horribly) – not to mention the lack of font glyphs. Input methods will choose U+53F1 for shitsu or shikaru, not for ka; and they won’t bring up U+20B9F at all.
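Part of why characters outside the BMP trip up older software is how they are encoded. The Python sketch below compares the two code points (the helper `utf16_units` is a made-up name for this example): U+53F1 fits in a single 16-bit UTF-16 unit, whereas U+20B9F needs a surrogate pair and four bytes in UTF-8, which is the kind of thing that systems built around 16-bit characters or three-byte UTF-8 tend to mangle.

```python
SCOLD = "\U00020B9F"  # the Jōyō standard form, outside the BMP
KA = "\u53F1"         # the well-supported form, inside the BMP


def utf16_units(ch: str) -> list:
    """Return the 16-bit UTF-16 code units used to encode ch."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


for ch in (SCOLD, KA):
    cp = ord(ch)
    units = " ".join(f"{u:04X}" for u in utf16_units(ch))
    utf8_len = len(ch.encode("utf-8"))
    print(f"U+{cp:04X}: in BMP={cp <= 0xFFFF}, UTF-16 units={units}, UTF-8 bytes={utf8_len}")

# U+20B9F: in BMP=False, UTF-16 units=D842 DF9F, UTF-8 bytes=4
# U+53F1: in BMP=True, UTF-16 units=53F1, UTF-8 bytes=3
```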

Finally, even if the Japanese standards declare that this character shape is meant for ka/“open mouth”, the Unicode standard declares that the codepoint represents shitsu/shikaru “to scold” – the only concession to the original use being the data field kHanyuPinyin, which draws from the Hànyǔ Dà Zìdiǎn dictionary.

In effect, the two characters were accidentally unified as “to scold”, with the earlier “open mouth” meaning rendered obsolete. The Joyo Kanji document recognizes this, saying that now 叱/ka has become a graphical variant (異体字) of 𠮟/shitsu.

