DNSO Mailling lists archives


<<< Chronological Index >>>    <<< Thread Index >>>

[nc-idn] Briefing document on IDN

This document has been presented at CENTR GA in Chateau d'Esclimont, 
21-22 February 2002.
PDF in http://www.dnso.org/dnso/notes/International-Domain-Names-EP.pdf


International Domain Names

Briefing draft document for discussion

Elisabeth Porteneuve, 19 February 2002


The International Domain Names issue come to lights less than 
two years ago, as the resultant of several actions:

   1. Competition by gTLD Registries towards new customers 
      (cf. ICANN Board resolutions from September 2000, 

   2. Natural need of people from the worldwide Internet asking 
      to facilitate their access to the Internet domain names.

   3. General dissatisfaction of the worldwide Internet with 
      ICANN and its incapability to became international body, 
      triggering off a strong reactions from various horizon, 
      including requests for international characters in domain 

Subsequently the IETF has been asked to make a technical study 
on International Domain Names introduction into the DNS.

Background to International Domain Names

1. ASCII subset "LDH"

   Today domain names code specifications limit the permissible 
   code points to a restricted subset of 38 signs: the letters 
   a-z (upper and lower case alike, 26 signs), the digits 0-9, 
   the hyphen-minus "-" (so called "LDH"), plus the 
   label-separating period (with additional rules such as no minus 
   at the beginning or at the end of a label).

2. Unicode

   The Unicode is the only existing, very recent, table of 
   international character sets produced originally by printer's 
   industry in late 1980's. 

   The origins of Unicode are rooted in works on unified Han, 
   a subset of Chinese, Japanese and Korean (CJK) characters, 
   which (1) have identical internal computer code point, 
   (2) print in Chinese, Japanese or Korean design, according to 
   a language context, (3) may have a similar meaning or not, 
   according to a language context. 

   An important characteristic of three CJK languages is that 
   they do not use a small set of signs called alphabet, but 
   rather write ideographs, each of them being a concept or a word. 

   There is more than one hundred thousands of CJK ideographs.

   The Unicode consortium spent ten years on developing unified 
   Han for printer's industry.

   The Unicode tables allows for up to 4 octets per character.

   The Unicode Consortium tables are equivalent to ISO 10646 tables. 
   The ISO Working Group responsible for ISO/IEC 10646 is JTC1/SC2/WG2.

3. The Traditional Chinese vs Simplified Chinese issue:

   After the establishment of the People's Republic of China in 1949, 
   and to ensure the coherency of a large country with the largest 
   world population, the new regime led by Mao Zedong and Zhou Enlai 
   announced that character simplification was a high priority task. 
   As a result of works on Chinese simplification, various written 
   language reforms were undertaken, the most important of which 
   include: the development of a standardized translation to Latin 
   system known as pinyin (cf. replacement of the old name Pekin 
   by Beijing), and the simplification of thousands of ideographs forms. 

   Since the promulgation of simplification plan of Chinese characters 
   in 1956, simplified Chinese has been used in Mainland of China; 
   the plan has greatly promoted the spread and application of the 
   Chinese language in people's daily life, meanwhile, the 
   traditional Chinese still extensively exist in the social life, 
   for its long history and artistic value.

   After the reunification of Hong Kong to the Mainland of China 
   in 1997 and Macao in 1999, both simplified and traditional 
   Chinese characters are used widely in the these two regions. 
   Singapore mainly uses simplified Chinese. People in Taiwan region 
   have always used traditional Chinese characters. Korea and Japan 
   mainly use traditional Chinese.

4. The IETF works on International Domain Names

   a. Based on Unicode, because there is nothing else.

   b. Technical scope - expand today "LDH" 38 characters into several 
      tens of thousands of code points.

   c. Led to a discovery of many problems, two of them are listed below:

      * Combinatory effects which may have a dramatic impact on 
        domain names. At the extreme stage both an end user and 
        a business company being unable to communicate without 
        knowing precisely which language code points were used 
        to print business cards or publish web sites - in other 
        words neither an end user nor a business company can use 
        a printed information and enter it into browser without 
        knowing which scripts must be used.

      * Mutual incompatibility in Unicode between unified Han 
        (developed for printers) and Chinese language including 
        Simplified Chinese.

Summary of problems and political dilemmas:

1. Chinese Unicode problem:

   a. It is mutually impossible to satisfy TC/SC _and_ unified 
      Han (all engineers worldwide affirm it)

   b. Chinese are signatories to ISO10646 (ISO documents are signed 
      by official representatives), therefore some presume, in all 
      due deference to governments, that Chinese decision gives 
      advantage to unified Han over TC/SC problem. But many forget 
      that initially Unicode and unified Han were made for printer's 
      industry, in 1980's, at the time when computer memory and 
      processing were slow and requested for a lot of ingenuity 
      to allow new features. Consequently Chinese could endorse 
      ISO10646 for printers, and may be against its usage for 
      domain names. On the other hand there is no doubt that Chinese 
      endorse their national work on Chinese language Simplification, 
      and the fact that one billion nation uses it now.

   c. The IETF work on Unicode usage for domain names demonstrates 
      a clash between Chinese language on one side and Korean and 
      Japanese on another side. In other words accepting Unicode 
      for domain names is an equally bad choice between supporting 
      Korean and Japanese against Chinese, or an opposite. 
      No one of these is neither wise nor appropriate.

2. Latin - Cyrillic - Greek problem:

   a. If the usage of mixed letters from various alphabets is allowed 
      - and the IETF works on Unicode characters cannot exclude it - 
      then, there will be no more any unambiguous printed URL. 
      The mixed similarly appearance while different code points 
      will create a terrible confusion to consumers, and may kill 
      any hope for safe electronic commerce. 

   b. The combinatory possibilities will increase by factor of 
      hundred or thousand a domain names cost to some companies

   c. The Latin - Cyrillic - Greek problem is one example among 
      others. The combinatory effects are similar for CJK as well.

3. The UDRP for IDN problem:

   a. The "language" for IDN in gTLD is undefined. 

   b. The printed URLs (paper or screen) are in general case 
      undefined because a multitude of characters in different 
      scripts have the same printed shape. As un example a business 
      card with abc.com in IDNs does not provide for unilateral 
      guessing of a company. In case of any trading services a 
      consumer can be easily abused, and he will be not able 
      to demonstrate so.

4. Not enough work on languages:

   a. The Unicode is a recent, unique, pot pourri, initalially 
      defined for printer's industry, gathering not only languages, 
      but anything which may be printed.

   b. But there is nothing else. Nothing dedicated to languages 
      for international domain names.

5. Tradeoffs

   a. Do we consider that from a user perspective an alternate roots 
      are more confusing that IDNs in gTLDs ?

   b. If trademark industry or business or electronic commerce 
      industry feels in danger with IDNs, odds are there will be 
      pressure to create as many "LDH" TLDs as companies, and allow 
      them to escape this way from combinatory effects and confusion.

Questions :

   1. Do we agree the Unicode is unsuitable for Internet Domain 
      Names on global Internet. ?

   2. If yes, what to do now ?


1. The Internationalized Domain Names IETF WG 
   The IDN IETF WG Web site http://www.i-d-n.net/ 
   The IDN IETF mailing list archives 

2. IETF drafts related to IDN: 
   a. [AMC-ACE-M] Adam Costello (4 Sep 2001) 
      The choice of AMC-ACE encoding got a significant support within
      Internet industry. Subsequently its name became PUNYCODE: 
   b. [PUNYCODE] Adam Costello (10 Jan 2002) 
   c. [IDNA] P. Faltstrom, "Internationalizing Domain Names in Applications" 
      (11 Jan 2002) 
   d. [NAMEPREP] Paul Hoffman and Marc Blanchet, "Stringprep Profile for 
      Internationalized Host Names" (17 Jan 2002) 
   e. [TC/SC] XiaoDong Lee, Hsu Nai-Wen, Deng Xiang, Erin Chen, Zhang Hong, 
      Sun Guonian, "Traditional and Simplified Chinese Conversion" 
      (16 Nov 2001) 

3. IETF RFCs: 
   a. [RFC3066] H. Alvestrand, "Tags for Identification of Languages" 

4. Unicode Consortium: 
   [UNICODE] The Unicode Standard, Version 3.1.0: The Unicode Consortium. 

5. The Pitfalls and Complexities of Chinese to Chinese Conversion, 
   J.Halpern and J.Kerman, http://www.cjk.org/cjk/c2c/c2cbasis.htm 

6. ICANN Board IDN Committee: 

7. Names Council IDN Task Force: 

<<< Chronological Index >>>    <<< Thread Index >>>