Re: Unicode 3.0 and Mule-UCS
- To: mule@xxxxxxxx
- Subject: Re: Unicode 3.0 and Mule-UCS
- From: MIYASHITA Hisashi(宮下 尚:HIMI)<himi@xxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: 05 Mar 2001 23:15:18 +0900
- In-reply-to: <200103051310.WAA13483@etlken.etl.go.jp>(Kenichi Handa's message of "Mon, 5 Mar 2001 22:10:24 +0900 (JST)")
- References: <200103051310.WAA13483@etlken.etl.go.jp>
- Reply-to: mule@xxxxxxxx
- Sender: himi@xxxxxxxx
- User-agent: T-gnus/6.14.1 (based on Gnus v5.8.3) SEMI/1.13.5(Meihō) FLIM/1.13.2 (Kasanui) Emacs/20.7(i386-*-nt5.0.2195) MULE/4.1 (AOI) Meadow/1.13 Beta2 (UKIHASHI:61)
Kenichi Handa <handa@xxxxxxxxx> writes:
> Christian Wittern <wittern@xxxxxxxxxxxxxxxxx> writes:
> > I just got a UTF-8 encoded file that contains some of the new CJK
> > characters that have been added to Unicode about a year ago.
If you don't mind, could you tell me specifically which characters
they are? Are they from Extension-A or Extension-B?
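To make the question concrete, here is a small sketch in Python (outside Emacs, and not part of Mule-UCS) that lists which characters of a UTF-8 file fall into the two extension blocks. The function names are made up for illustration; the ranges are the block allocations U+3400..U+4DBF (Extension-A) and U+20000..U+2A6DF (Extension-B).

```python
# Illustrative helper only; not part of Mule-UCS or Emacs.
# Block ranges: Extension-A is U+3400..U+4DBF, Extension-B is U+20000..U+2A6DF.
EXT_A = range(0x3400, 0x4DC0)
EXT_B = range(0x20000, 0x2A6E0)


def classify_cjk_extension(ch):
    """Return 'Extension-A', 'Extension-B', or None for a single character."""
    cp = ord(ch)
    if cp in EXT_A:
        return "Extension-A"
    if cp in EXT_B:
        return "Extension-B"
    return None


def report_extensions(utf8_bytes):
    """Decode UTF-8 data and list (char index, 'U+XXXX', block) per extension character."""
    text = utf8_bytes.decode("utf-8")
    found = []
    for index, ch in enumerate(text):
        block = classify_cjk_extension(ch)
        if block is not None:
            found.append((index, "U+%04X" % ord(ch), block))
    return found
```

Running `report_extensions(open(path, "rb").read())` on the file in question (with `path` standing for its name) would show whether the lost characters are BMP characters from Extension-A or supplementary-plane characters from Extension-B.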
> > The characters in question get silently converted to "?", without
> > notifying me.
>
> > I am using Mule-UCS 0.82 with FSF Emacs 20.7 on Windows 2k.
>
> > Is this a known problem? Is there a workaround? It is quite obvious
> > that this situation is very unsatisfying, because I can not save the
> > file without destroying its contents and am not even informed of this fact!
>
> I don't know if the latest Mule-UCS still has this problem
> or not. But, with Emacs 21, Mule-UCS should be able to
> decode such a character into a sequence of eight-bit-control
> and eight-bit-graphic characters. Then, those should be
> able to be written back to the original UTF-8 encoding.
IMO, such problems should be solved by introducing new charsets
to Emacs, but unfortunately the current Emacs does not have enough
charset space to cover Extension-A and Extension-B.
I would not like to represent untranslated Unicode characters with the
eight-bit-* charsets, because eight-bit-control and eight-bit-graphic
cannot identify a character; they can only represent a binary octet stream,
and a Unicode character should not be interpreted as binary data.
However, I also regard the current behavior as a problem.
Handa-san, how about translating such a character into `?' (or anything else)
and setting a 'ucs-codepoint text property on it, instead of converting it
into eight-bit-* characters?
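As a rough model of this idea, the sketch below uses Python rather than Emacs Lisp, and it is not how Mule-UCS is implemented. An unsupported character becomes `?`, its original code point is recorded on the side, and encoding restores it. The function names, the `is_supported` predicate, and the dictionary standing in for the 'ucs-codepoint text property are all assumptions made for illustration.

```python
# Hypothetical model of the proposal; all names here are illustrative only.
PLACEHOLDER = "?"


def decode_with_placeholders(utf8_bytes, is_supported):
    """Decode UTF-8; replace unsupported characters with '?' and remember
    each original code point, keyed by character position."""
    chars = []
    ucs_codepoint = {}  # position -> original code point (stands in for the text property)
    for pos, ch in enumerate(utf8_bytes.decode("utf-8")):
        if is_supported(ch):
            chars.append(ch)
        else:
            chars.append(PLACEHOLDER)
            ucs_codepoint[pos] = ord(ch)
    return "".join(chars), ucs_codepoint


def encode_with_placeholders(text, ucs_codepoint):
    """Encode back to UTF-8, restoring every recorded code point."""
    restored = [
        chr(ucs_codepoint[pos]) if pos in ucs_codepoint else ch
        for pos, ch in enumerate(text)
    ]
    return "".join(restored).encode("utf-8")
```

A `?` that was really in the file carries no recorded code point, so it is written back unchanged, while a placeholder `?` is written back as the original character. Unlike this position-keyed dictionary, a real text property would stay attached to its character when the surrounding text is edited.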
from himi