
Re: Unicode 3.0 and Mule-UCS



Kenichi Handa <handa@xxxxxxxxx> writes:

> Christian Wittern <wittern@xxxxxxxxxxxxxxxxx> writes:
> > I just got a UTF-8 encoded file that contains some of the new CJK
> > characters that have been added to Unicode about a year ago. 

If you don't mind, could you tell me concretely which characters
these are?  Are they from Extension-A or Extension-B?

> > The characters in question get silently converted to "?", without
> > notifying me.
> 
> > I am using Mule-UCS 0.82 with FSF Emacs 20.7 on Windows 2k.
> 
> > Is this a known problem? Is there a workaround?  It is quite obvious
> > that this situation is very unsatisfying, because I can not save the
> > file without destroying its contents and am not even informed of this fact!
> 
> I don't know if the latest Mule-UCS still has this problem
> or not.  But, with Emacs 21, Mule-UCS should be able to
> decode such a character into a sequence of eight-bit-control
> and eight-bit-graphic characters.  Then, those should be
> able to be written back to the original UTF-8 encoding.

IMO, such problems should be solved by introducing new charsets
to Emacs, but unfortunately the current Emacs does not have enough
charset space to cover Extension-A and Extension-B.

I would not like to represent untranslated Unicode characters with the
eight-bit-* charsets, because the eight-bit-control and eight-bit-graphic
charsets cannot specify a character; they can only represent a binary
octet stream, and a Unicode character should not be interpreted as binary.

However, I also regard the current behavior as a problem.

Handa-san, how about translating such a character into `?' (or anything
else) and setting a 'ucs-codepoint text property on it, instead of
converting it into eight-bit-* characters?
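
A minimal sketch of the text-property idea in Emacs Lisp follows.  It
only illustrates the proposal: the two function names are hypothetical
and are not part of Mule-UCS or Emacs, and it assumes the decoder
already knows the code point it could not translate.  Only the standard
`put-text-property' and `get-text-property' functions are used.

```elisp
;; Hypothetical helpers for illustration; not part of Mule-UCS.

(defun my-insert-untranslatable (codepoint)
  "Insert `?' at point and record CODEPOINT in its `ucs-codepoint' property."
  (let ((start (point)))
    (insert "?")
    ;; Attach the original Unicode code point to the placeholder character.
    (put-text-property start (point) 'ucs-codepoint codepoint)))

(defun my-ucs-codepoint-at (pos)
  "Return the code point recorded at POS, or nil if there is none."
  (get-text-property pos 'ucs-codepoint))

;; Example: U+20000 is the first code point of Extension-B.
(with-temp-buffer
  (my-insert-untranslatable #x20000)
  (my-ucs-codepoint-at (point-min)))
```

Text properties are not written to the file by themselves, so for the
round trip to work the encoder would presumably have to consult the
property when saving and emit the recorded code point in place of `?'.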

from himi