Re: Mule-UCS 0.84 (KOUGETSUDAI) release.
In message [mule:03290], on Fri, 19 Jul 2002,
MIYASHITA Hisashi <himi@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > I found these patches:
> >
> > ftp://dlpx1.dl.ac.uk/fx/emacs/Mule/mule-ucs-0.84.diff
> > http://emacs-w3m.namazu.org/ml/msg03630.html
> > http://ftp.debian.org/debian/pool/main/m/mule-ucs/mule-ucs_0.84-11.diff.gz
> >
> > I wish Mule-UCS 0.85 were released.
>
> O.K. But could you please integrate these patches and
> propose the overall change? I'll check and incorporate it, and
> then release the next version.
I've integrated the above patches into the single diff below.
(I left out the `lao' support, however, because it doesn't work on
my system. The relevant lines are commented out and marked with
";;;xxx".)
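
For anyone reviewing the utf.el part, here is a rough sketch in plain
Emacs Lisp of the checks the new utf-8-ccl-decode makes: a stray
continuation byte, a byte that is not a continuation where one is
expected, and an over-long sequence each yield U+FFFD. It is only an
illustration, not the CCL program itself. The names
sketch-utf-8-decode and sketch-replacement-char are hypothetical and
are not part of Mule-UCS. It mirrors only the 1- to 4-byte branches:
lead bytes of F8 and above are simply replaced, and treating a
truncated sequence as malformed is my assumption.

```elisp
(defconst sketch-replacement-char #xFFFD
  "U+FFFD, emitted in place of a malformed sequence (hypothetical name).")

(defun sketch-utf-8-decode (bytes)
  "Decode BYTES, a list of integers 0..255, into a list of code points.
Illustrative only: follows the 1- to 4-byte branches of the patched
CCL program and simplifies everything else."
  (let (result)
    (while bytes
      (let* ((lead (pop bytes))
             ;; SPEC is nil for ASCII, `bad' for an invalid lead byte,
             ;; or (CONTINUATION-COUNT MINIMUM-VALUE INITIAL-BITS).
             (spec (cond ((< lead #x80) nil)
                         ((< lead #xC0) 'bad) ; unexpected continuation byte
                         ((< lead #xE0) (list 1 #x80 (logand lead #x1F)))
                         ((< lead #xF0) (list 2 #x800 (logand lead #x0F)))
                         ((< lead #xF8) (list 3 #x10000 (logand lead #x07)))
                         (t 'bad))))  ; simplification: no 5/6-byte forms
        (cond
         ((null spec) (push lead result))
         ((eq spec 'bad) (push sketch-replacement-char result))
         (t
          (let ((count (nth 0 spec))
                (minimum (nth 1 spec))
                (value (nth 2 spec))
                (ok t))
            (while (and ok (> count 0))
              ;; As in the patch, XOR with #x80 maps a valid
              ;; continuation byte (#x80..#xBF) to 0..#x3F.  The byte is
              ;; consumed even when it turns out to be invalid.
              ;; Assumption: running out of input counts as malformed.
              (let ((next (if bytes (logxor (pop bytes) #x80) #x40)))
                (if (>= next #x40)
                    (setq ok nil)
                  (setq value (logior (ash value 6) next)
                        count (1- count)))))
            ;; A value below MINIMUM means the sequence was over-long.
            (push (if (and ok (>= value minimum))
                      value
                    sketch-replacement-char)
                  result))))))
    (nreverse result)))
```

For example, (sketch-utf-8-decode '(#xC0 #x80)) returns a list
holding only #xFFFD, because C0 80 is an over-long encoding of U+0000,
while (sketch-utf-8-decode '(#xC3 #xA9)) returns (#xE9).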
--
Tatsuya Kinoshita
diff -ur mule-ucs-0.84.orig/lisp/ChangeLog mule-ucs-0.84/lisp/ChangeLog
--- mule-ucs-0.84.orig/lisp/ChangeLog Fri Apr 13 16:00:36 2001
+++ mule-ucs-0.84/lisp/ChangeLog Fri Jul 19 19:58:08 2002
@@ -1,3 +1,32 @@
+2002-07-19 Tatsuya Kinoshita <tats@xxxxxxxxxx>
+
+ * un-trbase.el: Avoid lao.
+ * un-define.el: Ditto.
+
+ * un-define.el: `)' fix for coding-system-definition. (Refer to
+ Debian mule-ucs.)
+
+ * utf.el (utf-8-ccl-decode): Handling malformed UTF-8. (Refer to
+ Debian mule-ucs, http://packages.debian.org/mule-ucs, maintained
+ by Takuo KITAME.)
+ (unicode-replace-char): New constant variable.
+
+2002-03-26 Dave Love <fx@xxxxxxx>
+
+ * un-define.el (un-define-post-read-conversion-charsets-alist):
+ Conditionally include lao-post-read-conversion,
+ in-is13194-post-read-conversion.
+ (un-define-pre-write-conversion-charsets-alist): Conditionally
+ include in-is13194-pre-write-conversion, lao-pre-write-conversion.
+ (un-define): In Mule 5, don't do explicit non-unix eol variants.
+
+2001-11-10 Dave Love <fx@xxxxxxx>
+
+ * un-define.el (unicode-basic-translation-charset-order-list): Add
+ lao.
+
+ * un-trbase.el (unicode-charset-library-alist): Add lao.
+
2001-04-13 MIYASHITA Hisashi <himi@xxxxxxxx>
* Mule-UCS 0.84 (KOUGETSUDAI) Release.
diff -ur mule-ucs-0.84.orig/lisp/jisx0213/ChangeLog mule-ucs-0.84/lisp/jisx0213/ChangeLog
--- mule-ucs-0.84.orig/lisp/jisx0213/ChangeLog Tue Nov 21 10:34:31 2000
+++ mule-ucs-0.84/lisp/jisx0213/ChangeLog Fri Jul 19 19:58:08 2002
@@ -1,3 +1,13 @@
+2002-07-19 Tatsuya Kinoshita <tats@xxxxxxxxxx>
+
+ * x0213-csys.el: Use eval-when-compile for (require 'cl). (Refer
+ to [emacs-w3m:03635], http://emacs-w3m.namazu.org/ml/msg03630.html,
+ written by Katsumi Yamaoka.)
+
+ * x0213-cdef.el: Put `x-charset-registry' property of
+ japanese-jisx0213-1 and japanese-jisx0213-2. (Refer to Debian
+ mule-ucs.)
+
2000-11-21 MIYASHITA Hisashi <himi@xxxxxxxx>
* x0213-csys.el (x0213-csys): Define shift_jisx0213
diff -ur mule-ucs-0.84.orig/lisp/jisx0213/x0213-cdef.el mule-ucs-0.84/lisp/jisx0213/x0213-cdef.el
--- mule-ucs-0.84.orig/lisp/jisx0213/x0213-cdef.el Wed Mar 8 21:09:48 2000
+++ mule-ucs-0.84/lisp/jisx0213/x0213-cdef.el Fri Jul 19 19:58:08 2002
@@ -25,9 +25,17 @@
(define-charset 151 'japanese-jisx0213-1
[2 94 2 0 ?O 0 "JISX0213-1" "JISX0213-1" "JISX0213-1 (Japanese)"]))
+(if (eq window-system 'x)
+ (put-charset-property 'japanese-jisx0213-1
+ 'x-charset-registry "JISX0213-1"))
+
(if (not (charsetp 'japanese-jisx0213-2))
(define-charset 254 'japanese-jisx0213-2
[2 94 2 0 ?P 0 "JISX0213-2" "JISX0213-2" "JISX0213-2 (Japanese)"]))
+
+(if (eq window-system 'x)
+ (put-charset-property 'japanese-jisx0213-2
+ 'x-charset-registry "JISX0213-2"))
(set-language-info "Japanese" 'charset
'(japanese-jisx0208 japanese-jisx0208-1978
diff -ur mule-ucs-0.84.orig/lisp/jisx0213/x0213-csys.el mule-ucs-0.84/lisp/jisx0213/x0213-csys.el
--- mule-ucs-0.84.orig/lisp/jisx0213/x0213-csys.el Tue Nov 21 10:34:27 2000
+++ mule-ucs-0.84/lisp/jisx0213/x0213-csys.el Fri Jul 19 19:58:08 2002
@@ -9,7 +9,7 @@
;; This program defines coding-system described in JIS X 0213 standard.
-(require 'cl)
+(eval-when-compile (require 'cl))
(require 'x0213-cdef)
(eval-when-compile
diff -ur mule-ucs-0.84.orig/lisp/un-define.el mule-ucs-0.84/lisp/un-define.el
--- mule-ucs-0.84.orig/lisp/un-define.el Wed Mar 7 07:41:38 2001
+++ mule-ucs-0.84/lisp/un-define.el Fri Jul 19 19:58:08 2002
@@ -55,21 +55,34 @@
'((thai-tis620 . thai-post-read-conversion))
(if (fboundp (function tibetan-post-read-conversion))
'((tibetan . tibetan-post-read-conversion)))
+ (if (fboundp 'lao-post-read-conversion)
+ '((lao . lao-post-read-conversion)))
;; in-is13194-devanagari-post-read-conversion does not work correctly.
;; I disabled the below line.
;; '((indian-is13194 . in-is13194-devanagari-post-read-conversion)))
- ))
+ ;; This is from post-Emacs 21.1.
+ (if (fboundp 'in-is13194-post-read-conversion)
+ ;; Post-Emacs 21.1 Unicode-based Indian implementation.
+ '((indian-is13194 . in-is13194-post-read-conversion)))
+ ))
(defvar un-define-pre-write-conversion-charsets-alist
(append
;; Disabled because read-multibyte-character
;; decompose composite characters
;; '((thai-tis620 . thai-pre-write-conversion))
- '((indian-is13194 . in-is13194-devanagari-pre-write-conversion)
- (indian-1-column . in-is13194-devanagari-pre-write-conversion)
- (indian-2-column . in-is13194-devanagari-pre-write-conversion))
+ (if (fboundp 'in-is13194-pre-write-conversion)
+ ;; Post-Emacs 21.1 Unicode-based Indian implementation.
+ '((indian-is13194 . in-is13194-pre-write-conversion)
+ (indian-2-column . in-is13194-pre-write-conversion))
+ '((indian-is13194 . in-is13194-devanagari-pre-write-conversion)
+ (indian-1-column . in-is13194-devanagari-pre-write-conversion)
+ (indian-2-column . in-is13194-devanagari-pre-write-conversion)))
(if (fboundp (function tibetan-pre-write-canonicalize-for-unicode))
- '((tibetan . tibetan-pre-write-canonicalize-for-unicode)))))
+ '((tibetan . tibetan-pre-write-canonicalize-for-unicode)))
+ ;; Post Emacs 21.1:
+ (if (fboundp 'lao-pre-write-conversion)
+ '((lao . lao-pre-write-conversion)))))
(defun un-define-post-read-conversion (len)
(if un-define-enable-buffer-conversion
@@ -147,6 +160,7 @@
ethiopic
indian-is13194
chinese-sisheng
+;;;xxx lao
vietnamese-viscii-lower
vietnamese-viscii-upper)
(if (fboundp
@@ -610,13 +624,21 @@
(mapcar
(lambda (x)
- (mapcar
- (lambda (y)
- (mucs-define-coding-system
- (nth 0 y) (nth 1 y) (nth 2 y)
- (nth 3 y) (nth 4 y) (nth 5 y) (nth 6 y))
- (coding-system-put (car y) 'alias-coding-systems (list (car x))))
- (cdr x)))
+ (if (fboundp 'register-char-codings)
+ ;; Mule 5, where we don't need the eol-type specified and
+ ;; register-char-codings may be very slow for these coding
+ ;; system definitions.
+ (let ((y (cadr x)))
+ (mucs-define-coding-system
+ (car x) (nth 1 y) (nth 2 y)
+ (nth 3 y) (nth 4 y) (nth 5 y)))
+ (mapcar
+ (lambda (y)
+ (mucs-define-coding-system
+ (nth 0 y) (nth 1 y) (nth 2 y)
+ (nth 3 y) (nth 4 y) (nth 5 y) (nth 6 y))
+ (coding-system-put (car y) 'alias-coding-systems (list (car x))))
+ (cdr x))))
`((utf-8
(utf-8-unix
?u "UTF-8 coding system"
diff -ur mule-ucs-0.84.orig/lisp/un-trbase.el mule-ucs-0.84/lisp/un-trbase.el
--- mule-ucs-0.84.orig/lisp/un-trbase.el Wed Jan 31 14:44:19 2001
+++ mule-ucs-0.84/lisp/un-trbase.el Fri Jul 19 19:58:08 2002
@@ -116,6 +116,7 @@
(chinese-sisheng . usisheng)
(vietnamese-viscii-lower . uviscii)
(vietnamese-viscii-upper . uviscii)
+;;;xxx (lao . ulao)
(tibetan . utibetan)))
(defun require-unicode-charset-data (charset)
diff -ur mule-ucs-0.84.orig/lisp/utf.el mule-ucs-0.84/lisp/utf.el
--- mule-ucs-0.84.orig/lisp/utf.el Tue Oct 3 22:30:59 2000
+++ mule-ucs-0.84/lisp/utf.el Fri Jul 19 19:58:08 2002
@@ -91,31 +91,120 @@
(write (((r0 >> 6) & ?\x3F) | ?\x80))
(write ((r0 & ?\x3f) | ?\x80))))))))))
+;; Unicode replacement character, "used to replace incoming characters
+;; whose values are unknown or unrepresentable in Unicode."
+(defconst unicode-replace-char (cn "0xFFFD"))
+
+;; See Markus Kuhn's notes on handling malformed UTF-8:
+;; <URL:http://www.cl.cam.ac.uk/%7Emgk25/ucs/examples/UTF-8-test.txt>
(defvar utf-8-ccl-decode
- `((read-if (r0 >= ?\x80)
- ((if (r0 < ?\xE0)
- ((read r4)
- (r4 &= ?\x3F)
- (r0 = (((r0 & ?\x1F) << 6) | r4)))
- (if (r0 < ?\xF0)
- ((read r4 r6)
- (r4 = ((r4 & ?\x3F) << 6))
- (r6 &= ?\x3F)
- (r0 = ((((r0 & ?\x0F) << 12) | r4) | r6)))
- (if (r0 < ?\xF8)
- ((read r1 r4 r6)
- (r1 = ((r1 & ?\x3F) << 12))
- (r4 = ((r4 & ?\x3F) << 6))
- (r6 &= ?\x3F)
- (r0 = (((((r0 & ?\x07) << 18) | r1) | r4) | r6)))
- (if (r0 < ?\xFC)
-;;;; MUCS can't read any numbers lager than 24bit
- ((read r0 r1 r4 r6)
- (r1 = ((r1 & ?\x3F) << 12))
- (r4 = ((r4 & ?\x3F) << 6))
- (r6 &= ?\x3F)
- (r0 = (((((r0 & ?\x3F) << 18) | r1) | r4) | r6)))
- (r0 = 0)))))))))
+ `((read-if (r0 >= ?\x80)
+ (if (r0 < ?\xC0)
+ ;; Unexpected continuation byte
+ (r0 = ,unicode-replace-char)
+ (if (r0 < ?\xE0)
+ ;; 2-byte sequence
+ ((r0 = ((r0 & ?\x1F) << 6))
+ (read r4)
+ (r4 ^= ?\x80)
+ (if (r4 >= ?\x40)
+ ;; not a continuation byte
+ (r0 = ,unicode-replace-char)
+ ((r0 = (r0 | r4))
+ (if (r0 < ?\x80)
+ ;; over-long sequence
+ (r0 = ,unicode-replace-char)))))
+ (if (r0 < ?\xF0)
+ ;; 3-byte sequence
+ ((r0 = ((r0 & ?\x0F) << 12))
+ (read r4)
+ (r4 ^= ?\x80)
+ (if (r4 >= ?\x40)
+ ;; not a continuation byte
+ (r0 = ,unicode-replace-char)
+ ((r0 |= (r4 << 6))
+ (read r4)
+ (r4 ^= ?\x80)
+ (if (r4 >= ?\x40)
+ ;; not a continuation byte
+ (r0 = ,unicode-replace-char)
+ ((r0 |= r4)
+ (if (r0 < ,(cn "0x800"))
+ ;; over-long sequence
+ (r0 = ,unicode-replace-char)))))))
+ (if (r0 < ?\xF8)
+ ;; 4-byte sequence
+ ((r0 = ((r0 & ?\x07) << 18))
+ (read r4)
+ (r4 ^= ?\x80)
+ (if (r4 >= ?\x40)
+ ;; not a continuation byte
+ (r0 = ,unicode-replace-char)
+ ((r0 |= (r4 << 12))
+ (read r4)
+ (r4 ^= ?\x80)
+ (if (r4 >= ?\x40)
+ ;; not a continuation byte
+ (r0 = ,unicode-replace-char)
+ ((r0 |= (r4 << 6))
+ (read r4)
+ (r4 ^= ?\x80)
+ (if (r4 >= ?\x40)
+ ;; not a continuation byte
+ (r0 = ,unicode-replace-char)
+ ((r0 |= r4)
+ (if (r0 < ,(cn "0x10000"))
+ ;; over-long sequence
+ (r0 = ,unicode-replace-char)))))))))
+ (if (r0 < ?\xFC)
+ ;; 5-byte sequence
+ ((r0 = ((r0 & ?\x03) << 24))
+ (read r4)
+ (r4 ^= ?\x80)
+ (if (r4 >= ?\x40)
+ ;; not a continuation byte
+ (r0 = ,unicode-replace-char)
+ ((r0 |= (r4 << 18))
+ (read r4)
+ (r4 ^= ?\x80)
+ (if (r4 >= ?\x40)
+ ;; not a continuation byte
+ (r0 = ,unicode-replace-char)
+ ((r0 |= (r4 << 12))
+ (read r4)
+ (r4 ^= ?\x80)
+ (if (r4 >= ?\x40)
+ ;; not a continuation byte
+ (r0 = ,unicode-replace-char)
+ ((r0 |= (r4 << 6))
+ (read r4)
+ (r4 ^= ?\x80)
+ (if (r4 >= ?\x40)
+ ;; not a continuation byte
+ (r0 = ,unicode-replace-char)
+ ((r0 |= r4)
+ (if (r0 < ,(cn "0x200000"))
+ ;; over-long sequence
+ (r0 = ,unicode-replace-char)))))))))))
+ (if (r0 < ?\xFE)
+ ;; 6-byte sequence - not supported, but
+ ;; we must read all the bytes, and check
+ ;; them.
+ ((r0 = ,unicode-replace-char)
+ (read r4)
+ (if ((r4 & ?\xC0) == ?\x80)
+ ;; 1st continuation byte
+ ((read r4)
+ (if ((r4 & ?\xC0) == ?\x80)
+ ;; 2nd continuation byte
+ ((read r4)
+ (if ((r4 & ?\xC0) == ?\x80)
+ ;; 3rd continuation byte
+ ((read r4)
+ (if ((r4 & ?\xC0) == ?\x80)
+ ;; 4th continuation byte
+ (read r4)))))))))
+ (r0 = ,unicode-replace-char))))))))))
(mucs-type-register-serialization
'ucs-generic