
Re: Mule-UCS 0.84 (KOUGETSUDAI) release.



In message [mule:03290], on Fri, 19 Jul 2002,
MIYASHITA Hisashi <himi@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

> > I found these patches:
> >
> >   ftp://dlpx1.dl.ac.uk/fx/emacs/Mule/mule-ucs-0.84.diff
> >   http://emacs-w3m.namazu.org/ml/msg03630.html
> >   http://ftp.debian.org/debian/pool/main/m/mule-ucs/mule-ucs_0.84-11.diff.gz
> >
> > I wish Mule-UCS 0.85 were released.
> 
> O.K.  But could you please integrate these patches and
> propose the overall change?  I'll check and incorporate it, and
> then release the next version.

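I've integrated the above patches into a single diff against
Mule-UCS 0.84, appended below.

(However, I left the `lao' entries disabled because the `lao'
feature doesn't work on my system.  The affected lines are commented
out and marked with ";;;xxx".)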
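
For anyone reviewing the utf.el hunk, here is a rough plain Emacs Lisp
sketch of the checks the new CCL decoder applies (stray continuation
byte, missing continuation byte, over-long sequence).  It is only an
illustration and not part of the patch: the names `my-utf8-replacement'
and `my-utf8-decode-one' are made up for this example, and it covers
only 1- to 4-byte sequences.

```elisp
;; Illustration only.  These names are hypothetical, not Mule-UCS code.
(defconst my-utf8-replacement #xFFFD
  "U+FFFD REPLACEMENT CHARACTER.")

(defun my-utf8-decode-one (bytes)
  "Decode the first UTF-8 sequence in BYTES, a list of integers 0..255.
Return its code point, or `my-utf8-replacement' if it is malformed.
Only 1- to 4-byte sequences are handled in this sketch."
  (let* ((b0 (car bytes))
         (rest (cdr bytes))
         ;; (PAYLOAD-MASK CONTINUATION-COUNT MINIMUM-CODE) per lead byte.
         (spec (cond ((< b0 #x80) nil)          ; plain ASCII
                     ((< b0 #xC0) 'bad)         ; stray continuation byte
                     ((< b0 #xE0) '(#x1F 1 #x80))
                     ((< b0 #xF0) '(#x0F 2 #x800))
                     ((< b0 #xF8) '(#x07 3 #x10000))
                     (t 'bad))))                ; longer forms not handled here
    (cond ((null spec) b0)
          ((eq spec 'bad) my-utf8-replacement)
          (t
           (let ((code (logand b0 (nth 0 spec)))
                 (n (nth 1 spec))
                 (ok t))
             (while (and ok (> n 0))
               ;; Same trick as the CCL code: after XOR with #x80 a
               ;; valid continuation byte falls in the range 0..#x3F.
               ;; A missing byte is treated as 0, which fails the test.
               (let ((c (logxor (or (car rest) 0) #x80)))
                 (if (>= c #x40)
                     (setq ok nil)
                   (setq code (logior (ash code 6) c)
                         rest (cdr rest)
                         n (1- n)))))
             ;; Reject over-long encodings of small code points.
             (if (and ok (>= code (nth 2 spec)))
                 code
               my-utf8-replacement))))))
```

With this sketch, (my-utf8-decode-one '(#xC3 #xA9)) should give #xE9,
while '(#xC0 #x80) (over-long) and '(#xC3 #x41) (bad continuation byte)
should both give #xFFFD.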

-- 
Tatsuya Kinoshita
diff -ur mule-ucs-0.84.orig/lisp/ChangeLog mule-ucs-0.84/lisp/ChangeLog
--- mule-ucs-0.84.orig/lisp/ChangeLog	Fri Apr 13 16:00:36 2001
+++ mule-ucs-0.84/lisp/ChangeLog	Fri Jul 19 19:58:08 2002
@@ -1,3 +1,32 @@
+2002-07-19  Tatsuya Kinoshita  <tats@xxxxxxxxxx>
+
+	* un-trbase.el: Avoid lao.
+	* un-define.el: Ditto.
+
+	* un-define.el: `)' fix for coding-system-definition.  (Refer to
+	Debian mule-ucs.)
+
+	* utf.el (utf-8-ccl-decode): Handle malformed UTF-8.  (Refer to
+	Debian mule-ucs, http://packages.debian.org/mule-ucs, maintained
+	by Takuo KITAME.)
+	(unicode-replace-char): New constant variable.
+
+2002-03-26  Dave Love  <fx@xxxxxxx>
+
+	* un-define.el (un-define-post-read-conversion-charsets-alist):
+	Conditionally include lao-post-read-conversion,
+	in-is13194-post-read-conversion.
+	(un-define-pre-write-conversion-charsets-alist): Conditionally
+	include in-is13194-pre-write-conversion, lao-pre-write-conversion.
+	(un-define): In Mule 5, don't do explicit non-unix eol variants.
+
+2001-11-10  Dave Love  <fx@xxxxxxx>
+
+	* un-define.el (unicode-basic-translation-charset-order-list): Add
+	lao.
+
+	* un-trbase.el (unicode-charset-library-alist): Add lao.
+
 2001-04-13  MIYASHITA Hisashi  <himi@xxxxxxxx>
 
 	* Mule-UCS 0.84 (KOUGETSUDAI) Release.
diff -ur mule-ucs-0.84.orig/lisp/jisx0213/ChangeLog mule-ucs-0.84/lisp/jisx0213/ChangeLog
--- mule-ucs-0.84.orig/lisp/jisx0213/ChangeLog	Tue Nov 21 10:34:31 2000
+++ mule-ucs-0.84/lisp/jisx0213/ChangeLog	Fri Jul 19 19:58:08 2002
@@ -1,3 +1,13 @@
+2002-07-19  Tatsuya Kinoshita  <tats@xxxxxxxxxx>
+
+	* x0213-csys.el: Use eval-when-compile for (require 'cl).  (Refer
+	to [emacs-w3m:03635], http://emacs-w3m.namazu.org/ml/msg03630.html,
+	written by Katsumi Yamaoka.)
+
+	* x0213-cdef.el: Put `x-charset-registry' property of
+	japanese-jisx0213-1 and japanese-jisx0213-2.  (Refer to Debian
+	mule-ucs.)
+	
 2000-11-21  MIYASHITA Hisashi  <himi@xxxxxxxx>
 
 	* x0213-csys.el (x0213-csys): Define shift_jisx0213
diff -ur mule-ucs-0.84.orig/lisp/jisx0213/x0213-cdef.el mule-ucs-0.84/lisp/jisx0213/x0213-cdef.el
--- mule-ucs-0.84.orig/lisp/jisx0213/x0213-cdef.el	Wed Mar  8 21:09:48 2000
+++ mule-ucs-0.84/lisp/jisx0213/x0213-cdef.el	Fri Jul 19 19:58:08 2002
@@ -25,9 +25,17 @@
     (define-charset 151 'japanese-jisx0213-1
       [2 94 2 0 ?O 0 "JISX0213-1" "JISX0213-1" "JISX0213-1 (Japanese)"]))
 
+(if (eq window-system 'x)
+    (put-charset-property 'japanese-jisx0213-1
+			  'x-charset-registry "JISX0213-1"))
+
 (if (not (charsetp 'japanese-jisx0213-2))
     (define-charset 254 'japanese-jisx0213-2
       [2 94 2 0 ?P 0 "JISX0213-2" "JISX0213-2" "JISX0213-2 (Japanese)"]))
+
+(if (eq window-system 'x)
+    (put-charset-property 'japanese-jisx0213-2
+			  'x-charset-registry "JISX0213-2"))
 
 (set-language-info "Japanese" 'charset
                    '(japanese-jisx0208 japanese-jisx0208-1978
diff -ur mule-ucs-0.84.orig/lisp/jisx0213/x0213-csys.el mule-ucs-0.84/lisp/jisx0213/x0213-csys.el
--- mule-ucs-0.84.orig/lisp/jisx0213/x0213-csys.el	Tue Nov 21 10:34:27 2000
+++ mule-ucs-0.84/lisp/jisx0213/x0213-csys.el	Fri Jul 19 19:58:08 2002
@@ -9,7 +9,7 @@
 
 ;; This program defines coding-system described in JIS X 0213 standard.
 
-(require 'cl)
+(eval-when-compile (require 'cl))
 (require 'x0213-cdef)
 
 (eval-when-compile
diff -ur mule-ucs-0.84.orig/lisp/un-define.el mule-ucs-0.84/lisp/un-define.el
--- mule-ucs-0.84.orig/lisp/un-define.el	Wed Mar  7 07:41:38 2001
+++ mule-ucs-0.84/lisp/un-define.el	Fri Jul 19 19:58:08 2002
@@ -55,21 +55,34 @@
    '((thai-tis620 . thai-post-read-conversion))
    (if (fboundp (function tibetan-post-read-conversion))
        '((tibetan . tibetan-post-read-conversion)))
+   (if (fboundp 'lao-post-read-conversion)
+       '((lao . lao-post-read-conversion)))
     ;; in-is13194-devanagari-post-read-conversion does not work correctly.
     ;; I disabled the below line.
     ;; '((indian-is13194 . in-is13194-devanagari-post-read-conversion)))
-    ))
+   ;; This is from post-Emacs 21.1.
+   (if (fboundp 'in-is13194-post-read-conversion)
+       ;; Post-Emacs 21.1 Unicode-based Indian implementation.
+       '((indian-is13194 . in-is13194-post-read-conversion)))
+   ))
 
 (defvar un-define-pre-write-conversion-charsets-alist
   (append
    ;; Disabled because read-multibyte-character
    ;; decompose composite characters
    ;; '((thai-tis620 . thai-pre-write-conversion))
-   '((indian-is13194 . in-is13194-devanagari-pre-write-conversion)
-     (indian-1-column . in-is13194-devanagari-pre-write-conversion)
-     (indian-2-column . in-is13194-devanagari-pre-write-conversion))
+   (if (fboundp 'in-is13194-pre-write-conversion)
+       ;; Post-Emacs 21.1 Unicode-based Indian implementation.
+       '((indian-is13194 . in-is13194-pre-write-conversion)
+	  (indian-2-column . in-is13194-pre-write-conversion))
+     '((indian-is13194 . in-is13194-devanagari-pre-write-conversion)
+       (indian-1-column . in-is13194-devanagari-pre-write-conversion)
+       (indian-2-column . in-is13194-devanagari-pre-write-conversion)))
    (if (fboundp (function tibetan-pre-write-canonicalize-for-unicode))
-       '((tibetan . tibetan-pre-write-canonicalize-for-unicode)))))
+       '((tibetan . tibetan-pre-write-canonicalize-for-unicode)))
+   ;; Post Emacs 21.1:
+   (if (fboundp 'lao-pre-write-conversion)
+       '((lao . lao-pre-write-conversion)))))
 
 (defun un-define-post-read-conversion (len)
   (if un-define-enable-buffer-conversion
@@ -147,6 +160,7 @@
 	      ethiopic
 	      indian-is13194
 	      chinese-sisheng
+;;;xxx	      lao
 	      vietnamese-viscii-lower
 	      vietnamese-viscii-upper)
 	    (if (fboundp
@@ -610,13 +624,21 @@
 
  (mapcar
   (lambda (x)
-    (mapcar
-     (lambda (y)
-       (mucs-define-coding-system
-	(nth 0 y) (nth 1 y) (nth 2 y)
-	(nth 3 y) (nth 4 y) (nth 5 y) (nth 6 y))
-       (coding-system-put (car y) 'alias-coding-systems (list (car x))))
-     (cdr x)))
+    (if (fboundp 'register-char-codings)
+	;; Mule 5, where we don't need the eol-type specified and
+	;; register-char-codings may be very slow for these coding
+	;; system definitions.
+	(let ((y (cadr x)))
+	  (mucs-define-coding-system
+	   (car x) (nth 1 y) (nth 2 y)
+	   (nth 3 y) (nth 4 y) (nth 5 y)))
+      (mapcar
+       (lambda (y)
+	 (mucs-define-coding-system
+	  (nth 0 y) (nth 1 y) (nth 2 y)
+	  (nth 3 y) (nth 4 y) (nth 5 y) (nth 6 y))
+	 (coding-system-put (car y) 'alias-coding-systems (list (car x))))
+       (cdr x))))
   `((utf-8
      (utf-8-unix
       ?u "UTF-8 coding system"
diff -ur mule-ucs-0.84.orig/lisp/un-trbase.el mule-ucs-0.84/lisp/un-trbase.el
--- mule-ucs-0.84.orig/lisp/un-trbase.el	Wed Jan 31 14:44:19 2001
+++ mule-ucs-0.84/lisp/un-trbase.el	Fri Jul 19 19:58:08 2002
@@ -116,6 +116,7 @@
     (chinese-sisheng . usisheng)
     (vietnamese-viscii-lower . uviscii)
     (vietnamese-viscii-upper . uviscii)
+;;;xxx    (lao . ulao)
     (tibetan . utibetan)))
 
 (defun require-unicode-charset-data (charset)
diff -ur mule-ucs-0.84.orig/lisp/utf.el mule-ucs-0.84/lisp/utf.el
--- mule-ucs-0.84.orig/lisp/utf.el	Tue Oct  3 22:30:59 2000
+++ mule-ucs-0.84/lisp/utf.el	Fri Jul 19 19:58:08 2002
@@ -91,31 +91,120 @@
 	       (write (((r0 >> 6) & ?\x3F) | ?\x80))
 	       (write ((r0 & ?\x3f) | ?\x80))))))))))
 
+;; Unicode replacement character, "used to replace incoming characters
+;; whose values are unknown or unrepresentable in Unicode."
+(defconst unicode-replace-char (cn "0xFFFD"))
+
+;; See Markus Kuhn's notes on handling malformed UTF-8:
+;; <URL:http://www.cl.cam.ac.uk/%7Emgk25/ucs/examples/UTF-8-test.txt>
 (defvar utf-8-ccl-decode
-  `((read-if (r0 >= ?\x80)
-	((if (r0 < ?\xE0)
-	     ((read r4)
-	      (r4 &= ?\x3F)
-	      (r0 = (((r0 & ?\x1F) << 6) | r4)))
-	   (if (r0 < ?\xF0)
-	       ((read r4 r6)
-		(r4 = ((r4  & ?\x3F) << 6))
-		(r6 &= ?\x3F)
-		(r0 = ((((r0 & ?\x0F) << 12) | r4) | r6)))
-	     (if (r0 < ?\xF8)
-		 ((read r1 r4 r6)
-		  (r1 = ((r1  & ?\x3F) << 12))
-		  (r4 = ((r4  & ?\x3F) << 6))
-		  (r6 &= ?\x3F)
-		  (r0 = (((((r0 & ?\x07) << 18) | r1) | r4) | r6)))
-	       (if (r0 < ?\xFC)
-;;;; MUCS can't read any numbers lager than 24bit
-		   ((read r0 r1 r4 r6)
-		    (r1 = ((r1  & ?\x3F) << 12))
-		    (r4 = ((r4  & ?\x3F) << 6))
-		    (r6 &= ?\x3F)
-		    (r0 = (((((r0 & ?\x3F) << 18) | r1) | r4) | r6)))
-		 (r0 = 0)))))))))
+      `((read-if (r0 >= ?\x80)
+                 (if (r0 < ?\xC0)
+                     ;; Unexpected continuation byte
+                     (r0 = ,unicode-replace-char)
+                   (if (r0 < ?\xE0)
+                       ;; 2-byte sequence
+                       ((r0 = ((r0 & ?\x1F) << 6))
+                        (read r4)
+                        (r4 ^= ?\x80)
+                        (if (r4 >= ?\x40)
+                            ;; not a continuation byte
+                            (r0 = ,unicode-replace-char)
+                          ((r0 = (r0 | r4))
+                           (if (r0 < ?\x80)
+                               ;; over-long sequence
+                               (r0 = ,unicode-replace-char)))))
+                     (if (r0 < ?\xF0)
+                         ;; 3-byte sequence
+                         ((r0 = ((r0 & ?\x0F) << 12))
+                          (read r4)
+                          (r4 ^= ?\x80)
+                          (if (r4 >= ?\x40)
+                              ;; not a continuation byte
+                              (r0 = ,unicode-replace-char)
+                            ((r0 |= (r4 << 6))
+                             (read r4)
+                             (r4 ^= ?\x80)
+                             (if (r4 >= ?\x40)
+                                 ;; not a continuation byte
+                                 (r0 = ,unicode-replace-char)
+                               ((r0 |= r4)
+                                (if (r0 < ,(cn "0x800"))
+                                    ;; over-long sequence
+                                    (r0 = ,unicode-replace-char)))))))
+                       (if (r0 < ?\xF8)
+                           ;; 4-byte sequence
+                           ((r0 = ((r0 & ?\x07) << 18))
+                            (read r4)
+                            (r4 ^= ?\x80)
+                            (if (r4 >= ?\x40)
+                                ;; not a continuation byte
+                                (r0 = ,unicode-replace-char)
+                              ((r0 |= (r4 << 12))
+                               (read r4)
+                               (r4 ^= ?\x80)
+                               (if (r4 >= ?\x40)
+                                   ;; not a continuation byte
+                                   (r0 = ,unicode-replace-char)
+                                 ((r0 |= (r4 << 6))
+                                  (read r4)
+                                  (r4 ^= ?\x80)
+                                  (if (r4 >= ?\x40)
+                                      ;; not a continuation byte
+                                      (r0 = ,unicode-replace-char)
+                                    ((r0 |= r4)
+                                     (if (r0 < ,(cn "0x10000"))
+                                         ;; over-long sequence
+                                         (r0 = ,unicode-replace-char)))))))))
+                         (if (r0 < ?\xFC)
+                             ;; 5-byte sequence
+                             ((r0 = ((r0 & ?\x03) << 24))
+                              (read r4)
+                              (r4 ^= ?\x80)
+                              (if (r4 >= ?\x40)
+                                  ;; not a continuation byte
+                                  (r0 = ,unicode-replace-char)
+                                ((r0 |= (r4 << 18))
+                                 (read r4)
+                                 (r4 ^= ?\x80)
+                                 (if (r4 >= ?\x40)
+                                     ;; not a continuation byte
+                                     (r0 = ,unicode-replace-char)
+                                   ((r0 |= (r4 << 12))
+                                    (read r4)
+                                    (r4 ^= ?\x80)
+                                    (if (r4 >= ?\x40)
+                                        ;; not a continuation byte
+                                        (r0 = ,unicode-replace-char)
+                                      ((r0 |= (r4 << 6))
+                                       (read r4)
+                                       (r4 ^= ?\x80)
+                                       (if (r4 >= ?\x40)
+                                           ;; not a continuation byte
+                                           (r0 = ,unicode-replace-char)
+                                         ((r0 |= r4)
+                                          (if (r0 < ,(cn "0x200000"))
+                                              ;; over-long sequence
+                                              (r0 = ,unicode-replace-char)))))))))))
+                           (if (r0 < ?\xFE)
+                               ;; 6-byte sequence - not supported, but
+                               ;; we must read all the bytes, and check
+                               ;; them.
+                               ((r0 = ,unicode-replace-char)
+                                (read r4)
+                                (if ((r4 & ?\xC0) == ?\x80)
+                                    ;; 1st continuation byte
+                                    ((read r4)
+                                     (if ((r4 & ?\xC0) == ?\x80)
+                                         ;; 2nd continuation byte
+                                         ((read r4)
+                                          (if ((r4 & ?\xC0) == ?\x80)
+                                              ;; 3rd continuation byte
+                                              ((read r4)
+                                               (if ((r4 & ?\xC0) == ?\x80)
+                                                   ;; 4th continuation byte
+                                                   (read r4)))))))))
+                             (r0 = ,unicode-replace-char))))))))))
 
 (mucs-type-register-serialization
  'ucs-generic