Emacs-bidi
Emacs-21 for Arabic and Hebrew


This is our first attempt to support the Arabic and Hebrew scripts in Emacs-21. It is distributed under the GPL. The copyright holder is the National Institute of Advanced Industrial Science and Technology (AIST). The code below is based on the GNU Emacs-21.3.50 branch and is still provisional.

Install and compile

Download the tarball below and untar it in a directory.

Then run the following commands in order.

  $ cd emacs-bidi
  $ ./configure
  $ make

If you experience a segmentation fault during the "make" process, continue with the following commands.

  $ cd src
  $ setarch i386 -R ./temacs --batch --load loadup dump

The executable file will be created as emacs-bidi/src/emacs. Note that Emacs-21.3.50 is a development version, not an official release. We recommend that you test the executable for a while before you install it with "make install".

It turns out that lisp/font-lock.el in Emacs-21.3.50 does not work with our composition mechanism. The lisp/font-lock.el[c] files in this release are therefore taken from Emacs-21.2. Contact us if you find any problems caused by this combination.

Fonts

Currently, emacs-bidi uses the mule-unicode-0100-24ff charset to represent Arabic characters in the buffer and the mule-unicode-e000-ffff charset to display them on the screen. You therefore need a Unicode-based font that covers those areas. For your convenience, we have prepared two sample fonts. m17n-nr14.bdf is a 14pt proportional font. It contains Latin, Cyrillic, Greek and some other glyphs in addition to Arabic and Hebrew. The other font, m17n-nr20.bdf, is a 20pt proportional font. It contains only the glyphs that are used to display Arabic and Hebrew.

To install the fonts, download the following tarball and untar it in an appropriate directory.

Then do

  $ cd m17nfonts
  $ xset +fp `pwd`

to add the new fontpath to the X server.

Then add the following lines to your ~/.emacs file. If necessary, replace the string "fontset-default" with the name of the fontset you use.

  (set-fontset-font
   "fontset-default"
   (cons (decode-char 'ucs #x05b0) (decode-char 'ucs #x06ff))
   "-m17n-*--20-*-iso10646-1")
  (set-fontset-font
   "fontset-default"
   (cons (decode-char 'ucs #xfb2a) (decode-char 'ucs #xfbff))
   "-m17n-*--20-*-iso10646-1")
  (set-fontset-font
   "fontset-default"
   (cons (decode-char 'ucs #xfe70) (decode-char 'ucs #xfefc))
   "-m17n-*--20-*-iso10646-1")
  (set-fontset-font
   "fontset-default"
   (cons (decode-char 'ucs #x200c) (decode-char 'ucs #x200f))
   "-m17n-*--20-*-iso10646-1")

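The four calls above differ only in the code point range, so the same configuration can be written as a loop. The following is a minimal sketch, not part of the distribution: the ranges and the font pattern are copied unchanged from the calls above, the comments naming each range are our own reading of the Unicode blocks involved, and it assumes that the dolist macro is available in your build.

```elisp
;; Same effect as the four set-fontset-font calls above, written as a loop.
;; The ranges and the font pattern are copied unchanged from those calls.
(let ((font "-m17n-*--20-*-iso10646-1")
      (ranges '((#x05b0 . #x06ff)     ; Hebrew points and letters, Arabic
                (#xfb2a . #xfbff)     ; Hebrew and Arabic presentation forms
                (#xfe70 . #xfefc)     ; Arabic Presentation Forms-B
                (#x200c . #x200f))))  ; ZWNJ, ZWJ, LRM, RLM
  (dolist (range ranges)
    (set-fontset-font
     "fontset-default"
     (cons (decode-char 'ucs (car range))
           (decode-char 'ucs (cdr range)))
     font)))
```

Use either this form or the four separate calls in ~/.emacs, not both.
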
Here is a screenshot of emacs-bidi using the m17n-nr14 font.

Image showing UNIVERSAL DECLARATION OF HUMAN RIGHTS in Arabic

Starting Emacs

Run the executable file emacs-bidi/src/emacs and select one of the language environments with C-x RET l. When you select Arabic, Persian, Kazakh or Hebrew-Unicode, a "Bidi" menu appears in the menu bar. Turn on its first three items: "Enable bidi display", "Enable auto composition", and "Reverse display orientation". "Enable bidi display" reorders bi-directional text. "Enable auto composition" joins Arabic letters appropriately. "Reverse display orientation" makes lines start from the right edge of the screen.
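
The language environment can also be selected from ~/.emacs instead of interactively. The following is a minimal sketch: set-language-environment is the standard command bound to C-x RET l, and the guard on language-info-alist is our own addition so that the same ~/.emacs does not signal an error in an Emacs that lacks these environments. The Bidi menu items still have to be turned on by hand.

```elisp
;; Select the Arabic language environment at startup, but only if this
;; Emacs knows it.  Substitute "Persian", "Kazakh" or "Hebrew-Unicode"
;; as needed; the names are assumed to match those shown by C-x RET l.
(when (assoc "Arabic" language-info-alist)
  (set-language-environment "Arabic"))
```
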

Input methods

Now let's type something. The following input methods are provided.

  language environment   primary input method   secondary input method
  Arabic                 arabic                 arabic-translit
  Persian                persian-isiri2901      persian-translit
  Kazakh                 kazakh-sample0
  Hebrew-Unicode         hebrew-unicode

Hit C-u C-\ to specify an input method. Once an input method is specified, C-\ (without C-u) toggles between the specified method and the default ASCII mode. The arabic method emulates the widely used MS Arabic keyboard layout. The persian-isiri2901 method emulates the ISIRI 2901 standard. Both arabic-translit and persian-translit are transliteration methods derived from ArabTeX. The kazakh-sample0 method is just a quick hack. The hebrew-unicode method generates Hebrew characters in the mule-unicode-0100-24ff charset and is different from the hebrew method, which uses the traditional ISO 8859-8 charset. When an input method is active, M-x quail-help RET displays its key bindings, whichever method is in use.
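
If you prefer plain C-\ to toggle a particular method without typing C-u C-\ first, the standard variable default-input-method can be set in ~/.emacs. This is a minimal sketch that assumes the method name matches one of those listed above. Selecting a language environment may reset this variable, so place the line after any set-language-environment call.

```elisp
;; Make plain C-\ toggle the transliteration method.
;; Assumption: "arabic-translit" is defined by this build, as listed above.
(setq default-input-method "arabic-translit")
```
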

Hebrew points

When "Enable auto composition" is on, Emacs combines Hebrew points (U+05B0..U+05C4) with the preceding Hebrew letter. If a precomposed form exists in U+FB2A..U+FB4E, the corresponding character in the mule-unicode-e000-ffff charset is used for display.
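
As a quick check, the following sketch inserts a letter followed by a point into the current buffer. The choice of characters is ours: BET (U+05D1) and DAGESH (U+05BC), for which Unicode defines the precomposed form BET WITH DAGESH at U+FB31. Whether the pair is actually drawn as a single glyph depends on the Bidi menu settings and on the font in use.

```elisp
;; Insert BET (U+05D1) followed by the point DAGESH (U+05BC) at point.
;; With "Enable auto composition" on, the two characters should be
;; displayed as one combined glyph.
(insert (decode-char 'ucs #x05d1)
        (decode-char 'ucs #x05bc))
```

Evaluate it with M-: in a buffer whose language environment is Hebrew-Unicode.
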

Here is a screenshot.

Image showing Hebrew sentences with points.

Mailing list

There is a mailing list for discussing Emacs support for bi-directional text. You can subscribe to the list by filling in the form on the following web page. An extensive archive is also available.

The emacs-bidi mailing list
http://mail.gnu.org/mailman/listinfo/emacs-bidi

Limitations

Here are the currently known limitations and bugs.

Future Plans

The Emacs core developers concluded that the bi-directional drawing method used here is not suitable for integration into the original GNU Emacs. Another attempt to implement a bidi algorithm in the Emacs display engine is ongoing, and it is likely to become part of a future GNU Emacs. An overview is given in the DisplayEngineForBidi section of the Emacs Wiki web site.

In the meantime, we keep adding new features to the current Emacs-bidi. Once GNU Emacs is equipped with bidi support, we will port the functionality and interfaces developed in Emacs-bidi to it.

Changelog

2008-11-21
2008-11-12
2003-08-22
2003-03-27
2002-12-12
2002-12-05
2002-07-25
2002-07-17
2002-07-01

TAKAHASHI Naoto
E-mail : ntakahas at m17n dot org
Last modified : 21 July 2009