[This was to have been the third paper given in TESLA's Program, Serving Diverse Populations Through Electronic Standards, at the 1996 LITA/LAMA National Conference in Pittsburgh. Joan Aliprand, the author, was unable to attend, so we have provided the text here.]
Joan Aliprand, Senior Analyst, The Research Libraries Group, Inc.
Let me start with a disclaimer. What I say in this presentation should be regarded as personal opinion, and not the policy of any corporate body.
As the third presenter, I had hoped to recruit a librarian from a public library which had wrestled with the barriers to use of non-Roman scripts and overcome them. So I searched in the Bay Area, and then in Southern California. Both regions are heavily multilingual.
News published in the library press and shared by other librarians provided leads. But none of the leads panned out. Nobody I contacted felt that their institution had progressed beyond the most preliminary stages. In some cases, priorities had changed; for example, one library was now focusing on renovation of its building.
Now it would have been possible to seek a speaker from a university, since there are systems with non-Roman capability in some libraries. But these systems are intended to meet the needs of a particular scholarly discipline, and what needs to be discussed here is a system that is truly multilingual and multiscript.
So instead of practical advice from someone who has been through the process, this paper is somewhat theoretical. A number of members of TESLA contributed to the development of the content, and I thank them for their ideas. Any criticism should be directed solely at me, however. I hope this paper will provide you with food for thought. There are many questions which need to be answered.
The most obvious unanswered question is technology, which covers both hardware and software. Software is so interrelated with hardware that it sometimes seems like a chicken and egg situation.
If you seem to be in a quandary, perhaps too much attention is being given to "how" something is to be achieved, that is, picturing the system in detail. Pull back and try to define "what" is wanted, without any thought being given to "how" it is to be implemented. Once the objectives have been clarified (including what is essential and what is desirable), the choices with respect to "how" may be fewer. The objectives are the functional requirements for your system. The system that must meet the functional requirements could be an entirely new system or your existing system with modifications.
For a multilingual system, one of the functional requirements will be the languages which it must support. One decision that needs to be made with respect to language support is: Must all supported languages be used for "user dialog" interactions, or will the directions for system use be only in English? That is, will multilingual and multiscript support be limited to bibliographic records?
Normal presentation of each language should be a requirement. Attention needs to be paid to the character repertoire needed for a language, and also to the presentation of those characters.
For a language with an alphabet, can all the letters of the alphabet (including accented letters) be presented correctly? Although the extended Latin character set used in USMARC can be used to create bibliographic records in most Latin script languages, it is not acceptable to present text to users with the diacritical marks preceding letters. This is simply too unnatural.
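The reordering this implies can be sketched in a few lines of Python. This is a minimal illustration, not any library system's actual conversion routine: the function name `to_display_text` is hypothetical, and the five byte values in `DIACRITICS` are an assumed, illustrative subset of the extended Latin diacritic codes that should be checked against the official code table. The sketch moves each diacritic from before its base letter to after it, then asks Python's standard `unicodedata` module to compose the pair into a single accented letter where one exists.

```python
import unicodedata

# Illustrative subset of diacritic codes from the USMARC extended Latin set.
# ASSUMPTION: these byte values are for demonstration; verify them against
# the official code table before relying on them.
DIACRITICS = {
    0xE1: "\u0300",  # grave
    0xE2: "\u0301",  # acute
    0xE3: "\u0302",  # circumflex
    0xE4: "\u0303",  # tilde
    0xE8: "\u0308",  # diaeresis (umlaut)
}


def to_display_text(data: bytes) -> str:
    """Move each diacritic after its base letter, then compose the result."""
    out = []
    pending = []  # diacritics read but not yet attached to a base letter
    for byte in data:
        if byte in DIACRITICS:
            pending.append(DIACRITICS[byte])
        elif byte < 0x80:
            out.append(chr(byte))   # the base letter comes first...
            out.extend(pending)     # ...followed by its diacritics
            pending.clear()
        else:
            raise ValueError(f"byte 0x{byte:02X} is outside this sketch's subset")
    if pending:
        raise ValueError("diacritic with no following base letter")
    # NFC composition turns letter + combining mark into one accented letter.
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(to_display_text(b"caf\xe2e"))   # acute stored before the final "e"
    print(to_display_text(b"\xe8uber"))   # diaeresis stored before the "u"
```

A real converter would also have to cover the special characters in the upper half of the set and letters carrying more than one diacritic; the sketch only shows why stored order and display order differ.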
More generally, are all the characters in a particular USMARC character set supported? If some are not supported, how would incoming occurrences of those characters be handled? Would they be stored (even though they cannot be seen), or would they be discarded? If they are stored, what would be the effect on searching?
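The difference between storing and discarding can be made concrete with a small Python sketch. Everything named here is hypothetical: `SUPPORTED` stands in for whatever repertoire a given system can actually handle, and `find_unsupported` and `discard_unsupported` are illustrative helpers, not functions of any real library system.

```python
import string

# ASSUMPTION: a deliberately small repertoire standing in for whatever
# character set the system under evaluation actually supports.
SUPPORTED = set(string.ascii_letters + string.digits + " ")


def find_unsupported(text, supported):
    """Return (position, character) pairs for characters outside the repertoire."""
    return [(i, ch) for i, ch in enumerate(text) if ch not in supported]


def discard_unsupported(text, supported):
    """Simulate a system that silently drops characters it cannot handle."""
    return "".join(ch for ch in text if ch in supported)


if __name__ == "__main__":
    title = "\u0141\u00f3d\u017a guide"   # "Łódź guide"
    query = "\u0141\u00f3d\u017a"          # a user searching for "Łódź"

    print(find_unsupported(title, SUPPORTED))

    stored_intact = title
    stored_stripped = discard_unsupported(title, SUPPORTED)

    # If the characters are stored, the original term can still be matched.
    print(query in stored_intact)
    # If they are discarded, the same search no longer finds the record.
    print(query in stored_stripped)
```

Running a report like `find_unsupported` over a sample of incoming records is one way to learn how often the question arises before deciding which behavior to require.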
Make sure that the team that evaluates a system includes people who know the languages to be supported. A person who knows a particular language will spot gross errors which may not be apparent to a non-reader. For example, in Hebrew or Arabic text, which is written from right to left, incorrect display includes text written totally left-to-right, or with words in the wrong order on a line, or multi-line text reading from bottom to top. The language expert can verify the character support, and evaluate retrieval and data presentation. For some languages, typographical aesthetics influence user acceptability, so the fonts may need to be evaluated.
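Software cannot judge whether a display looks right, but it can help the evaluation team pick out which test records need a reader of Hebrew or Arabic to inspect them. The Python sketch below is one possible approach, using the bidirectional categories published in the Unicode Character Database through the standard `unicodedata` module. The function names `contains_rtl` and `strong_directions` are hypothetical.

```python
import unicodedata

# Bidirectional categories of strongly right-to-left characters:
# "R" covers Hebrew letters, "AL" covers Arabic letters.
RTL_CLASSES = {"R", "AL"}


def contains_rtl(text):
    """True if any character in the text is strongly right-to-left."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


def strong_directions(text):
    """Return the set of directions ("LTR", "RTL") found among strong characters."""
    found = set()
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            found.add("RTL")
        elif bidi == "L":
            found.add("LTR")
    return found


if __name__ == "__main__":
    hebrew = "\u05e9\u05dc\u05d5\u05dd"          # Hebrew word "shalom"
    mixed = "Tel Aviv \u05ea\u05dc \u05d0\u05d1\u05d9\u05d1"  # Latin and Hebrew

    print(contains_rtl(hebrew))
    print(contains_rtl("Pittsburgh"))
    # Records with both directions are the hardest display cases and are
    # worth routing to a language expert for a visual check.
    print(strong_directions(mixed))
```

A check of this kind only selects records for review; whether the words appear in the right order on the screen still has to be confirmed by someone who reads the language.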
These points apply whether you are acquiring a new system or seeking to modify an existing one. When there is an installed base of hardware, other questions come up. Is it possible to use this hardware to run software which provides non-Roman script support? Will it need to be upgraded? Will certain components need to be replaced? For example, a conventional "dumb" terminal may be limited to ASCII: the English alphabet, the numbers zero to nine, English punctuation marks, and a few miscellaneous symbols. What would it cost to replace these terminals with PCs? Would all of them need to be replaced, or does the system you are considering allow end user devices to have different capabilities? What about printers? Do you have any ASCII-only printers? What would it cost to replace them with laser or ink jet printers?
Were you planning to upgrade your hardware anyhow, for other reasons, for example, to add more PCs to provide Web access? Is it possible to kill two birds with one stone by choosing the right system and equipment?
Technology is a key issue, but it is not the only issue. Another large issue concerns people: decision makers and staff in your library, and the population the library serves.
You know best how to make the case to the decision makers at your library. Can you persuade them that romanization is not good enough? Romanization does not serve the speakers of a language written in another script very well; its primary use is to allow records for non-Roman titles to be processed with all other records.
Can you argue that non-Roman scripts deserve "equal billing" in the main catalog, rather than in a separate, script- or language-based card file?
When your library includes material in other scripts, you will need to bring it under bibliographic control. This means you will need someone on site who knows each language, either a staff member or a community volunteer, or you will have to outsource the cataloging. Some university libraries seek out students who know a language and train them to create basic records or have them work with a cataloger.
This need for bibliographic control exists whether your library decides to implement other scripts on its catalog or not. Creating romanized records requires the ability to read the language in its original script, and sometimes more advanced knowledge is needed than if the original script were simply being transcribed.
"If you build it, they will come." Well, maybe not. A free library open to all is very much a Western concept. But outreach to the community that uses a particular language will surely not be limited to post-installation publicity. Representatives of the community can be used as an expert resource at all stages.
A current social trend is antipathy toward immigrants, including "let them speak English." Providing original script access to foreign language materials is bucking this trend. Academic libraries are not likely to be questioned when they propose adding original script access, but public libraries may be. Strong support from users of the languages will help counteract Anglo-centric critics.
There is one more issue regarding scripts which must remain open for now. This is the question of scripts for which USMARC character sets have not been specified; for example, Devanagari (the script of the Hindi language of India), Thai, Lao, and Greek.
It is possible to use other character sets within a system; for example, some library systems provide overstruck diacritics by converting USMARC character sequences to Windows Latin-1 characters for display. But if a system includes non-USMARC characters in exported records, other systems will be unable to handle such records properly.
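Whether a given letter-and-diacritic combination can be shown this way depends on whether Windows Latin-1 contains a ready-made character for it. The Python sketch below is a rough illustration of that test and not a description of how any particular library system works. It uses Python's standard `cp1252` codec for Windows Latin-1, and the function name `not_in_windows_latin1` is hypothetical.

```python
import unicodedata


def not_in_windows_latin1(text):
    """Return the characters that have no Windows Latin-1 (cp1252) equivalent.

    The text is composed first, so a letter followed by a combining mark is
    tested as the single accented character it would be displayed as.
    """
    composed = unicodedata.normalize("NFC", text)
    missing = []
    for ch in composed:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # "e" + combining acute composes to a character Windows Latin-1 contains.
    print(not_in_windows_latin1("cafe\u0301"))
    # "r" + combining caron composes to a character it does not contain.
    print(not_in_windows_latin1("Dvor\u030ca\u0301k"))
```

A display conversion of this kind stays inside the system. The interchange problem arises only when the converted characters, rather than the original USMARC sequences, are written into exported records.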
Adding new scripts requires coordination among the community of USMARC users and the approval of the Library of Congress.
A MARBI Subcommittee is working on use of the Unicode Standard in USMARC. Unicode is the international 16-bit character set that includes the major scripts of the world, so the need for other scripts in USMARC records should ultimately be resolved through this effort. The Subcommittee posts periodic reports on the USMARC listserv.
To sum up: Supporting new languages and especially new scripts means that management and key staff members in your library have to acquire new knowledge. Get some basic information on the language and its script from reference sources, and then learn more from people who speak the language. Bear in mind that you may get conflicting information, just as you would if you asked a cross-section of people about English. Some features of one's mother tongue are so intuitive that we never think about them and find it difficult to explain them.
Define "what" is wanted rather than "how" something should work.
Evaluate eligible systems using "mean and rotten" examples. For example, use not just simple records in one language, but multilingual records with parallel titles. Include a record containing all characters supported by the system. Test a full suite of cases: not just the positive ones, but the negative ones as well. For example, if a system supports only part of a USMARC character set, you need to test the negative case by importing a record containing the unsupported characters to see what the system does. Make sure that there is someone on the evaluation team with language expertise.
Work with the local constituency that uses the language. They will be your best advocates, and may do supplementary fund-raising. Draw on these language experts for assistance in system evaluation, material selection, and cataloging. Most people are willing to share information about their language and culture.
In the last few years, technology and its associated electronic standards have taken on an international aspect. The large computer companies have realized that there are markets outside the English-speaking world: "globalization," "internationalization," and "localization" are the significant buzz-words. The Web is also breaking out of its Latin-1 confines, as noted in the Wall Street Journal of September 26.
But technological improvements are just the starting point. Given that technology does meet your needs, people resources become the key to overcoming the barriers to the use of non-Roman scripts: getting support from those in your user population who use a language, educating library staff about languages and scripts and the effect of the changes on their work, locating language specialists for selection and cataloging, and persuading decision makers to approve and fund this enhancement to public service.
© Copyright 1997 by The Research Libraries Group, Inc.
Joan Aliprand is a Senior Analyst at The Research Libraries Group. She was the analyst for the addition of Cyrillic, Hebraic and Arabic scripts to RLIN, and has also worked on modifications to CJK and the non-Roman additions to the RLIN Terminal for Windows software.
She has been involved with the development of the Unicode Standard for over 7 years, and is one of the authors of the recently published The Unicode Standard, Version 2.0. Her articles have been predominantly on the Unicode Standard and on the implementation of non-Roman scripts.
Maintained by
jennie@reed.edu
Last modified 26 June 1997