Standards

Standards, ever changing!

Geoff Payne

General Manager, Information Services
Association for the Blind,
Kooyong Vic.
Email: rmggp@ozemail.com.au

This paper presents an overview of the role of some key standards in supporting information management by libraries. Two types of standards are discussed: standards which allow libraries’ integrated library management systems to inter-operate, and standards for recording information as electronic documents. It is my intention to convey the strategic importance of the several standards discussed rather than explore their technical detail. The standards chosen for discussion in this paper are those associated with the current changes in the information environment of Australian libraries.

Libraries have a long tradition of organising and establishing control over collections of physical documents. In the late 1960s and early 1970s, Machine-Readable Cataloguing (MARC) records replicated the data previously held only on catalogue cards. MARC records were useful for printing catalogue cards and were accrued by libraries in anticipation of the day when a suitable database could be constructed to allow searching and retrieval of information from a computer-based alternative to the card catalogue.

MARC records came in a variety of national flavours, including AusMARC, the Australian MARC specification first published by the National Library in 1973. The AusMARC specification represented a compromise between the USMARC and UKMARC specifications. It economised on the then-expensive disc space required to store records by suppressing the punctuation characters found in USMARC, allowing instead for their reinsertion by program whenever records were displayed or printed. The AusMARC standard also allowed for expansion of the standard as deemed necessary by Australian libraries.
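The idea of storing records without punctuation and reinserting it on display can be illustrated with a minimal sketch. The subfield codes, the punctuation table and the sample record are assumptions for illustration, loosely modelled on ISBD-style title punctuation; they are not taken from the AusMARC specification.

```python
# Hypothetical illustration: subfields are stored without punctuation and
# ISBD-style punctuation is reinserted by program only at display time.
# This table is an assumption, not the AusMARC rule set.
PUNCTUATION_BEFORE = {
    "b": " : ",  # other title information
    "c": " / ",  # statement of responsibility
}


def display_title(subfields):
    """Join (code, value) pairs, inserting punctuation between them."""
    parts = []
    for code, value in subfields:
        if parts:
            parts.append(PUNCTUATION_BEFORE.get(code, " "))
        parts.append(value)
    return "".join(parts)


# Made-up record content, stored with no punctuation characters.
stored = [("a", "Library standards"), ("b", "an overview"), ("c", "A. Author")]
print(display_title(stored))
```

The stored form carries only the data; the separators cost no disc space because they are derived from the subfield codes each time the record is rendered.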

When the ABN system was launched in 1981, this local variant of the MARC specification continued as an effective tool for Australian libraries, even though the internal storage format for data in ABN was more aligned to the USMARC standard.

In the 1980s, integrated library management systems (ILMS) appeared which were designed to control the acquisition, cataloguing, circulation and inventory control functions of libraries and to replace libraries’ card catalogues. The heart of each of these systems was the bibliographic database comprising a large collection of MARC records corresponding to the physical items in the library collections. Early entrants into the Australian market were modified to load MARC records conforming to the AusMARC standard. For the many systems sourced from US vendors, this process introduced delays in the incorporation of new releases of software into libraries’ ILMSs, and the standard required maintenance resources at the National Library of Australia. With the proposed redevelopment of ABN the decision was taken to discontinue the use of AusMARC as the Australian standard for MARC records. ABN users voted to drop AusMARC at the Annual Users Meeting in June 1991, and later in 1991 the National Library announced that it would cease support of AusMARC ‘when any redevelopment of ABN is completed’. However, while ABN continued to operate, AusMARC products continued to be provided to allow libraries reliant on AusMARC time to change to systems supporting USMARC.

With the advent of the Internet, and through it, inexpensive access to sources of MARC records other than ABN, including MARC records with books services, the rationale for abandoning AusMARC in favour of USMARC has been further strengthened.

With the implementation of Kinetica, the replacement for ABN, in early 1999, support for AusMARC will be discontinued.

Thus a standard adopted for sound reasons over 20 years ago, widely used in Australian libraries, and on which much effort and many resources were spent, has been overtaken by enhanced communications and the dominance of USMARC in the marketplace for library systems and bibliographic databases.

Importantly, data migration from AusMARC to USMARC has been possible, thus preserving the information content of AusMARC records.

Inter-operability standards

The interdependence of libraries for access to the information their clients require makes mandatory the adoption of standards for library record-keeping and operations. The use of automated systems to manage libraries, and the need for inter-operability of these systems to support resource discovery, lending and, increasingly, electronic document delivery, requires that character sets, data elements, content rules, data formats and interconnection protocols be observed. Technical Committee 46 of the International Organization for Standardization (ISO), and national committees such as the National Information Standards Organization (NISO) in the United States and Committee IT/19 of Standards Australia and Standards New Zealand, are responsible for the particular standards applicable to libraries and information services.

Z39.50/ISO 23950

Australia has had the good fortune to have a single database as its national union catalogue, the ABN database. Elsewhere, and in particular in the United States, this has not been the case.

The Z39.50 standard has its origins in the Linked Systems Project in the 1970s. This project attempted to find a mechanism allowing bibliographic enquiries between the OCLC, RLIN and Library of Congress databases, so that users of each database could access records on the other two through the user interface of their ‘home’ system.

The Z39.50 standard, which has become International Standard ISO 23950, has been developed to allow complex bibliographic enquiries to be exchanged between client and server computers, regardless of the database structure or brand of software running on the server computer. The Z39.50 standard has developed into a set of capabilities within a client–server architecture.

Client–server architecture refers to two computers inter-operating to exchange information. Typically these may be a personal computer running Z39.50 client software and a remote computer running Z39.50 server software. In the Z39.50 standard (and the identical international standard ISO 23950) the computer running the client software is referred to as the ‘origin’, and the computer running the server software is referred to as the ‘target’. Another example of client–server computing is a World Wide Web browser such as Internet Explorer or Netscape Navigator (the client) on a personal computer, interacting with World Wide Web servers located around the world and connected to the user’s workstation by the Internet.

The Z39.50 client software helps users to formulate enquiry transactions, then sends the enquiry off to the server computer. The Z39.50 server software then recognises and processes the enquiry, assembles the response data, and sends the response back to the client computer. The Z39.50 client software then displays the response on the screen in a way that is easy for the user to manipulate, save to a file, print or do whatever else the user may want with it.

The Z39.50 standard has been extended to allow the client software to launch simultaneous enquiries on several server computers. This has promise as a mechanism to allow several servers to appear to a user as a virtual union catalogue; however there are problems in resolving duplicate records retrieved from the servers for the presentation of the results by the client software. Delays in communications links can mean that enquiries to some servers time out while others are successful, and in the absence of a standard set of query functionality for a bibliographic database, enquiries may be processed differently at different server computers, leading to unexpected or inconsistent results being returned to the client software.
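The difficulties described above (time-outs on some servers and duplicate records from others) can be sketched in a few lines. This is not Z39.50 client code: the three target functions, the made-up ISBNs and the use of the ISBN as a match key are all assumptions standing in for real servers and real duplicate-resolution rules.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


# Stand-ins for remote targets (hypothetical; a real client would send
# protocol messages over the network instead of calling functions).
def local_ilms(query):
    return [{"isbn": "0-000-00000-1", "title": "Example title A", "source": "local"}]


def national_db(query):
    return [
        {"isbn": "0-000-00000-1", "title": "Example title A", "source": "national"},
        {"isbn": "0-000-00000-2", "title": "Example title B", "source": "national"},
    ]


def slow_overseas_db(query):
    time.sleep(0.6)  # simulates a slow communications link
    return [{"isbn": "0-000-00000-3", "title": "Example title C", "source": "overseas"}]


def broadcast_search(targets, query, timeout):
    """Send one query to every target and merge whatever returns in time."""
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = {pool.submit(search, query): name for name, search in targets.items()}
    done, _ = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on targets that are still running
    merged, timed_out = {}, []
    for future, name in futures.items():  # submission order keeps results stable
        if future not in done:
            timed_out.append(name)
            continue
        for record in future.result():
            # Naive duplicate resolution: the first record seen for an ISBN wins.
            merged.setdefault(record["isbn"], record)
    return list(merged.values()), timed_out


targets = {"local": local_ilms, "national": national_db, "overseas": slow_overseas_db}
records, timed_out = broadcast_search(targets, "title=example", timeout=0.2)
print([r["isbn"] for r in records], "timed out:", timed_out)
```

Matching on a single key is the weakest possible approach; records without an ISBN, or with differing cataloguing for the same item, are exactly where a virtual union catalogue becomes hard.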

These shortcomings notwithstanding, a Z39.50 client can be set up to routinely search one or several target databases. A cataloguer may choose to search simultaneously an overseas database of a library holding materials in a particular subject area, the Kinetica database, the local ILMS and a second bibliographic utility. The efficiencies over searching these individually are obvious.

Version 3 of the Z39.50 standard allows a client to initiate simultaneous update transactions on one or more servers. This functionality is defined in the Z39.50 union catalogue profile. It would allow a library which has a Z39.50 Version 3 compliant ILMS and client software to perform cataloguing operations using its client software, then send simultaneous database update transactions to both the local ILMS server and, were it developed to support this protocol, the Kinetica server. This offers efficiencies of workflow, as updating of the National Bibliographic Database can be a by-product of cataloguing transactions on a library’s ILMS without any additional work being required of the cataloguer.

The details of the implementation of the Z39.50 union catalogue profile and other developments can be obtained from the minutes of the Z39.50 Implementers Group (ZIG) as recorded at the Z39.50 maintenance agency, the Library of Congress.

Stowe Computing have developed a server which can be used to test implementations of the Z39.50 union catalogue profile.

Z39.50 HTTP gateway

A Z39.50 HTTP gateway allows a user running a Web browser to access the functionality of Z39.50 client software, and thus perform simultaneous searching of Z39.50 server databases through their Web browser. Such a gateway obviates distribution of Z39.50 client software to all potential users of Z39.50 compliant databases. Many vendors of ILMS also offer Z39.50/HTTP gateways. The Kinetica system will feature such a gateway through its LibriVision module. An online demonstration of LibriVision is available at http://www.elias.be/~demo/LV10/LV/LibriVision.html — follow the ‘Free access’ button, then close the help file to get to the main menu.

Profiles

Standards such as Z39.50 and the Inter-Library Loans Protocol offer various options for implementation of particular functionality. Hence implementors may be protocol compliant but still unable to communicate if they adopt different options. Sets of options developed by implementors which guarantee inter-operability are known as profiles.
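A minimal sketch of why compliance alone is not enough, and what a profile adds. The option names and values below are hypothetical placeholders, not options taken from Z39.50, the ILL Protocol or any published profile.

```python
# Options the (imaginary) protocol leaves open to implementors.
ALLOWED = {
    "transport": {"email", "direct"},
    "message_set": {"minimal", "full"},
}

# A profile pins every open option to one agreed value.
PROFILE = {"transport": "direct", "message_set": "full"}


def is_compliant(options):
    """True if every chosen option is one the protocol permits."""
    return all(options.get(name) in allowed for name, allowed in ALLOWED.items())


def mismatches(a, b):
    """Options on which two implementations have chosen differently."""
    return sorted(name for name in ALLOWED if a[name] != b[name])


def follows_profile(options):
    return options == PROFILE


vendor_x = {"transport": "email", "message_set": "full"}
vendor_y = {"transport": "direct", "message_set": "full"}

print(is_compliant(vendor_x), is_compliant(vendor_y))  # both are compliant
print(mismatches(vendor_x, vendor_y))                  # yet they differ here
print(follows_profile(vendor_x), follows_profile(vendor_y))
```

Two implementations that both follow the same profile have, by construction, no mismatched options.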

Inter-library lending: ISO 10160, 10161

The Inter-Library Loans Protocol provides a means of exchanging structured messages between library systems for the conduct of inter-library loans transactions. The protocol specifies a client–server architecture, and standards for each type of message necessary to allow client software and server software from different software suppliers to inter-operate to achieve inter-library lending activities.

The ILL protocol allows various options for implementation, such as the set of messages to be supported and the message transport mechanism. In late 1996, the ILL Protocol Implementers Group (IPIG) began work on the IPIG profile, which will differ from the Canadian profile. The sixth draft of the IPIG Profile (as at 1 September 1998) is available on the Web.

The ILL protocol will be a basic building block of the Kinetica system. As the AMICUS software does not include support for ILL management, the National Library has selected Fretwell-Downing’s OLIB VDX ILL software to support the National ILL Utility. This is the same product chosen by the Australian Vice-Chancellors’ Committee’s LIDDA (Local Interlending and Document Delivery Administration) Project. The OLIB VDX software conforms to the ILL Protocol and will interface with the union catalogue in the Kinetica system.

Holdings data standards

With increasing automation of inter-library loans, the lack of standardisation of holdings statements in the national union catalogue has become a problem. A complex holdings statement may have been truncated to meet ABN limits on the number of characters in any one record. Other holdings statements are expressed in ways that are clear to a person reading them, but not amenable to machine interpretation, and thus not suitable for automatic assignment of possible supplier libraries to any one ILL request. This perpetuates the ILL equivalent of the manual telephone exchange, where an operator needs to manually match ILL requests with libraries holding the item sought.
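The difference between a holdings statement that a person can read and one that a program can act on is shown in the sketch below. The data structure, library codes and year ranges are hypothetical; this is not the USMARC holdings format, only an illustration of why structured holdings allow automatic selection of supplier libraries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HoldingsRange:
    start_year: int
    end_year: Optional[int] = None  # None means the run is still open

    def covers(self, year):
        return self.start_year <= year and (
            self.end_year is None or year <= self.end_year
        )


def candidate_suppliers(holdings_by_library, year):
    """Libraries whose structured holdings include the requested year."""
    return sorted(
        library
        for library, ranges in holdings_by_library.items()
        if any(r.covers(year) for r in ranges)
    )


# A free-text statement such as "Some early vols; lacks a few issues" is clear
# to a reader but gives a program nothing to match against.
holdings = {
    "LIB-A": [HoldingsRange(1975, 1989)],
    "LIB-B": [HoldingsRange(1980, 1984), HoldingsRange(1990)],
    "LIB-C": [HoldingsRange(1995)],
}

print(candidate_suppliers(holdings, 1992))
```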

The USMARC format for holdings and locations is an obvious candidate for recording such information, but enjoys little popularity. I am unaware of any other standard which attempts to address this problem.

Directory services for library patrons

Work on the Inter-Library Loans Protocol to facilitate inter-library lending of materials addresses the mechanics of requesting, receiving and returning materials borrowed from other libraries. Still to be resolved is the way in which patron-identifying information is shared between libraries. Jan Gatenby has suggested that Z39.50 may be extended to include circulation enquiries such as patron identification and status from the patron’s ‘home’ library to support reciprocal borrowing and inter-library lending activities.

With the need arising in other contexts for unique identification of individuals for billings for electronic commerce and payments for rights in electronic documents, libraries need to be aware that more generic initiatives for identification of individuals are likely to emerge.

Data element dictionaries

The Internet has provided the communications channel to allow libraries to share resources more effectively by facilitating resource discovery and allowing library users to do much of the work in initiating document delivery requests. In the case of circulation and related activities such as inter-library loans, inter-operability depends on systems using a standard method of identifying particular pieces of information. For example, if an enquiry transaction is to be launched from a library at which a user wishes to borrow materials under a reciprocal borrowing arrangement, how is their standing at their home library to be established? A standard means of requesting such information is required, and to this end a data element dictionary for circulation information has been devised as ISO 8459-4. This codifies such transactions as renewal request, user history request, suspend user rights request, etc. The other parts of ISO 8459 define data elements for inter-library loans, acquisitions and information retrieval applications.

The transition to electronic documents

The standards referred to above all deal with the traditional library activities of controlling physical collections of materials. While it has been the general view that the demise of the book is not imminent, there has been a huge move towards the use of electronic documents in recent years, fuelled by the ease of access to documents mounted on World Wide Web servers.

Even the book is under reconsideration as a device for conveying technical and scholarly information. The recent ‘Electronic Book ’98 Workshop: Turning A New Page in Knowledge Management’ discussed a variety of technical solutions and products now on the market which address the problem of matching the readability of printed pages with the convenience of searching and navigating within electronic documents.

These developments, and the increasing responsibilities of libraries in developing, managing and preserving collections of electronic documents, raise the matter of standards for electronic documents.

The development of standards relating to the World Wide Web is led by the World Wide Web Consortium (W3C).

Electronic document standards

The need for document standards is exemplified by the proliferation of file formats associated with word-processing packages, and the resulting difficulties which arise in exchanging documents as electronic mail attachments, much less publishing information on the Web. HTML is certainly a better document structure than a proprietary word-processing file format so far as universal accessibility of the data produced is concerned, but HTML has its limitations. XML is a young standard designed to overcome the limitations of HTML, and is still being actively developed.

HyperText Markup Language (HTML) is the document format which has allowed many Web pages to be created by users with little prerequisite training. HTML is a particular document type within the Standard Generalized Markup Language (SGML).

‘Markup’ is a term for the annotating of text with codes to define how it is to be laid out and presented; for example, markup codes may indicate ‘new paragraph’ or ‘bold face’. A markup language must specify what markup is allowed, what markup is required, how markup is to be distinguished from text, and what the markup means. SGML provides the means for doing the first three.

SGML provides a descriptive markup language rather than a procedural language. The descriptive markup simply identifies types of information such as headings, paragraphs, etc., but does not specify how these are to be presented.

SGML was originally designed to facilitate the exchange of text information between computer systems regardless of their particular internal storage characteristics. This is a similar role for text files to that played by MARC records for the exchange of catalogue records between library systems.

Within SGML, ‘document type definitions’ (DTDs) describe particular classes of documents. A DTD defines the expected structure of a class of documents, setting out what elements of the document must be present and how each element is identified within the documents. Knowing that a document complies with a particular DTD means that a piece of software designed to process documents of that type can be expected to perform predictably.

As one document type within SGML, HTML allows the markup of documents as simple reports including text attributes such as headings and paragraphs, inclusion of images and some multimedia, and provision for linking documents through hypertext links.

HTML has shortcomings in that it provides some formatting information, but not sufficient to deal with complex information and layout. This leads to fudging the presentation of information using images of text, which renders the text content of such images unsearchable unless alternative text is consistently supplied. Layout fudges such as use of tables to display information in columns and null images between characters to achieve a particular look favoured by a designer are not uncommon. This has also led to proprietary extensions of HTML being supported by various Web browsers, with the resultant loss of uniformity of access to information. The presence on Web pages of text ‘Best viewed with X’ attests that features of browser X have been used in designing the pages and that this information will not be accessible if they are viewed using another browser.
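The problem of images of text lacking alternative text can be checked mechanically. The following is a minimal sketch using Python's standard `html.parser` module; the sample page and file names are made up. It treats an empty `alt` attribute as missing, which is an assumption suited to images of text but too strict for purely decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if not attributes.get("alt"):
                self.missing.append(attributes.get("src", "(no src)"))


# Made-up page: the heading is an image of text with no alternative text.
page = (
    '<h1><img src="title.gif"></h1>'
    '<p><img src="logo.gif" alt="Library logo"> Welcome to the catalogue</p>'
)

finder = MissingAltFinder()
finder.feed(page)
print(finder.missing)
```

The heading image is flagged: its text is invisible to a search engine and to synthetic speech software alike.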

The separation of content and information about its presentation, which is not well enforced in HTML, is one of the objectives of XML, the Extensible Markup Language. XML is being developed embodying the simplicity of HTML with the flexibility of SGML, but without the complexity involved in SGML.

Unlike HTML, XML does not comprise a predefined tag set, but provides a facility to define tags and the relationships between them. SGML itself is not well suited to serving documents over the Web. Defining XML as an application profile of SGML means that any fully conforming SGML system will be able to understand XML documents, yet a system need not be capable of understanding the full generality of SGML in order to understand XML documents.
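A small illustration of tags defined by the document's author rather than by a fixed tag set. The element names (`catalogue`, `book`, `title`, `author`) and the content are invented for this sketch; the parsing uses Python's standard `xml.etree.ElementTree` module, which checks that the document is well formed but does not validate it against a DTD.

```python
import xml.etree.ElementTree as ET

# Invented tag names: nothing here is predefined by XML itself.
document = """<catalogue>
  <book id="b1">
    <title>Example title A</title>
    <author>A. Author</author>
  </book>
  <book id="b2">
    <title>Example title B</title>
    <author>B. Author</author>
  </book>
</catalogue>"""

root = ET.fromstring(document)

# Because the markup describes what each piece of content is, software can
# pick out titles without knowing anything about how they are displayed.
titles = [book.findtext("title") for book in root.iter("book")]
authors_by_id = {book.get("id"): book.findtext("author") for book in root.iter("book")}

print(titles)
print(authors_by_id)
```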

Cascading style sheets

By enforcing the separation of information about the structure of content and information about its presentation, the same content can be displayed differently in different contexts.

For a person with a disability such as low vision, the capability to change the contrast and size of fonts presented on a screen can mean the difference between being able to access the information or not. For a blind person, or someone accessing information in an eyes-busy context, the ability to screen out the images and present the alternative text for rendition by synthetic speech software is important. Audio style sheets make it possible to define how the information is presented using synthetic speech. This can aid comprehension by assigning different emphasis to what is heading information and what is detail. For a sighted person using a personal digital assistant or mobile telephone to access a Web site, similar capabilities to define which information is presented and how it is presented are essential. One may also want to present documents on public kiosks with touch screens in different formats from the way they are presented on a conventional PC running a Web browser.

This capability of presenting the same information differently in different contexts is afforded by the concept of ‘cascading style sheets’. Style sheets are supported in HTML 4.0 and in XML. The author of a document can associate a style sheet with the document, which is then used to determine the presentation of the information in the absence of a user-defined style sheet. If a user chooses to define a style sheet, then the user can control the way the information is presented, for example setting colour rendition and contrast to avoid any colour-blindness or perceptual difficulties, enlarging fonts and the like. The user need only define the elements of a style sheet necessary to ensure clear presentation. Elements of presentation not defined in the user’s style sheet are inherited from the author’s style sheet.
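The behaviour described above, where the user's settings take effect and everything else falls back to the author's style sheet, can be modelled very simply. This is a deliberately simplified sketch of the idea as stated in this paper, with made-up property values; the actual cascade defined in the CSS specifications involves further rules that are not modelled here.

```python
# Author's style sheet for body text (made-up values).
AUTHOR_STYLE = {
    "font-family": "serif",
    "font-size": "10pt",
    "color": "grey",
    "background": "white",
}

# A low-vision user overrides only what is needed for clear presentation.
USER_STYLE = {
    "font-size": "18pt",
    "color": "black",
}


def effective_style(author, user):
    """User settings win; anything the user leaves undefined comes from the author."""
    return {**author, **user}


style = effective_style(AUTHOR_STYLE, USER_STYLE)
for name, value in sorted(style.items()):
    print(f"{name}: {value}")
```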

Unicode for text encoding

The rapid growth of the World Wide Web has accelerated the push for developing global software. An international language encoding standard has rapidly become a necessity so that computers in one world language community can ‘talk’ with those in another language community. Unicode supports and fosters the multilingual computing world community. Using Unicode, the names of individuals can be properly expressed and text in the world’s many scripts can be exchanged between information systems.

The Unicode Standard provides the foundation for internationalisation and localisation of software. The Unicode Standard is a subset of, and code-for-code identical to, the International Standard ISO/IEC 10646-1:1993.

The Unicode Standard, Version 2.0 contains 38 885 characters from the world’s scripts. In addition, the Unicode Standard includes mathematical operators and technical symbols, geometric shapes, and dingbats.

Both HTML 4.0 and XML 1.0 specify Unicode/ISO 10646 as their default character set, and the Internet Engineering Task Force Policy on Character Sets and Languages specifies that protocols must be able to use the UTF-8 encoding of ISO 10646. In UTF-8, each character is represented by a sequence of between one and six octets.
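The variable-length nature of UTF-8 is easy to observe. The sketch below uses Python's built-in UTF-8 codec and four sample characters chosen for illustration. Note that Python follows the later definition of UTF-8 (RFC 3629), which limits sequences to four octets, so the five- and six-octet forms permitted by the definition current when this paper was written cannot be produced this way.

```python
# Sample characters, written as escapes so the source stays plain ASCII.
samples = [
    ("LATIN CAPITAL LETTER A", "\u0041"),
    ("LATIN SMALL LETTER E WITH ACUTE", "\u00e9"),
    ("CJK IDEOGRAPH (sun/day)", "\u65e5"),
    ("GOTHIC LETTER HWAIR", "\U00010348"),
]


def utf8_octets(character):
    """Return the UTF-8 encoding of one character as a list of octet values."""
    return list(character.encode("utf-8"))


lengths = []
for name, character in samples:
    octets = utf8_octets(character)
    lengths.append(len(octets))
    print(f"U+{ord(character):04X} {name}: {len(octets)} octet(s)",
          " ".join(f"{o:02X}" for o in octets))
```

Characters in the ASCII range keep their single-octet representation, which is why UTF-8 text passes unchanged through software that expects ASCII.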

Portable document format (PDF)

The ‘portable document format’ was developed by Adobe Systems, and essentially comprises a fixed-layout representation of each page of a document as it would appear in print. The PDF format is derived from PostScript. The difficulty with PDF files is that they do not easily support the same flexibility in navigation within documents as do HTML files. Adobe have developed software which will convert a PDF file to simple HTML, but much of the structuring in the document evident in the page images presented in PDF is lost. This then makes it difficult to navigate from heading to heading, or undertake anything other than a linear reading of the document.

Resource description framework

With an electronic document comes the possibility of instant access to it through the Internet. However, that access must be structured to ensure search engines can establish the relevance of the document to a given search request, and that the author can be compensated for the intellectual property embodied in the document.

Metadata is the term used in the Internet context for data about other data. A library catalogue is a classic example of metadata about the library’s collection.

The ‘resource description framework’ (RDF) provides a structure which can be expressed in XML for metadata about any electronic document. RDF provides a framework for exchanging metadata between systems on the Web to ensure they can inter-operate. RDF also structures metadata to facilitate machine analysis of information about documents on the Web.
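To show what metadata expressed in XML might look like to a program, here is a minimal sketch that parses a small RDF description with Python's standard `xml.etree.ElementTree` module. The namespace URIs are those of the later published RDF syntax and Dublin Core element set and are assumed here for illustration; the resource URL is a placeholder, and the two descriptive elements simply reuse this paper's title and author.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for this sketch.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

record = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/paper.html">
    <dc:title>Standards, ever changing!</dc:title>
    <dc:creator>Geoff Payne</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(record)

# Collect, for each described resource, a dictionary of its metadata.
described = {}
for description in root.findall(f"{{{RDF}}}Description"):
    about = description.get(f"{{{RDF}}}about")
    described[about] = {
        child.tag.split("}")[1]: child.text for child in description
    }

print(described)
```

Because every statement names the resource it is about and the property being asserted, a harvesting program can analyse the metadata without any knowledge of the document's own format.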

While the Dublin Core specifies descriptive metadata elements, it does not address the data elements necessary for rights management, that is, reimbursing copyright owners each time a copy of a document is downloaded from the Internet. In the Web environment it is sensible to combine the metadata dealing with content (why you may be interested in obtaining a copy) with the metadata which details the cost and conditions under which a copy can be obtained.

Before electronic rights charges can be collected automatically on the Internet and disbursed among the rights holders in any given work, more ‘digital object identifiers’ (DOIs) need to be developed. DOIs are the equivalent of ISBNs for books, ISRCs for recordings and the like, extended to other document types. The complexities of recording the presence of embedded objects in another document which carries its own rights management properties also need to be addressed. For example, an image can be an object in its own right, appear in a book, be incorporated in a collage or be incorporated into a movie scene. Each use of the image must be accounted for in a rights management framework.
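The embedded-object problem amounts to walking a tree of objects and accounting for every rights holder found along the way. The sketch below is purely illustrative: the class, the identifiers and the rights holders are hypothetical and do not represent the DOI system or any rights management scheme.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DigitalObject:
    identifier: str      # hypothetical identifier, not a real DOI
    rights_holder: str
    embedded: List["DigitalObject"] = field(default_factory=list)


def rights_holders(obj):
    """All parties with rights in an object, including its embedded objects."""
    holders = {obj.rights_holder}
    for child in obj.embedded:
        holders |= rights_holders(child)
    return holders


# One image used both on its own and inside a collage inside a book.
image = DigitalObject("obj:image-1", "Photographer P")
collage = DigitalObject("obj:collage-1", "Artist A", [image])
book = DigitalObject("obj:book-1", "Publisher B", [collage])

print(sorted(rights_holders(image)))
print(sorted(rights_holders(book)))
```

Accessing the book touches three sets of rights, although only one identifier was requested.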

Ways of uniquely identifying persons and corporations having rights in digital objects also need to be standardised. Once these issues are addressed, the prospect of automatically collecting and disseminating small amounts of money each time a given document is accessed or copied becomes possible.

Accessibility standards

Apart from the technical standards which are being developed to deal with the issues identified above, there are now legislative requirements in countries such as Australia requiring that World Wide Web sites are structured in ways that make them accessible to people with disabilities. The Human Rights and Equal Opportunity Commission Web site in Australia lists some of these requirements and the techniques and services such as Bobby which can assist Web designers ensure their sites meet accessibility standards.

Resources on standards

The National Library of Australia maintains a section of its Web site focused on library and data standards at http://www.nla.gov.au/niac/standards.html#oz, which provides a starting point for tracking standards developments world-wide.

Standards Australia and Standards New Zealand’s joint Committee IT/19, chaired by Neil McLean, tracks standards developments in the ISO arena and contributes Australian expertise in this area.

The World Wide Web Consortium (W3C) is the mechanism for the coordination of standards development in the Web environment.

Conclusion

I have endeavoured to outline the importance of some key standards in the environment in which libraries and information services are now operating. As with AusMARC, some of the standards which are in common use today will in all likelihood disappear in the years to come, replaced by standards which address the issues as yet unresolved in the transition from paper-based information to networked information in digital formats.

There have been doubts expressed that Z39.50 will survive in the Web environment, as it is complex to implement. On the other hand, it is a mature standard addressing the complex needs of searching for information, and as other less rich alternatives are found wanting it may become more widely accepted. How the rights management issue will be resolved is as yet unclear, but at least the problem is now being addressed.

While inter-operability standards and metadata standards are still evolving, and a multiplicity of document standards abound, libraries and others planning the creation of digital documents or the conversion of existing documents to digital formats must weigh carefully the strengths and weaknesses of available choices. Seeking advice from those active in monitoring the development of these standards should inform the decision process.

References

  1. Lynch, Clifford A., ‘The Z39.50 Information Retrieval Standard: Part I: A strategic view of its past, present and future’, D-Lib, April 1997.
    http://www.dlib.org/dlib/april97/
  2. ‘ZIG meeting — day one, Wednesday, January 21, 1998 Orlando, Florida.’
    http://lcweb.loc.gov/z3950/agency/orlando/output/day1.html
  3. http://www.stowe.com.au/stowe/library4.htm
  4. LibriVision Introduction.
    http://www.elias.be/~demo/LV10/LV/techno.html
  5. Shuh, Barbara, The Interlibrary Loan (ILL) Protocol: An Introduction, Information Technology Services, National Library of Canada, 1996 (revised 1998) (Network Notes #40)
    http://www.nlc-bnc.ca/pubs/netnotes/notes40.htm
  6. http://www.nlc-bnc.ca/iso/ill/document/ipigwp/ipd69809.pdf
  7. For more information see ‘Kinetica Implementation Project — Consolidated Q&A’s.’
    http://www.nla.gov.au/nsp/amicqar.html#ill
  8. Email discussion tabled at Standards Australia/Standards NZ IT/19 Committee meeting 21 November 1997.
  9. ‘Electronic Book ’98 Workshop: Turning A New Page in Knowledge Management, National Institute of Standards and Technology, Gaithersburg, MD, October 8–9, 1998’.
    http://www.nist.gov/itl/div895/isis/ebook98.html
  10. ‘A Gentle Introduction to SGML’.
    http://www-tei.uic.edu/orgs/tei/sgml/teip3sg/SG.htm
  11. ‘What is HTML?’
    http://www.w3.org/MarkUp/
  12. ‘What is XML?’
    http://www.xml.com/xml/pub/98/10/guide1.html
  13. http://www.unicode.org
  14. http://info.internet.isi.edu/in-notes/rfc/files/rfc2277.txt
  15. Rust, Godfrey, ‘Metadata: the right approach; an integrated approach to descriptive and rights metadata in E-commerce’, D-Lib Magazine, July/August 1998.
    http://www.dlib.org/dlib/july98/rust/07rust.html
