Conference papers



Working Online II

Unlocking the Archives: A New National Model for Resource Discovery in Archival and Manuscript Collections

Dr Toby Burrows

Scholars' Centre, University of Western Australia Library

Dr Liz Hardy

Scholars' Centre, University of Western Australia Library

Margy Burn

National Library of Australia

Information technology and content are powerful words in the computing, education, and commercial worlds today. Distance learning, remote access to preservation materials, and sharing of restricted data are just a small representation of the online information services spectrum. This paper describes, in detail, the conceptual development of a digital asset management and delivery system solution developed as a collaborative effort by Texas A&M University and Silicon Graphics Inc. (SGI). Innovative software design approaches addressing technical and policy engagement challenges that arise out of the unique and common aspects of applications are presented.

Introduction

Archival and manuscript collections are a major component of Australia's research infrastructure. They are a rich and unique source of cultural, social and historical material. But detailed information about their contents has been scattered and difficult to access effectively. To date, descriptions of such collections in Australia have taken three main forms:

  • entries in the catalogues of individual libraries and in the National Bibliographic Database (NBD);
  • entries in the Register of Australian Archives and Manuscripts (RAAM); and,
  • detailed individual guides produced by the specific repositories.

In the first two cases, only a collection-level record is provided. There is no facility for a detailed description of the intellectual structure of the collection, its physical arrangement, or the exact nature of the papers which it contains. While most library catalogues, together with the NBD and RAAM, can be consulted over the web, researchers cannot obtain a detailed understanding of a collection's contents from these sources.

The individual guides produced by specific repositories are, for the most part, unpublished documents in typescript which are only available at the repository itself. Their scope and quality tend to vary wildly. Researchers often face a difficult choice between making an educated guess about how relevant a collection is likely to be to their research and undertaking the potentially laborious process of acquiring a copy of the guide. Some guides are beginning to appear on the web, and some are also linked to the collection-level records on RAAM. But even these still vary greatly in scope and quality. Unlike the collection-level records, they do not exist as a coherent, searchable collection of similar data from a range of different repositories.

Finding manuscript material on specific Australian literary authors, for instance, is a slow and difficult process. RAAM is the best starting-point, but it is not comprehensive and does not normally reveal when an author's material is contained within the collection of another author's manuscripts (e.g. letters to and from Robert Adamson in a collection of John Kinsella's manuscripts). Tracking down such material is a matter of painstaking detective work - both in the scattered guides of possibly relevant collections and in the collections themselves.

This situation is less than satisfactory for researchers. It is also unsatisfactory for the repositories, given that some of their most valuable and distinctive materials are languishing, considerably under-exploited by the people for whom they were collected. A new approach is needed which provides standardized guides, makes them available over the web, and enables them to be browsed and searched as a corpus of similar data.

The Australian literary manuscripts finding aids project

A national project has been developing and testing just such a new model for resource discovery in archival and manuscript collections in Australia. Funded by the Australian Research Council through its RIEF Scheme, the project is a collaborative initiative which brings together six major academic research libraries, under the direction of the University of Western Australia Library. The main aim of the project is to build a national database of electronic guides, or finding aids, for collections of Australian literary manuscripts. At its completion, finding aids for more than eighty collections will be published over the web. The project is also intended to serve as a demonstrator site for future work in the application and development of data standards for archival finding aids in Australia, and as the first step towards a future national resource discovery service for research materials in manuscript repositories and archives.

The project has adopted a standard format for its finding aids: the Encoded Archival Description (EAD). Existing finding aids - usually in Word, HTML or typescript - are converted to this format in a variety of ways. Templates have been developed which can be used with SGML/XML authoring software like XMetaL, text editors like NoteTab, or MS Word. For the most part, this conversion has been done in the libraries themselves, but some work has been outsourced to a commercial firm which specializes in SGML encoding. In several cases, the process of encoding has been preceded by a complete revision of the content of the finding aid.

After encoding their finding aids, the participating libraries send the files to the University of Western Australia (UWA), where they are initially published on a restricted-access test web site. The DynaText and DynaWeb suite of software, which indexes and formats SGML files so they can be displayed and searched over the web, is used for the publication process. After this initial publication, each finding aid is checked, and details of any required revisions are sent to the contributing library, where staff make the suggested changes and resubmit the corrected files to UWA. Once this quality assurance process is complete, the finding aid is published on the public web site.

The public web site, which is known as the Guide to Australian Literary Manuscript Collections, contains the full set of approximately eighty finding aids produced for the project. The collection can be browsed alphabetically by author's name, and each guide can be browsed through its table of contents. The complete collection, as well as individual guides, can also be searched by keyword. This makes it possible to look for all occurrences of a particular person anywhere in any of the guides. Several of the guides relate to collections of manuscripts from the same author, held in different repositories. For Dame Mary Gilmore, for instance, the database contains finding aids from four different libraries: the National Library of Australia, the Australian Defence Force Academy, the University of Queensland, and the University of Sydney. It is possible, as a result, to reunite - in a virtual sense - the physically dispersed material from a particular author.

Encoded archival description (EAD)

Encoded Archival Description, or EAD, was developed by the Berkeley Finding Aid Project at the University of California in 1993, under the direction of Daniel Pitti, and is now widely applied in North America and Europe. Adopters include the Library of Congress and the Public Record Office in Great Britain. EAD provides a standard data structure for displaying, navigating and searching archival records over the web. It is SGML-based and XML-compatible, and incorporates related standards for archival metadata such as ISAD(G) and the Canadian Rules for Archival Description.

One of its key features is its hierarchical structure, reflecting the intellectual organization of the collection being described. Beginning with a general overview, the EAD finding aid works down through successive levels, from series to sub-series and on down to individual items and pieces. The description of the more specific levels inherits the context of the more general levels. EAD is also hospitable to embedded links to external files and entities, such as related guides and records as well as images and text files.
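The way lower levels inherit the context of higher ones can be illustrated with a minimal Python sketch. It assumes a heavily simplified, XML-encoded finding aid using the EAD elements archdesc, dsc, did, unittitle and numbered components (c01, c02, ...); the sample titles are invented, and the functions walk and describe are hypothetical names for illustration, not part of the project's software.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified finding aid; all titles are invented.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Papers of an Example Author</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="subseries">
          <did><unittitle>Letters received</unittitle></did>
          <c03 level="item">
            <did><unittitle>Letter from a publisher, 1985</unittitle></did>
          </c03>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Drafts of poems</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

# Matches the unnumbered <c> element and the numbered forms <c01>, <c02>, ...
COMPONENT_TAG = re.compile(r"c(\d\d)?")


def walk(component, inherited=()):
    """Yield (level, context path) for a component and all its descendants."""
    title = component.findtext("did/unittitle", default="(untitled)")
    path = inherited + (title,)
    yield component.get("level"), path
    # Child components sit directly under the parent or inside a <dsc> wrapper.
    containers = (component, *component.findall("dsc"))
    for container in containers:
        for child in container:
            if COMPONENT_TAG.fullmatch(child.tag):
                yield from walk(child, path)


def describe(ead_xml):
    """Return one (level, 'A > B > C') pair per described unit, in document order."""
    root = ET.fromstring(ead_xml)
    return [(level, " > ".join(path)) for level, path in walk(root.find("archdesc"))]


for level, context in describe(SAMPLE_EAD):
    print(f"{level:<10} {context}")
```

Each item is reported together with the titles of the series and sub-series above it, so a short item-level description need not repeat information already given at a more general level.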

Another feature is its flexibility. The EAD Tag Library contains 118 possible elements, each with a wide range of attributes. In this environment, there is a need to supplement the Tag Library with a set of interpretative guidelines. The project has developed a set of Australian guidelines for the retrospective conversion of finding aids to the EAD format. These guidelines are based on those developed by Daniel Pitti with consortia in the United States and are intended to ensure consistency in the structure and appearance of finding aids contributed to the project database.

The EAD format also makes provision for the use of controlled access terms in areas such as personal names, place names, genres, and types of material. These can be provided either in a separate section of the finding aid or as embedded encoding within the descriptive sections of the finding aid. In neither case does the EAD format prescribe an authorized source for such terms, nor does it provide a mechanism for maintaining the format of the terms themselves.
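The two placements of access terms can be sketched in Python as follows. The example assumes a simplified XML fragment using the EAD elements controlaccess, persname, geogname and genreform with the normal and source attributes; the names, the authority file label and the function collect_access_terms are invented for illustration.

```python
import xml.etree.ElementTree as ET

# EAD elements treated as access terms in this sketch (not a complete list).
ACCESS_TAGS = {"persname", "corpname", "geogname", "genreform", "subject"}

# Hypothetical fragment: one name is embedded in the description,
# and the same name also appears in a separate <controlaccess> section.
SAMPLE = """
<archdesc level="collection">
  <did><unittitle>Papers of an Example Poet</unittitle></did>
  <scopecontent>
    <p>Includes letters from <persname normal="Citizen, Jane">J. Citizen</persname>
    written in <geogname>Perth</geogname>.</p>
  </scopecontent>
  <controlaccess>
    <persname source="hypothetical-authority-file">Citizen, Jane</persname>
    <genreform>Correspondence</genreform>
  </controlaccess>
</archdesc>
"""


def collect_access_terms(xml_text):
    """List every access term with its type, source and placement."""
    root = ET.fromstring(xml_text)
    # Every element that lives inside a <controlaccess> section.
    in_section = {el for sec in root.iter("controlaccess") for el in sec.iter()}
    terms = []
    for el in root.iter():
        if el.tag not in ACCESS_TAGS:
            continue
        # Collapse line breaks and runs of spaces in the displayed text.
        text = " ".join("".join(el.itertext()).split())
        terms.append({
            "type": el.tag,
            # Prefer the encoder-supplied normalised form when there is one.
            "term": el.get("normal", text),
            "source": el.get("source"),
            "placement": "controlaccess" if el in in_section else "embedded",
        })
    return terms


for entry in collect_access_terms(SAMPLE):
    print(entry)
```

The two occurrences of the personal name collapse to the same heading only because the encoder supplied a normal attribute; nothing in the format itself enforces that the forms agree, which is why shared guidelines matter.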

The EAD format contains some information about provenance and administrative arrangements (such as access conditions and copyright) but it is not intended to be a collection management tool for archives and manuscript repositories. The need for linkages between collection description and collection management is being addressed by the Australian Science and Technology Heritage Centre (Austehc), which is currently working to enable the production of EAD-encoded finding aids from its collection management database application, Online Heritage Resource Manager (OHRM).

Towards a national model for resource discovery in archival and manuscript collections

The RIEF project represents the first systematic application of the EAD format in Australia and is intended to provide a base for developing the future information infrastructure in this area. While the national model for the use of EAD in Australia is still in its early stages, there are a number of complex issues relating to its development which need to be discussed and resolved. Some of these issues are structural, while others relate to disseminating more widely a knowledge of EAD and expertise in its use.

The latter area relates mainly to the provision of suitable training in the use of EAD. The project sponsored a visit to Australia by Daniel Pitti in March 2000, which included public seminars in four capital cities as well as in-depth training for project participants. The expertise acquired during the project will now need to be disseminated to the wider library and archive communities, and a national training strategy is being developed to address this need. One issue of particular importance is whether EAD training should be coordinated by a particular industry group (such as the Australian Society of Archivists) or a specific institution (such as the National Library of Australia). The possibility of including EAD in the courses offered by tertiary institutions - particularly those aimed at the information professions - should also be actively pursued.

The major structural issue is to determine what kind of national model should be followed for the publication of finding aids which use the EAD format. There has been considerable debate in the United States about the pros and cons of distributed and centralised models. A centralised approach has the advantage of removing the need for individual repositories to find ways of converting their EAD files to HTML for viewing on the web. It also ensures a consistency of presentation as well as a uniform level of access and searching functionality. It provides a single starting-point for researchers interested in this kind of data, giving them the ability to do a single search across finding aids from a variety of institutions. But it also raises numerous questions which relate to mechanisms for contributing files to the central site, retention of intellectual property rights, and governance of the whole process. An additional difficulty might be to identify a single institution as the central repository covering both the library and archival sectors. It could also be argued that this model does not take full advantage of the distributed architecture of the web.

The UK Archives Hub Service is an interesting example of this approach. It provides a centralised publishing service for finding aids contributed by higher education institutions. Different levels of participation are offered:

  • 'basic', which involves filling in a simple web template from which an EAD file is generated at the Hub;
  • 'flexible', which adds the ability to use basic EAD formatting elements within the template;
  • 'full SGML', which provides a blank SGML file for use with SGML editing software.
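The idea behind the 'basic' level, a simple template from which an EAD file is generated, can be sketched in Python. The Hub's actual template fields and generated markup are not described here, so the field names, the build_ead function and the sample values below are all assumptions, and the output is a minimal collection-level record that has not been validated against the EAD DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical template fields; the real Hub template is not described here.
REQUIRED_FIELDS = ("identifier", "title", "dates", "repository", "extent", "summary")


def build_ead(form):
    """Turn a flat dictionary of template fields into a collection-level EAD string."""
    missing = [name for name in REQUIRED_FIELDS if not form.get(name, "").strip()]
    if missing:
        raise ValueError("missing template fields: " + ", ".join(missing))

    ead = ET.Element("ead")

    # Header: information about the finding aid itself.
    header = ET.SubElement(ead, "eadheader")
    ET.SubElement(header, "eadid").text = form["identifier"]
    filedesc = ET.SubElement(header, "filedesc")
    titlestmt = ET.SubElement(filedesc, "titlestmt")
    ET.SubElement(titlestmt, "titleproper").text = "Guide to the " + form["title"]

    # Body: the collection-level description.
    archdesc = ET.SubElement(ead, "archdesc", level="collection")
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unittitle").text = form["title"]
    ET.SubElement(did, "unitdate").text = form["dates"]
    ET.SubElement(did, "repository").text = form["repository"]
    ET.SubElement(did, "physdesc").text = form["extent"]
    scope = ET.SubElement(archdesc, "scopecontent")
    ET.SubElement(scope, "p").text = form["summary"]

    return ET.tostring(ead, encoding="unicode")


# Invented sample values standing in for what a contributor might type.
SAMPLE_FORM = {
    "identifier": "example-0001",
    "title": "Papers of Smith & Jones",
    "dates": "1950-1975",
    "repository": "Example University Library",
    "extent": "3 boxes",
    "summary": "Correspondence and drafts.",
}

print(build_ead(SAMPLE_FORM))
```

Generating the markup centrally in this way means contributors never handle tags directly, at the cost of restricting them to whatever fields the template offers.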

About 5000 finding aids from fifteen institutions were made available in the first phase of this project, which has subsequently received a further three years' funding from the Joint Information Systems Committee (JISC).

The Archival Resources service of the Research Libraries Group (RLG) is another example of this approach. It differs in that it uses web spider software to gather finding aids periodically from local institutions, and provides a common search interface and display format. Over 16,000 finding aids are now included. One assessment of this service identified limitations and potential future difficulties: 'While the RLG system works very well, we felt the model had inherent limitations for accommodating local practice in a scalable way. In order to accommodate institutional variation, the system is forced to handle each new case for both indexing and display programs. Beyond a few dozen institutions, this could become quite difficult to accomplish and maintain over time.'

Using a decentralised model, in contrast, would mean that individual repositories retain and publish their own finding aids. This approach avoids difficulties with intellectual property and uses the distributed architecture of the web. It offers, however, much less likelihood of consistency or of a uniform level of access and presentation. Smaller repositories would probably be discouraged from publishing their finding aids in EAD format, and would find it difficult to acquire sufficient expertise and to provide the infrastructure for publishing SGML-based files. XML may help to reduce these problems to some extent. There would undoubtedly be problems for researchers in locating relevant finding aids in a fully decentralised structure, as well as an inability to search across files held in a variety of locations.

Perhaps the most effective approach would be one which combines centralization and decentralization. Within a distributed network, it would nevertheless be possible to have a central site which acted as a directory service. This facility could be constructed by using automatic methods to harvest summary data from sites on the network, in order to create a central database which pointed to finding aids held on remote servers. It could also include a single search interface which worked across this distributed network. A decentralised structure of this kind could also contain various nodes to which smaller repositories without the resources to publish their own finding aids would be able to contribute raw files for encoding and publication.
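A minimal Python sketch of such a directory service follows. To stay self-contained it replaces the network with a dictionary mapping invented URLs to simplified XML finding aids; the function names (summarise, harvest, search) and all sample data are assumptions, not a description of any existing service.

```python
import xml.etree.ElementTree as ET

# Stand-in for remote servers: hypothetical URLs mapped to simplified EAD files.
REMOTE_FINDING_AIDS = {
    "https://repository-a.example/guides/poet.xml": """
        <ead><archdesc level="collection"><did>
          <unittitle>Papers of an Example Poet</unittitle>
          <repository>Repository A</repository>
        </did>
        <scopecontent><p>Letters from <persname>A. Correspondent</persname>.</p></scopecontent>
        </archdesc></ead>""",
    "https://repository-b.example/guides/novelist.xml": """
        <ead><archdesc level="collection"><did>
          <unittitle>Papers of an Example Novelist</unittitle>
          <repository>Repository B</repository>
        </did>
        <scopecontent><p>Drafts, and letters to <persname>A. Correspondent</persname>.</p></scopecontent>
        </archdesc></ead>""",
    "https://repository-c.example/guides/broken.xml": "<ead><archdesc>",  # malformed
}


def summarise(url, xml_text):
    """Extract only the summary data the central directory needs."""
    root = ET.fromstring(xml_text)
    did = root.find("archdesc/did")
    if did is None:
        raise ValueError("no collection-level description in " + url)
    names = {" ".join("".join(p.itertext()).split()) for p in root.iter("persname")}
    return {
        "url": url,  # the directory points back to the remote file
        "title": did.findtext("unittitle", default=""),
        "repository": did.findtext("repository", default=""),
        "names": sorted(names),
    }


def harvest(sources):
    """Build the central directory, skipping files that cannot be read."""
    directory = []
    for url, xml_text in sources.items():
        try:
            directory.append(summarise(url, xml_text))
        except (ET.ParseError, ValueError):
            continue
    return directory


def search(directory, keyword):
    """Return the URLs of finding aids whose title or names mention the keyword."""
    needle = keyword.casefold()
    return [
        entry["url"]
        for entry in directory
        if needle in entry["title"].casefold()
        or any(needle in name.casefold() for name in entry["names"])
    ]


directory = harvest(REMOTE_FINDING_AIDS)
print(search(directory, "correspondent"))
```

The central site stores only summaries and pointers, so each repository keeps its own files, while a single search can still reveal that one person's letters are scattered across collections held in different places.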

The Distributed Finding Aid Server (DFAS) project, carried out at the University of Michigan and Harvard University in 1999, implemented and tested an online distributed search system for diversely encoded finding aids. It achieved some 'useful results', but also identified a lack of standardization in the way EAD is applied as a possible source of difficulties.

One of the steps in developing a national architecture for finding aids will be to identify the future role and place of the database created by the RIEF project. The UWA Library intends to host and maintain this database for the time being, at least until the future national approach becomes more settled. The project database is an example of a centralised model, which focuses on a specific subject area. It remains to be seen whether expanding this database into other subject areas is an appropriate and sustainable approach. In the longer term, the project database may become one node in a partly decentralised structure, or it may be migrated to a central hub or dispersed to its contributing institutions. Another possible option is a linkage with one of the North American or British finding aid services. In the meantime, the project database will also need to interface with related elements of Australia's information infrastructure. These include RAAM, the NBD and the Australian Literature Electronic Gateway (ALEG).

A future national model for the publication of finding aids will also need to address the management of EAD as a metadata standard. Australia is already represented - by Gavan McCarthy of Austehc and the Australian Society of Archivists - on the U.S. working group which considers amendments to the EAD format. The establishment of an Australian body which can feed Australian advice into this process is highly desirable. Such a group would need to combine representation from the library and archives sectors, as well as from relevant professional bodies.

Conclusion

The round table on 'Archives in the National Research Infrastructure', held by the National Scholarly Communications Forum in November 1999, endorsed a 'vision for the creation of a web-based distributed search/access infrastructure for archives, based on common descriptive and technical standards'. The RIEF project has explored the possibility of using the EAD format as the basis for this kind of infrastructure, and has demonstrated the value and suitability of this approach. The project has also identified a range of issues which need to be addressed in the development of a national model of this kind.

Above all, the project has made a significant contribution to the development of new ways of unlocking the contents of archival and manuscript collections in institutions which act as custodians for Australia's cultural heritage. Finding aid databases such as the one created by the RIEF project can do much to promote and encourage the use of these collections, and to make their contents better known to researchers and the wider community alike. In the spirit of the original Berkeley Finding Aid Project, the RIEF project envisages an information future in which serious scholars and the casually curious alike can easily isolate the cultural treasures they seek. In this information future, information seekers follow clearly marked paths through library catalogs to finding aids and from finding aids to treasures in a multitude of computer and traditional formats... and back.





http://conferences.alia.org.au/online2001/papers/working.online.iic.html
© ALIA. 3:05pm 18 February 2004