The one-stop shop: a single end-user interface for search and discovery across digital library collections

Thomas Girke Manager, Collection Resources Support, CSIRO IT Services, Private Bag 89, Clayton South VIC 3169 Australia, ph 03 9518 5940, mb 0418 821977, fx 03 9518 5959 thomas.girke@csiro.au

Biographical sketch

Tom is Manager, Collection Resources Support within CSIRO's Information Technology Services, with responsibility for the organisation's Integrated Library System and the Library Network's Union Catalogue, as well as oversight of centrally co-ordinated purchasing of serial resources. With a keen interest in digital library solutions, Tom has been involved with the implementation of ENCompass since he took up his post in April 2002.

Co-authors

Jacqui Porter Manager, Electronic Acquisitions, CSIRO IT Services, ph 03 9518 5936, mb 0410 623 601, fx 03 9518 5959 jacqui.porter@csiro.au
Rolfe Westwood Manager, Database Integration, CSIRO IT Services, ph 03 9518 5940, mb 0410 623605, fx 03 9518 5907 rolfe.westwood@csiro.au

Abstract

Australia's premier science organisation, CSIRO, is implementing an integrated tool to allow researchers to search across all types of information available to them in one step. Known as ENCompass, it provides a single gateway for access to information regardless of where it resides - the CSIRO OPAC, other OPACs, abstract and indexing databases, full-text journal collections, and digital repositories. This paper discusses the implementation and customisation of the system, its functionality as a product, and the protocols that now allow users to perform federated searching across resources.

Introduction

CSIRO (Commonwealth Scientific and Industrial Research Organisation) is Australia's national science agency, with a history of conducting research of value to Australia and Australians for more than 75 years. Our 20 research divisions and 6000 staff, located in laboratories and field stations Australia-wide and overseas, research problems across a wide range of disciplines, including agriculture, environment, health, communications, IT, manufacturing, construction, minerals and energy.

CSIRO is in the top 1 per cent of scientific institutions worldwide for 11 out of 22 research fields (as measured by Thomson ISI), publishes around 3000 scientific papers each year, is Australia's leading patent enterprise, with more than 3500 patents either granted or pending, and has produced more than 70 spin-off companies. Among CSIRO's more recent achievements are biodegradable packaging for the food industry, discovery of a new type of pulsar (by the CSIRO Parkes radiotelescope), research which led to the anti-influenza drug (Relenza), the EXELGRAM anticounterfeiting device (adopted by American Express), and haptic (force-feedback) workbenches for virtual manipulation of 3D objects.

CSIRO Library Network

CSIRO's 20 discipline-based research divisions are served by a federated network of divisional and site libraries. The CSIRO Library Network extends beyond CSIRO itself to encompass ANSTO (Australian Nuclear Science and Technology Organisation), Food Science Australia (a joint venture between CSIRO and the Victorian government's Australian Food Industry Science Centre) and some Co-operative Research Centres which are co-located with CSIRO laboratories. Physical resources (and many specialised online resources) are funded and provided mainly on a divisional basis, with co-operative sharing of these resources across the Network, facilitated by a single union catalogue using Voyager software from Endeavor Information Systems.

Accessing CSIRO online resources - the problem

During the 1990s, the CSIRO Library Network also began collectively acquiring online resources to be made available to staff via the desktop across the entire organisation (and, where possible, its affiliates as well). These resources grew organically as funding and technological opportunities developed and now consist of a variety of products on differing platforms:

At the same time, CSIRO Divisional libraries continue to arrange access as needed for more specialised online material for particular sub-groupings of researchers (with attendant authentication requirements for subsets of CSIRO users).

As of 2002, therefore, CSIRO's online library environment looked something like what is presented in Figure 1.

Figure 1 - CSIRO's online library environment

Left and centre are current delivery systems for online content (and metadata describing content) for CSIRO users, tracing links which have also grown up organically over the last 3-4 years between the components. Floating disconnectedly on the right are delivery systems for other online content available to CSIRO staff, including the CSIRO intranet, organisational records, experimental data and locally created databases (e.g. image collections).

CSIRO Directory of Information Tools

As offerings became more diverse and available from more and more disparate locations and interfaces, we initially attempted to at least gather all available library products together in one accessible listing. This resulted in the creation of the CSIRO Directory of Information Tools (DIT), consistently one of the websites most heavily used by CSIRO staff (see Figure 2). The DIT, which takes advantage of software created in-house by CSIRO for mounting its public web pages, is an XML database of metadata describing electronic resources available to staff. The DIT has, however, become to some extent a victim of its own success, as it currently lists more than 100 databases and full-text content options in a single browseable list with only limited search options by subject.

Figure 2 - CSIRO Directory of Information Tools

Endeavor's ENCompass - a potential solution

CSIRO felt that Endeavor's ENCompass software, released in 2002, offered a potential solution for tying CSIRO's 'digital library' offerings together in a logical fashion (offering us a 'new and improved DIT' and much, much more in the way of functionality).

Why ENCompass?

Although the market for library 'one-stop-shop' software is still immature, ENCompass was not the only product CSIRO could have chosen. We selected ENCompass for three reasons:

  1. it is a standards-based product;
  2. it builds on existing CSIRO IT infrastructure (it is the successor software to ScienceServer, on which CSIRO's Electronic Journals Collection currently resides; it integrates with CSIRO's library management system, Voyager; and it is based on a compatible Oracle back-end);
  3. as it is ultimately an Elsevier product (Endeavor is a wholly-owned subsidiary of Elsevier Science), CSIRO felt it could build on its existing relationship with Elsevier Science as an Advanced Technology Partner in order to influence development paths and the level of technical support, as well as pricing.

The ENCompass digital library solution consists of three products. Each product may be implemented separately or as a complete set depending on the library's needs.

LinkFinderPlus

The LinkFinderPlus product is a server solution designed to provide users with seamless linking ability from citations or abstracts to full text content.

Traditionally, a library user was required to first search an indexing or abstracting database, then search for the source in the library catalogue, and then either retrieve the hardcopy from the shelf or link to the source at the title level via the 856 MARC field.

The LinkFinderPlus solution presents the user with a clickable icon on a citation or abstract which links them directly to the full text source, or sources, at the article level. This eliminates the need to do a second search of the library catalogue and then navigate to the full text source in order to find the desired article.

The mechanism used to deliver and provide these links is known as OpenURL. LinkFinderPlus accepts the metadata from the OpenURL-enabled source and uses this data to provide article-level links to full text content. These links are switched on by the administrator, which ensures the user is only directed to full text content when it is available and eliminates the problem of dead links.
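
As a rough illustration of the idea, the Python sketch below assembles an OpenURL-style link from citation metadata. The resolver address `https://resolver.example.org/openurl` and the function name `build_openurl` are hypothetical, and the citation is invented. The key names (`genre`, `issn`, `volume`, `issue`, `spage`, `date`, `atitle`) follow the key/value convention commonly used in OpenURL links. Nothing here is taken from LinkFinderPlus itself.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical resolver address; a real site would use its own link server.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def build_openurl(citation, base=RESOLVER_BASE):
    """Return an OpenURL-style link carrying the citation metadata.

    Empty values are dropped so the resolver only sees fields that the
    source database actually supplied.
    """
    params = {key: value for key, value in citation.items() if value}
    return base + "?" + urlencode(params)


# Illustrative citation, not a real article.
citation = {
    "genre": "article",
    "issn": "1234-5678",
    "volume": "12",
    "issue": "3",
    "spage": "45",
    "date": "2002",
    "atitle": "",  # unknown, so it is left out of the link
}

link = build_openurl(citation)
query = parse_qs(urlsplit(link).query)  # parsed back for inspection
print(link)
```

The link server on the receiving end would then match these values against its list of full-text targets to decide which article-level links to show.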

In addition to links to the full text, the user is also provided with further administrator-determined links for additional resource discovery. Known as Extended Services, this feature allows the user to conduct a further search, using the metadata retrieved from the commercial database, against resources such as local and remote catalogues and Internet search engines.

Links to full text targets are maintained in the Knowledge Base client. This includes a pre-populated database of links to full-text journals available from various sites. The administrator can switch these links on and specify the exact range of dates to which the user will have access.
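
The effect of switching targets on and restricting them to a date range can be sketched as follows. This is a minimal Python illustration under assumed names: the `Target` record, its fields and the sample entries are all hypothetical and do not reflect how the Knowledge Base actually stores its data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    """One full-text target; field names are illustrative only."""
    name: str
    enabled: bool              # switched on by the administrator
    first_year: int            # first year of licensed coverage
    last_year: Optional[int]   # None means coverage is ongoing


def available_targets(targets, year):
    """Return names of enabled targets whose coverage includes `year`."""
    hits = []
    for t in targets:
        in_range = t.first_year <= year and (t.last_year is None or year <= t.last_year)
        if t.enabled and in_range:
            hits.append(t.name)
    return hits


# Invented sample entries.
targets = [
    Target("Publisher A", enabled=True, first_year=1995, last_year=None),
    Target("Aggregator B", enabled=True, first_year=1990, last_year=1999),
    Target("Publisher C", enabled=False, first_year=1980, last_year=None),
]

print(available_targets(targets, 2001))  # ['Publisher A']
print(available_targets(targets, 1996))  # ['Publisher A', 'Aggregator B']
```

Only targets that are both enabled and in range are offered, which is what keeps the user away from links to content the library does not hold.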

ENCompass for Resource Access

As the number of electronic resources that a library provides access to grows, so does the number of different interfaces from varying sources a user must learn and attempt to navigate.

ENCompass for Resource Access provides a comprehensive solution to this dilemma by providing a single search engine interface for extensive federated searching as well as linking across electronic content regardless of where it may reside.

ENCompass for Resource Access is an XML based product which supports searching across five resource types:

Using these search types, the library is able to set up search connections with such resources as locally described digital collections, abstract and indexing databases, ejournals and ebooks at either the collection or title level, qualified websites, as well as local and remote library catalogues.
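
The general pattern behind federated searching (send one query to several sources, then merge what comes back) can be sketched in Python as below. This is an assumption-laden illustration, not a description of ENCompass internals: the sources are in-memory stand-ins created by the hypothetical helper `make_source`, whereas a real system would talk to each resource over whatever protocol it supports.

```python
from concurrent.futures import ThreadPoolExecutor


def make_source(name, records):
    """Build a stand-in search function over an in-memory list of titles."""
    def search(term):
        term_lower = term.lower()
        return [(name, title) for title in records if term_lower in title.lower()]
    return search


def federated_search(term, sources):
    """Send `term` to every source in parallel and merge the hits.

    A source that raises an exception is skipped so that one failing
    connection does not sink the whole search.
    """
    merged = []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(source, term) for source in sources]
        for future in futures:  # collected in submission order
            try:
                merged.extend(future.result())
            except Exception:
                continue
    return merged


def broken_source(term):
    """Simulate a remote resource that cannot be reached."""
    raise ConnectionError("remote repository unavailable")


# Invented sample data.
sources = [
    make_source("catalogue", ["Eucalypt ecology", "Soil chemistry"]),
    make_source("ejournals", ["Eucalypt leaf oils", "Pulsar timing"]),
    broken_source,
]

results = federated_search("eucalypt", sources)
print(results)
```

Skipping a failed source rather than aborting is one possible design choice; a production interface would more likely report which sources did not respond.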

ENCompass for Digital Collections

The third component of the solution is ENCompass for Digital Collections, the aim of which is to provide support for creating, organising and managing an organisation's own digital collections.

Collections of digital materials are stored in databases or, in ENCompass terminology, repositories. These repositories and the records within them are described using a locally defined metadata standard. The types of metadata ENCompass supports include Dublin Core, Encoded Dublin Core, Encoded Archival Description (EAD), MARC, or any standard that has been locally defined.

Once a metadata record has been created, a digital resource, or object, may be attached. This object may be sourced as a URL web link, a file already residing on the ENCompass server, a file on a local server, or something currently in the PC's Windows clipboard.

Records are created, organised and maintained by staff authorised to use the ENCompass client. Previously created collections can also be bulk loaded into ENCompass using a metadata loader that is able to process records already in XML format. Several database software programs, including Microsoft Access, already include the ability to export records in XML.
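
To give a feel for what preparing records for an XML-based loader might involve, the Python sketch below serialises a list of records as Dublin Core-style XML. The `<records>`/`<record>` wrapper, the function name `records_to_xml` and the sample record are all assumptions made for illustration. The layout the ENCompass metadata loader actually expects is not described in this paper.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def records_to_xml(records):
    """Serialise a list of dicts as Dublin Core-style XML.

    The <records>/<record> wrapper is a made-up layout for illustration;
    a real loader would dictate its own document structure.
    """
    root = ET.Element("records")
    for rec in records:
        node = ET.SubElement(root, "record")
        for field, value in rec.items():
            child = ET.SubElement(node, f"{{{DC_NS}}}{field}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample record using the Dublin Core elements title, creator and date.
records = [
    {"title": "Field station, 1950s", "creator": "Unknown", "date": "1955"},
]

xml_text = records_to_xml(records)
print(xml_text)
```

Building the document through an XML library rather than by string concatenation means special characters in titles are escaped automatically.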

Currently approximately 75 file types are supported. These include numerous image and sound files, various standard document types and compressed files.

Implementation

Although CSIRO IT personnel have had a long history of installing and maintaining large UNIX based applications, our arrangement with Endeavor called for the on-site installation of ENCompass and LinkFinderPlus.

They were installed over the period 11-13 June 2002 on a Sun Fire 6800 with 4x750MHz CPUs and 8GB of RAM running the Solaris 8 operating system. This is the same environment that hosts our Voyager (Library Catalogue) and ScienceServer (Electronic Journals) systems. The host system is connected to an external storage area network (SAN) that provides a multi-terabyte storage solution to CSIRO's corporate IT and digital library applications.

Additional third-party software such as Apache, Oracle, Tomcat and Perl is also required to complete the installation.

In order to quickly become familiar with the products, on-site training with Endeavor was organised for the week following the installation. As this was a hands-on workshop, numbers had to be restricted to the recently formed ENCompass Implementation Working Group, which would decide how the product was to be customised and deployed in the organisation. This group had members from CSIRO IT Services and the CSIRO Library Network and included specialists in the areas of IT, Cataloguing, Acquisitions, Voyager, and other library services.

The workshop covered the use of the administration and staff clients, and how the products would be customised for CSIRO.

Customisation

One of the first customisation decisions that ENCompass customers need to make is which base metadata definition (i.e. DTD) is to be used. Some of the considerations taken into account when making this decision were:

Qualified Dublin Core was chosen over a number of other definitions that were supplied with the base software. It should be noted that once this decision has been made, it cannot (easily) be changed.

One of the primary uses of ENCompass in CSIRO will be as a one-stop-shop for information through the exploitation of the federated search interface. Decisions such as the definitions and layouts of the high-level 'Collections' and the repositories (both internal and external) attached to these collections were also discussed. Although no final decision has been reached, seven collections were defined using a broad subject classification scheme that covered CSIRO's research areas (see Figure 3). The collections were to be:

Figure 3 - Initial ENCompass Search Screen

External Repositories

The ENCompass package was supplied with a number of preconfigured external repository definitions. Using the ENCompass clients to incorporate these repositories into the predefined collections was simple enough. However, the resulting federated searching mechanism, which sent search requests to the repositories, was at times disappointing, as it failed to produce the desired results.

Local Repositories

At the time of writing, 3 local repositories have been developed:

The creation of these local repositories was probably one of the more complex areas of ENCompass and required specialised expertise. Each repository is described in more detail below.

Publications

CSIRO has a long history of scholarly publishing and has been recording its publications output for many years. Metadata relating to the organisation's publications output is recorded in local ProCite databases, whose upkeep is the responsibility of designated Divisional Publications Officers. As metadata had already been recorded in the ProCite databases, it made sense to reuse this rather than re-enter it through the ENCompass client. As ENCompass had the ability to perform metadata loading via XML, a ProCite output style was written in order to export this data as XML. This was a relatively successful exercise, which saw the loading of several thousand records into the local repository. There were no objects associated with these records; however, many were able to link to full-text resources.

Photo Archive

The photographic repository was created in collaboration with the CSIRO Archivist, who was looking for a home for many early slides taken by CSIRO photographers. A simple metadata scheme was developed to describe the negatives (title, photographer, date taken, etc.). As with the publications metadata, it had been entered into ProCite, and was exported and loaded into ENCompass in a similar manner. Following the loading of the metadata, it was a simple process to upload the digital objects (i.e. images) associated with each of the records.

Directory of Tools

The Directory of Information Tools (DIT) was also exported and loaded into ENCompass through its metaload facility.

As a result of setting up the local repositories, several things became clear:

The conclusion to be drawn here is that defining and setting up local repositories in ENCompass will require close co-operation between IT specialists and the owner of the repository.

The Future

ENCompass still has a long way to go towards becoming a stable and mature product - particularly in the area of searching external repositories. We are working closely with Endeavor in order to rectify deficiencies and look forward to subsequent releases of the software to overcome these.

CSIRO, being a content- and knowledge-rich organisation, sees the product incorporating resources traditionally outside the realm of the library. Already we have worked with divisional scientists and have successfully imported into ENCompass existing databases of scientific information as local repositories. This has included an image collection, with associated metadata, of Eucalypts captured by our Forestry and Forest Products Division.

The future should see full integration of our Electronic Journal Collection, the Voyager Library Catalogue, as well as internal and external repositories. The ideal is for it to act as the CSIRO scientist's one-stop-shop for information retrieval and resource discovery.