Cultural agencies such as museums and galleries are joining libraries in recognising the importance of harnessing rapidly developing Web and database technologies to open up their vast repositories of resources over the next decade. This paper promotes a standards-based approach to Web-enabled, inter-operable information retrieval for cultural networks, using Z39.50 and the associated CIMI (Consortium for Interchange of Museum Information) profile. It presents Z+SQL, an adaptation of the Z39.50 protocol to the SQL domain. By uniting the advantages of the SQL query language with the inter-operable information retrieval services of Z39.50, Z+SQL offers the emerging SQL-enabled cultural communities a powerful solution to their future distributed querying needs.
Today's information providers need sophisticated tools to keep up with the explosive growth of networked information available on the Internet. In this paper, we present a non-proprietary, standards-based communications protocol, ANSI/NISO Z39.50, and its new SQL extension, Z+SQL. This combination provides an inter-operable access method to distributed, heterogeneous databases via the Internet. Z+SQL unites the advantages of the SQL query language with the inter-operable information retrieval services of Z39.50. Coupled with the existing Z39.50 profiles, Z+SQL facilitates both dynamic and inter-operable SQL querying and retrieval, making distributed SQL querying across domain-specific communities a reality. In particular, this paper focuses on the benefits of Z+SQL to the cultural community.
We briefly explain the importance of inter-operable information retrieval networks and promote a standards-based approach. We consider the current trends in information retrieval, explore what the ideal information retrieval standard should look like, and then compare this to both existing and emerging international and industry standards. We briefly describe the underlying principles behind the Z39.50 protocol. We then explain how the CIMI Z39.50 profile has opened up uniform access to cultural information, and discuss some of the associated Australian and world-wide activities in the cultural inter-operability domain. Finally, the proposed SQL extension to Z39.50, Z+SQL, is described in some detail. Specific examples of how Z+SQL could be implemented within the cultural community under the CIMI Z39.50 profile are provided, and the benefits of doing so are discussed. In conclusion, we outline the current status of Z+SQL and associated projects, future extensions to the proposal and the planned release of SQL-enabled Z39.50 products.
With the rapid advancement of both Web and database technologies, many organisations are realising the benefits of placing their existing information systems online so that their employees, external users, and other organisations can make use of them in new and creative ways. Compounding this realisation is the requirement that these resources must be accessible in a meaningful way both within and across various domains. The reality is that both technical and semantic inter-operability across resources is becoming paramount at a global level.
For example, many libraries around the world, each with their own independently developed database system, are now presenting their data through a uniform Web interface. As a result, cross-library searches for publications are now a reality. Government and scientific bodies, which generally have a number of disparate and autonomous information systems scattered between departments, are realising the benefits of allowing their existing data to be electronically queried and exchanged between departments. Cultural agencies are now also considering the benefits of making their information available online to both internal and external users in an inter-operable way.
These vast repositories of cultural collections and associated digital resources present many challenges to designers and builders of digital collections. Cultural and museum information includes a variety of physical and electronic object information: descriptive records of physical artifacts designed for collection management, electronic derivatives of those artifacts (such as full-text documents and multimedia representations), online tools such as thesauri, authoritative lists of artists' names, and more. Information retrieval tools for digital cultural collections need to address the heterogeneous nature of these information objects, as well as the fact that such collections may now draw upon cultural repositories distributed around the world.
To facilitate information retrieval across such diverse collections of data resources as are now available, a non-proprietary standards-based communications protocol for information retrieval, which is independent of database and computer environment, appears to be essential. But what would the ideal information retrieval standard look like?

The complete resource discovery standard should provide both flexible and abstract services in order to support the heterogeneous nature of both the legacy and future data sources (Figure 1). Support for various data models and schemas as well as different query languages and retrieval formats would be ideal. The standard would also need to be dynamic, supporting both the discovery of information resources and the context of the data that these resources supply. Bundled up either within the standard or as associated components to the standard would be such services as resource management, persistence, authentication/security, transaction management and e-commerce. And perhaps, for ease of client-side deployment, thin web-based clients on existing browsers would be advantageous.
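Within separate domains (namely bibliographic, commercial and the Internet), three main categories of information retrieval standards have been developed and widely deployed, with several others emerging. They include Z39.50 from the bibliographic domain; the SQL data access mechanisms ODBC, JDBC, OLE DB and RDA from the commercial database domain; and HTTP, HTML, XML and RDF from the Internet domain.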
In this section we briefly discuss each alternative standard or data access mechanism and, in particular, its suitability for providing online inter-operable access to heterogeneous SQL-enabled databases.
ANSI/NISO Z39.50-1995 (ISO 23950) is a client-server based network protocol, widely deployed in the bibliographic domain, that is concerned primarily with the search and retrieval of information within databases. One of the major advantages of Z39.50 is that it enables uniform access to a large number of diverse and heterogeneous information sources, irrespective of the computer system, search engine or database.1
Open Database Connectivity (ODBC) is a proprietary-driven Application Programming Interface (API), originally developed by Microsoft, for accessing SQL databases.2 ODBC was the first of the plug-and-play data access middleware standards, requiring an ODBC driver for each specific database system. It has since become the ubiquitous standard amongst the large relational database vendors.
JDBC is a complete definition of how to implement SQL-based database communication using Java, and is now part of the core Java platform (the java.sql package) rather than a vendor-specific add-on package.
OLE DB is Microsoft's new standards-based strategy for accessing all types of data, regardless of format or location.3 Microsoft is implementing OLE DB as a common way to query, access and modify all data in the organisation through a standard Component Object Model (COM) interface. OLE DB allows access to a range of data sources, including ODBC-compliant databases, spreadsheets, flat files and spatial databases. To compensate for the lack of database services provided by non-ODBC data sources, OLE DB also includes query and cursor services. OLE DB has not, however, become a ubiquitous standard (as yet).
The Remote Database Access protocol, RDA,4 is an Application Layer standard intended to support general-purpose database access in a client-server environment. Although RDA is an open standard that supports SQL-based databases, it is not widely used in comparison with the industry-accepted ODBC API.
The Hypertext Transfer Protocol, HTTP, is an application-level protocol for distributed, collaborative, hypermedia information systems. HTTP is a client-server based network protocol which enables cross-platform, cross-enterprise, multimedia information exchange. HTTP dominates Internet traffic today and has evolved into a powerful distributed object system which is used not just for simple retrieval but also for searching, downloading Web pages and graphics, front-end updating and annotation.
The Hypertext Markup Language, HTML, is the publishing language of the World Wide Web. It is an SGML application conforming to International Standard ISO 8879, the Standard Generalised Markup Language.5 HTML is used to mark up documents, representing structural, presentational and semantic information alongside content by means of standardised tags. These documents are human-readable and machine-processible, and can then be served, received and processed on the Web in an inter-operable way.
Extensible Markup Language, XML, is a subset of the Standard Generalised Markup Language.5 More specifically, XML is an application profile of SGML designed to enable generic SGML to be served, received and processed on the Web in much the same way as HTML is now. XML provides a mechanism to impose constraints on the storage layout and logical structure of documents. In complexity, XML sits between HTML and SGML; unlike HTML, however, it allows authors to create and define their own elements. The XML syntax provides vendor independence, user extensibility, validation, human readability, and the ability to represent complex structures.6
The Resource Description Framework, RDF, is an application of XML which defines the infrastructure to enable the encoding, exchange and machine processing of structured Web metadata. RDF imposes common conventions about semantics, syntax and structure to provide metadata inter-operability between different resource description communities. RDF provides a standard mechanism for both representing metadata semantics and publishing metadata vocabularies. RDF is currently under development.
Z39.50 has emerged out of the library community and addresses many of the envisaged standard requirements. However, until recently, Z39.50 has only been able to provide boolean-style querying. While this has been adequate for text searching, it does not take advantage of the increasing query power available in most database systems today.
In contrast, ODBC, JDBC, OLE DB and RDA have all emerged out of the commercial database community and use SQL as their standard database manipulation language. They each convey SQL statements (both retrievals and updates), and related control operations (including concurrency control and synchronisation), to the database management system. However, while they each deal with the problems of heterogeneous access methods, none deal with the problems associated with varying query languages, data schemas and semantics associated with heterogeneous databases.7
In parallel, HTTP/HTML and the developing RDF/XML have emerged out of the Internet community with the former addressing publishing standards on the Web and the latter re-engineering this approach to provide a robust and flexible framework for supporting metadata on the Web. Neither of these approaches, however, adequately addresses the needs of standardised data access to commercial databases.
Furthermore, neither the SQL nor the Web access methods have adequately considered the general requirements of providing or incorporating value-added auxiliary services (such as e-commerce), which would seem an essential requirement for Internet information services.
This paper therefore focuses on the standard which appears best to support the requirements of heterogeneous information retrieval, namely ANSI/NISO Z39.50, and on its new Z+SQL extension, which provides a mechanism for distributed SQL information retrieval.
To facilitate information retrieval across SQL-enabled databases, a uniform access method is essential: one which deals with the problems of heterogeneous access methods, query languages and data semantics, while allowing each organisation to make independent decisions about its own database implementation, data security and local data requirements. Z39.50 is such a standard.
Z39.50 (ANSI/NISO Z39.50-1995, ISO 23950) is an international standards-based communications protocol for information retrieval, which is independent of database and computer environment. It is a client-server based network protocol and is concerned primarily with the search and retrieval of information across heterogeneous databases. The protocol specifies the formats and procedures governing the exchange of messages between a client and server, enabling the user to search remote databases, identify records meeting specified criteria and retrieve some or all of these records.
Z39.50 enables uniform access to a large number of diverse and heterogeneous information sources, irrespective of the underlying computer system, search engine or database. It addresses the problems associated with semantic and structural heterogeneity in a practical way, achieving inter-operability between heterogeneous database systems through the standardisation of four main things:
Using these standardised components, Z39.50 clients and servers are each able to expose a standard interface, which allows them to communicate with each other independently of the underlying database implementation. An overview of the Z39.50 search and retrieval mechanism is shown in Figure 2. This diagram illustrates the fact that Z39.50 presents an abstract view of information stored in a variety of database types, and then uses the concept of Access and Retrieval Points for searching and presenting (respectively) these records to the user. For a more detailed description of Z39.50, refer to ANSI/NISO Z39.50,9 and Finnigan and Ward.10
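To make the search-then-retrieve pattern concrete, the following Python sketch models a single Z39.50 target in memory. It is a toy, assuming a simple dictionary that maps abstract access and retrieval point names to local field names; the class and method names (`ToyTarget`, `search`, `present`) are hypothetical, and a real implementation exchanges ASN.1/BER-encoded Search and Present messages rather than calling Python methods. What it does show is the division of labour: the search step leaves a named result set on the server and returns only a hit count, and the client then retrieves records from that result set by position.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTarget:
    """Toy model of the Z39.50 Search/Present pattern (hypothetical names;
    real Z39.50 exchanges ASN.1/BER-encoded protocol messages)."""
    records: list        # local records; storage details hidden from clients
    field_map: dict      # abstract point name -> local field name (assumed)
    result_sets: dict = field(default_factory=dict)

    def search(self, result_set_name, access_point, term):
        """Run a one-term search, keep the hits as a named result set on the
        server, and return only the hit count (as a Search response does)."""
        local = self.field_map[access_point]
        hits = [r for r in self.records
                if term.lower() in str(r.get(local, "")).lower()]
        self.result_sets[result_set_name] = hits
        return len(hits)

    def present(self, result_set_name, start, count, elements):
        """Return up to `count` records from 1-based position `start`,
        labelled with the abstract retrieval element names requested."""
        hits = self.result_sets[result_set_name]
        window = hits[start - 1:start - 1 + count]
        return [{e: r.get(self.field_map[e]) for e in elements} for r in window]


museum = ToyTarget(
    records=[{"obj_title": "Roman coin", "made_of": "bronze"},
             {"obj_title": "Postage stamp", "made_of": "paper"},
             {"obj_title": "Roman lamp", "made_of": "clay"}],
    field_map={"title": "obj_title", "material": "made_of"},
)
n = museum.search("default", "title", "roman")
print(n)                                                 # 2
print(museum.present("default", 2, 1, ["title", "material"]))
# [{'title': 'Roman lamp', 'material': 'clay'}]
```

The client never sees the local field names `obj_title` or `made_of`; it works only with the abstract points, which is the property the text attributes to Z39.50.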

Z39.50 has a number of advantages over other information retrieval approaches. Firstly, it is a standards-based approach which has been accepted in a wide range of information communities. Originally developed by the library community, it is the culmination of some twenty years of research and development, and can now be considered reliable and well tested through substantial deployment experience.11 Z39.50 is currently supported by many library catalogues, archives, geo-spatial catalogues, abstract and index databases and Web-based resources, and is also becoming a mandatory access protocol for large institutional data stores such as those of US federal government agencies. Secondly, the protocol is independent of the software and hardware platforms on which the local databases are implemented, and independent of the local database interfaces, query languages and data schemas. In this way, Z39.50 can be used to allow standardised, dynamic access to existing database systems, without any prior knowledge of the database implementations and without making any changes to them. Finally, Z39.50 does not directly expose the source databases, which provides a security layer for open environments such as the Internet.
Z39.50's CIMI Profile has now opened up uniform access to cultural information, which may be stored as full-text documents, bibliographic data, images, movie clips, sound clips or multimedia files. The CIMI Profile specifies a subset of Z39.50 features, options, parameters and detailed data semantics needed to support the functional and user requirements for the search and retrieval of cultural information in digital collections.
The CIMI Profile consists of three basic components that address:
Specifications that allow a client and a server to share an understanding of access points available for searching databases containing, for example, museum object records, images with associated text, and bibliographic records. This is accomplished by specifying a standard list of access points (represented by the CIMI-1 Attribute Set), along with semantics for those access points. The client and the server share an understanding of this standard list. The functionality provided by these specifications enables a client to express searches on specific concepts (e.g. the title of an object, the provenance of an object, the material or medium of the object) in a standard way that can be understood by a server.
Specifications that allow a client and a server to share an understanding of database records for retrieving the entire record or specific units of information (e.g. one or more groups of database fields). This is accomplished by specifying a standard list of elements in the Abstract Record Structure for the Retrieval Record along with semantics for those elements. The client and the server share an understanding of this standard list. The functionality provided by these specifications enables a client to ask for groups of elements and enables the server to deliver those elements and label each element in a standard way for client processing.
Specifications that allow a client and a server to exchange a record in an understandable and processible format. Z39.50 calls these formats record syntaxes; the profile specifies which syntaxes are required and which are optional.
For a more detailed description, refer to The CIMI Profile12 and Moen.13
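As an illustration of how shared access points let one query reach differently structured databases, the sketch below assumes two hypothetical institutions whose local column names differ. The abstract names `title`, `provenance` and `material` merely echo the CIMI concepts mentioned above; they are not the actual CIMI-1 attribute values, and both helper functions are hypothetical. The record syntax helper is likewise a simplification: it picks the first syntax in the client's preference list that the server supports.

```python
# Hypothetical local schemas for two institutions. The abstract keys echo
# CIMI search concepts but are illustrative labels, not CIMI-1 attributes.
GALLERY_SCHEMA = {"title": "work_name", "provenance": "acq_history", "material": "medium"}
MUSEUM_SCHEMA = {"title": "obj_title", "provenance": "source", "material": "made_of"}


def local_predicate(schema: dict, access_point: str) -> str:
    """Translate an abstract access point into a parameterised predicate on
    the institution's own column (the search term is bound separately)."""
    return f"{schema[access_point]} LIKE ?"


def choose_record_syntax(client_prefs: list, server_supported: set):
    """Return the first record syntax, in client preference order, that the
    server supports, or None if there is no common syntax."""
    for syntax in client_prefs:
        if syntax in server_supported:
            return syntax
    return None


# The same abstract search on "material" becomes two different local predicates.
print(local_predicate(GALLERY_SCHEMA, "material"))          # medium LIKE ?
print(local_predicate(MUSEUM_SCHEMA, "material"))           # made_of LIKE ?
print(choose_record_syntax(["GRS-1", "SUTRS"], {"SUTRS"}))  # SUTRS
```

Because the client and server share only the abstract list, each institution remains free to name and organise its own columns.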
The Profile was developed by the CIMI Working Group, which consists of Z39.50 experts, experts in museum systems and information resources, software developers, and commercial vendors. The specifications included in the CIMI Profile reflect the consensus of this group, input from a range of stakeholders, and practical implementation experience through the 1997 CIMI Z39.50 Inter-operability Testbed. The next stage of this testbed is planned to further address issues such as semantic inter-operability, cross-domain searching and broadcast searching across disparate museum collections.
One of the associated CIMI activities of particular interest is the CIMI Dublin Core Metadata Testbed. Worldwide, there has been a surge of interest in metadata, much of it focused around an initiative called Dublin Core. Dublin Core (DC) is intended to facilitate the discovery of electronic resources, to support author-generated descriptions of Web resources, and to provide a way to describe heterogeneous information resources. The CIMI DC Metadata Testbed project aims to explore the inter-operability of DC metadata for museums, to test assumptions made about DC within a museum context, and to make the findings available to the museum and wider Dublin Core communities. For more details see CIMI DC Testbed.14
Currently within Australia, both federal and state initiatives are being developed to promote wider public access to collections, activities and events in cultural institutions. One such project, Zavier (A Z39.50 Arts Victoria Inter-operability Pilot Project), has been initiated by the Victorian Cultural Organisations Metadata and Database Inter-operability Group (COMDIG) and funded by Arts Victoria, Multimedia Victoria, and the New Technologies Working Party (NTWP, a Commonwealth, State and Territory working party of the Australian Cultural Ministers Council). The aim of the Zavier project is to determine, via the implementation of a pilot system, the full potential of Z39.50 and the CIMI profile to deliver uniform and seamless online access to a diverse range of distributed cultural sector databases. Zavier is a collaborative endeavour by Museum Victoria, the National Gallery of Victoria, the State Library of Victoria, the Public Record Office and the Victorian Arts Centre Trust through its Performing Arts Museum. Figure 3 shows the overall architectural design of the project.
Until recently, one of the weaknesses of Z39.50 has been that it has only provided boolean-style query languages (e.g. type-1 and type-101). While this has been adequate for text searching, it does not take advantage of the increasing query power available in most database systems today. For this reason, Z39.50 is being enhanced with SQL support: Z+SQL. Z+SQL adapts the Z39.50 protocol to an open SQL environment, facilitating distributed querying and retrieval over SQL-enabled databases. It unites the advantages of SQL's query language and generic export syntax with the inter-operable information retrieval services of Z39.50. Z+SQL proposes SQL extensions to ANSI/NISO Z39.50-1995 for inclusion into Version 3 of the standard.
With respect to searching, Z+SQL adds a type-SQL query (SQL-89, SQL-92 and SQL3) to enable more complex querying techniques than previously offered by Z39.50, such as aggregate functions, GROUP BY clauses, joins and nested queries.15 The combination of Z39.50 and SQL also benefits the SQL community by providing a standard communications protocol for dynamic Intranet and Internet access to previously unknown SQL databases, as well as a standardised schema mechanism that provides common semantics for both inter-operable and broadcast SQL querying.
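The contrast between boolean-style and SQL querying can be sketched in Python. The tuple encoding below is a hypothetical stand-in for a type-1 (RPN) query tree, not its actual ASN.1 structure, and `rpn_to_sql` is an illustrative helper. It shows that a boolean query maps onto an SQL `WHERE` condition, whereas constructs such as `COUNT(*)` with `GROUP BY` have no boolean equivalent and need a query type that carries SQL directly.

```python
# Hypothetical encoding of a boolean (type-1-style) query tree:
#   leaf:  ("term", column, value)
#   inner: (operator, left, right) with operator in "and", "or", "and-not"
SQL_OPERATORS = {"and": "AND", "or": "OR", "and-not": "AND NOT"}


def rpn_to_sql(node: tuple, params: list) -> str:
    """Render a boolean query tree as a parameterised SQL WHERE condition,
    appending the search terms to `params` in placeholder order."""
    if node[0] == "term":
        _, column, value = node
        params.append(value)
        return f"{column} = ?"
    operator, left, right = node
    # The left subtree is rendered first, so params stay in placeholder order.
    return f"({rpn_to_sql(left, params)} {SQL_OPERATORS[operator]} {rpn_to_sql(right, params)})"


params: list = []
where = rpn_to_sql(("and", ("term", "culture", "Roman"), ("term", "object_type", "coin")), params)
print(where, params)   # (culture = ? AND object_type = ?) ['Roman', 'coin']

# Aggregation has no boolean counterpart; it has to be expressed in SQL itself.
aggregate_sql = f"SELECT site, COUNT(*) FROM objects WHERE {where} GROUP BY site"
```

The boolean tree can only decide which records match; the final line needs the extra expressive power that a type-SQL query is meant to carry.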
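The type-SQL query provides inter-operable SQL across the Z39.50 wire in one of two ways: dynamic SQL, in which the query is expressed directly against the local schema of a server's database; and abstract SQL, in which the query is expressed against a standard schema defined by a profile (such as the CIMI profile), which each server maps onto its own local schema.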
With respect to retrieval, a new record syntax (SQL-RS) has been introduced. SQL-RS is a generic record syntax which requires no prior knowledge of the result set and which may contain fields generated on the fly, such as aggregations or numeric calculations. SQL-RS incorporates the SQL-92 datatypes as well as the planned SQL3 datatypes. Rather than individually tagging each field with its datatype, SQL-RS first presents the data structure definitions, followed by multiple sequences (rows) of data. The result set returned in an SQL-RS record is structured data, which may be either displayed directly or restructured at the client side for analysis purposes.
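The following Python sketch mirrors that layout: column definitions once, then rows of bare values. The class and field names are hypothetical and the real SQL-RS is defined in ASN.1; the sketch only illustrates why a client needs no prior knowledge of the result set, since it can label and restructure the rows using the definitions that arrive with them. The `coin_count` column stands for a field computed on the fly, such as an aggregate.

```python
from dataclasses import dataclass


@dataclass
class ColumnDef:
    """One structure definition: a column name and its SQL-92 type name."""
    name: str
    sql_type: str


@dataclass
class SqlResultRecord:
    """Hypothetical SQL-RS-like record: definitions sent once, then rows."""
    columns: list   # list of ColumnDef
    rows: list      # list of tuples, values in column order

    def as_dicts(self) -> list:
        """Client-side restructuring: label each value with its column name."""
        names = [c.name for c in self.columns]
        if any(len(row) != len(names) for row in self.rows):
            raise ValueError("row length does not match the column definitions")
        return [dict(zip(names, row)) for row in self.rows]


record = SqlResultRecord(
    columns=[ColumnDef("site", "VARCHAR"), ColumnDef("coin_count", "INTEGER")],
    rows=[("Melbourne", 12), ("Geelong", 3)],
)
print(record.as_dicts())
# [{'site': 'Melbourne', 'coin_count': 12}, {'site': 'Geelong', 'coin_count': 3}]
```

Declaring each type once per column, rather than once per value, keeps large result sets compact.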
As Z+SQL is an extension of Z39.50, the additional information retrieval services provided by the Z39.50 protocol are also supported, e.g. access control, resource control, extended services (such as persistent result sets and queries, periodic querying, export, document ordering and database updating) and the help/explain facility.
Z+SQL uses the client-server based architecture of Z39.50. The following is a typical example of how this architecture can be implemented.
When implementing a Z39.50 client which supports Z+SQL there are usually three main, logical components:
and a method for displaying the resulting records.
An overview of this architecture for Z39.50 clients is presented in Figure 4.

A Z39.50 server which supports Z+SQL would normally contain four main logical components:
Once the incoming SQL query has been passed from the Z39.50 Target to the Z39.50-to-database-interface, it is then passed to the two query transformation processes. The Z39.50-to-database-interface then passes the resulting query to the local database, using an appropriate database API (such as ODBC or OCI). When the query has finished executing, the Z39.50-to-database-interface then translates the results into the client-negotiated record syntax and passes these results back to the Z39.50 target. The target then ships the response back to the Z39.50 client.
The generic architecture of a Z39.50 server which supports Z+SQL is shown in Figure 5.
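The server-side flow can be sketched in Python, with the standard `sqlite3` module standing in for a local database reached through ODBC or OCI. The text does not say what the two transformation processes do; the abstract-to-local name mapping here is a hypothetical example of one. The regular-expression rewrite is deliberately naive (it would also rewrite matching words inside string literals), so a real interface would parse the SQL. Result column names are mapped back to abstract names, so the local schema is not exposed to the client.

```python
import re
import sqlite3

# Hypothetical mapping from an abstract schema to one institution's local names.
ABSTRACT_TO_LOCAL = {"objects": "coll_items", "title": "obj_title", "site": "loc_code"}
LOCAL_TO_ABSTRACT = {local: abstract for abstract, local in ABSTRACT_TO_LOCAL.items()}


def transform_query(abstract_sql: str) -> str:
    """Naive transformation: replace whole-word abstract names with local ones."""
    return re.sub(r"\b\w+\b",
                  lambda m: ABSTRACT_TO_LOCAL.get(m.group(0), m.group(0)),
                  abstract_sql)


def handle_sql_query(conn: sqlite3.Connection, abstract_sql: str, params=()) -> dict:
    """Z39.50-to-database interface: transform, execute, and package the
    result as column definitions followed by rows."""
    cursor = conn.execute(transform_query(abstract_sql), params)
    columns = [LOCAL_TO_ABSTRACT.get(d[0], d[0]) for d in cursor.description]
    return {"columns": columns, "rows": cursor.fetchall()}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coll_items (obj_title TEXT, loc_code TEXT)")
conn.executemany("INSERT INTO coll_items VALUES (?, ?)",
                 [("Roman coin", "MEL"), ("Roman lamp", "GEE")])
result = handle_sql_query(conn, "SELECT title, site FROM objects WHERE site = ?", ("MEL",))
print(result)   # {'columns': ['title', 'site'], 'rows': [('Roman coin', 'MEL')]}
```

Passing the search value as a bound parameter, rather than splicing it into the SQL text, keeps it out of the name-rewriting step.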

To the Z39.50 user, Z+SQL provides the full flexibility and query power of SQL. Z+SQL clients are able to specify complex queries, by using either SQL or one of its derivatives, such as Query-by-Example (QBE). Queries can be formulated on either a single table or on multiple tables, using Cartesian products, unions, intersections, joins on matching columns, and projections on specified columns. Queries can also be formulated using powerful constructs for expressing conditions, performing aggregate and comparison operations, partitioning tables into groups and much more.
For example, Z+SQL provides the museum community with more sophisticated querying based on a common semantic understanding, allowing queries such as:
Give me all available details about the oldest stamp in your collection.
How many Roman coins are at each of your collection's sites?
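Against a shared abstract schema, those two questions might be expressed as SQL along the following lines. The single `objects` table, its columns and its rows are hypothetical, and the standard `sqlite3` module stands in for a server's database. The first query uses a nested query rather than a vendor-specific row limit; the second uses an aggregate with `GROUP BY`, which boolean-style querying cannot express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical abstract schema shared by participating institutions.
conn.execute("""CREATE TABLE objects (
    id INTEGER PRIMARY KEY, object_type TEXT, culture TEXT,
    site TEXT, year_made INTEGER, title TEXT)""")
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "stamp", "British", "Melbourne", 1840, "Early postage stamp"),
    (2, "stamp", "Australian", "Melbourne", 1913, "Commonwealth stamp"),
    (3, "coin", "Roman", "Melbourne", 117, "Silver coin"),
    (4, "coin", "Roman", "Geelong", 300, "Bronze coin"),
    (5, "coin", "Roman", "Melbourne", 64, "Brass coin"),
])

# "Give me all available details about the oldest stamp in your collection."
oldest_stamp = conn.execute("""
    SELECT * FROM objects
    WHERE object_type = 'stamp'
      AND year_made = (SELECT MIN(year_made) FROM objects WHERE object_type = 'stamp')
""").fetchall()

# "How many Roman coins are at each of your collection's sites?"
coins_per_site = conn.execute("""
    SELECT site, COUNT(*) AS coin_count FROM objects
    WHERE culture = 'Roman' AND object_type = 'coin'
    GROUP BY site ORDER BY site
""").fetchall()

print(oldest_stamp)    # [(1, 'stamp', 'British', 'Melbourne', 1840, 'Early postage stamp')]
print(coins_per_site)  # [('Geelong', 1), ('Melbourne', 2)]
```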
To the SQL user, Z+SQL provides a full range of inter-operable information retrieval services, with shared content semantics for specialist communities, thus making distributed searching a reality. Z+SQL offers secure Internet access to SQL-enabled databases, and in particular it offers an inter-operable mechanism for searching and retrieving from SQL3-enabled databases.
For example, a museum curator with an upcoming exhibition needs to find which artifacts are available for display, both from the museum's own internal disciplines and from other museums and related organisations (e.g. art galleries or government archives) around Australia. These museum resources may be digitally stored or catalogued in structured SQL-enabled databases, each containing independent data from possibly different museum disciplines. The curator would need each artifact's descriptive details as well as related details such as current location, condition, display restrictions and loan policy (for example, cost and delivery), in correlated report formats for easy assimilation. The curator would then select the items required, and may wish to request progressive updates on the inter-museum loan requests and to reconfigure the exhibition preparation schedule accordingly. All this is possible with Z+SQL.
The Z+SQL proposal was initially presented to the ZIG (Z39.50 Implementors Group) community at the October 1996 ZIG meeting in Brussels. Working Group meetings have subsequently been held in Washington DC in April 1997, Copenhagen in August 1997 and Orlando in January 1998, each in conjunction with a ZIG meeting. Z+SQL Issue 1 was released in April 1998.15
Prior to Z+SQL's general inclusion into the Z39.50 standard, three stages of testing will occur: a proof-of-concept prototype, an internal testbed, and an open international testbed.
The proof-of-concept Z+SQL prototype, called Zinc+, supporting both dynamic and abstract SQL querying, is currently available for demonstration from the Distributed Systems Technology Centre, University of Queensland.
The international call for testbed participation across the ZIG community is scheduled for November 1998, with commencement of the open testbed scheduled for January 1999.
The ZedMoV pilot, developed as the internal Z+SQL testbed, consists of two phases. The first phase, as shown in Figure 6, tests a single client-server application of Z+SQL. Both dynamic and abstract (using the CIMI profile) SQL queries are tested, with records being returned in either SUTRS (Simple Unstructured Text Record Syntax) or SQL-RS. Phase I was completed and demonstrated at the Madrid ZIG meeting in October 1998.

Phase II of ZedMoV, as shown in Figure 7, will then test distributed Z+SQL querying, using Abstract SQL queries across the CIMI profile and returning records via SQL-RS. A single Web-to-Z39.50 gateway will query across five SQL-enabled databases. A revised SQL-RS aligned to the SQL3 standard, which is expected to stabilise in November 1998, will also be tested in this Phase. Phase II is expected to be complete in early 1999.
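The broadcast pattern behind Phase II can be sketched as follows: a gateway sends one abstract SQL query to several databases at once and merges the answers, tagging each row with its source. For brevity the sketch talks to in-memory `sqlite3` databases directly, eliding the Z39.50 layer and any schema mapping; the institution names, table and rows are hypothetical, and two sources are used instead of five.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_source(rows):
    """Create a hypothetical institution database that already exposes the
    shared abstract schema. check_same_thread=False lets a worker thread use it."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE objects (object_type TEXT, culture TEXT, site TEXT)")
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", rows)
    return conn


def broadcast(sources: dict, sql: str, params=()) -> list:
    """Run the same query against every source concurrently and merge the
    results, prefixing each row with the name of the source it came from."""
    def run(item):
        name, conn = item
        return [(name, *row) for row in conn.execute(sql, params).fetchall()]

    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # map() preserves input order, so sources appear in dictionary order.
        partial_results = list(pool.map(run, sources.items()))
    return [row for part in partial_results for row in part]


sources = {
    "museum": make_source([("coin", "Roman", "Melbourne"), ("coin", "Roman", "Melbourne")]),
    "gallery": make_source([("coin", "Roman", "Geelong"), ("vase", "Greek", "Geelong")]),
}
query = ("SELECT site, COUNT(*) FROM objects "
         "WHERE culture = ? AND object_type = ? GROUP BY site ORDER BY site")
print(broadcast(sources, query, ("Roman", "coin")))
# [('museum', 'Melbourne', 2), ('gallery', 'Geelong', 1)]
```

Because every source answers the same abstract query, the gateway can merge rows without knowing anything about each institution's internal schema.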

Currently, the Z39.50 Biological Implementers Group (ZBIG) is assessing Z+SQL, in conjunction with the CIMI profile, as a solution for resource discovery across biological databases. It is envisaged that the international testbed (Stage 3 of Z+SQL testing) may focus on this community.
With the stabilisation of SQL3,16 it is envisaged that the next stage of the Z+SQL proposal will incorporate the standardisation of functions within specialised profiles (in much the same way as standard schemas) and, in particular, will align with such standards as SQL/MM,17 OpenGIS18 and OpenEDI.19 Indeed, on the dynamic SQL side, there is nothing to stop SQL3 function calls being used right now.
DSTC researchers in Australia have joined forces with Crossnet Systems Limited of the UK to develop ZedSQL, an SQL module for Crossnet's Z39.50 toolkit. ZedSQL is the result of a collaborative commitment to create sophisticated software technology to facilitate distributed searching using SQL, and is based on the latest SQL extension to the Z39.50 standard.
ZedSQL is one of a set of modules which will extend the functionality of ZedKit, Crossnet's Z39.50 Version 3 toolkit, and will include the following tools:
The ZedSQL product will offer both support for RPN (type-1, type-101) and SQL (type-SQL) queries, as well as returning records in both structured (GRS-1, SQL-RS) and unstructured (SUTRS) formats.
Today's information providers need sophisticated tools to keep up with the explosive growth of networked information. Tools to make database resources available via the Internet, however, are hampered by the prevalence of heterogeneous database systems, query languages and data semantics. What is needed is a uniform method for accessing these databases.
To this end, ANSI/NISO Z39.50, an open standard for information retrieval, is a practical and useful tool. Z39.50 is a computer-to-computer communications protocol designed to support the searching and retrieval of full-text documents, bibliographic data, spatial data, images and multimedia in a distributed network environment.
Z+SQL provides an enhancement to the Z39.50 protocol, bringing additional querying and retrieval power to existing Z39.50 implementers, as well as a unique facility for distributing SQL queries to heterogeneous databases worldwide. Based on a client-server architecture and operating over the Internet, the Z+SQL protocol provides dynamic access to dynamically discovered databases and a standardised schema mechanism to enable both inter-operable and broadcast SQL querying. Z+SQL serves as a valuable tool to support the searching demands of the increasing number of Internet-accessible cultural database applications of the emerging information age.