Designing for retrieval II

CSIRO Online: Using XML to Introduce Structure and Efficiency to a Large Web Site

Cynthia Love, Philip Kent and Kutira Bandte
CSIRO Information Technology Services, Clayton, Victoria

Abstract

The CSIRO Online project undertook to update and restructure CSIRO's external web site. The aims of the project were to: present a comprehensive view of CSIRO's research to a variety of stakeholder groups; develop an infrastructure to aid retrieval and increase efficiency in the use of the information; and maintain consistency in the presentation of information about CSIRO's activities to assist users in navigation. The authors address various issues and implications for future development. They describe the advantages of converting data to XML and the use of XSL and CSS, as well as the cultural issues in balancing distributed authorship and consistency in presentation, compliance with metadata standards, and usability.

Introduction

CSIRO has a network of web sites rather than a single site that serves the entire Organisation. A 'corporate' or umbrella web site (www.csiro.au) is managed centrally and there is at least one site for each of the Organisation's 20 research divisions. The CSIRO Online project attempts to overcome many of the problems that have arisen from the uncoordinated way in which the network of sites developed. Additionally it addresses the content management issues that have arisen in this large web site.

CSIRO was an early adopter of web technology. Its first web site went live in 1994 and was regarded as experimental for many years. Consequently the delivery mechanism was not governed by any corporate protocols and guidelines. This resulted in a haphazard growth of web sites across the Organisation without any consistency, corporate image or mechanism for maintaining the currency of the information.

The devolved environment vs. a centralised environment

Within CSIRO there is always a natural tension in achieving a balance between corporate consistency and embracing the diversity that exists in a highly specialised organisation that covers a broad range of science. This tension is highlighted by the web. On the one hand CSIRO must present a coherent face to the world so that its information is usable by our clients. On the other hand CSIRO cannot corporatise all of its information without a loss of granularity.

As CSIRO's web sites have evolved in an erratic fashion, a different style of presentation for nearly every site has emerged. Information was not described consistently across the Organisation and an individual could not easily find the same type of information on related sites. For example on one site the link to 'About this Division' could present an organisational chart of the Division and on another it would present a description of the research. This reduced the coherence of our image and therefore the presentation of CSIRO as one organisation.

The challenge therefore was to find the common threads across the Organisation and to introduce structure and standards while leaving anomalous information alone. This was intended to eliminate the silo approach to organising information, in which paths and sites had little relationship to one another.

The CSIRO site did not have effective search facilities and did not take advantage of metadata technology to enhance retrievability.

CSIRO did not have a thorough mechanism for determining 'use-by' dates on documents. As CSIRO was an early adopter of the web there were many instances where pages had been created to experiment with the technology. As there was no protocol of responsibility for pages, pages fell into disrepair and became out of date when staff moved on. Automation offered an easy solution to this problem.

The site also suffered from 'linkrot'.
This means that there were a lot of broken links in our network of sites. This was also exacerbated by the problem above. CSIRO had a vast range of valuable information that was not easily retrieved by surfing or search engines. 'Hooks' into the wealth of information about our research were required so that CSIRO could capitalise on its investment. This meant making connections between descriptions of research, the formally published material about it and the associated records as well as the supplementary data such as databases of raw figures, specimens, notes etc. CSIRO is a public utility that has a mandate to provide information to a range of people. In addition CSIRO is required to obtain a certain percentage of funding from industry. The following complex group of clients must be catered for:
Each group has very different information needs and a different understanding of CSIRO's research. Web site usability means identifying these client groups and their information needs, and meeting those needs. Therefore CSIRO must present its research to 22 defined Industry Sectors. The reputation of CSIRO's research Divisions means that they must be easily identified, particularly to the scientific community. Media releases present the latest information on CSIRO's research - at least one per day. Various programs present information to students and the general public.

CSIRO Online was an initiative begun in 1998 to bring about a more coordinated and efficient approach to CSIRO's information on the web. It presents information in several navigation paths or windows. These correspond to our stakeholder groups. They also reflect CSIRO's structure, as this in turn reflects the subject groupings of the science covered by the Organisation. The two fundamental navigation paths are:
Additionally there are paths to information for the media, students and the general public.

Goals of CSIRO Online

Capitalise on recognisable URL and bring information about our research to the foreground

Previously CSIRO's corporate site (www.csiro.au) contained general information about the Organisation and then linked to other divisional servers that held the information about our research. This was not an efficient or holistic method of presenting information. As research is the most important aspect of CSIRO it should be presented in the most visible place. An advantage of being an original shareholder in AARNet is that CSIRO has its own domain, which makes its URL very easy to remember: www.csiro.au. CSIRO could therefore take advantage of this and place its most important information where it is most easily accessed.

Consistency and a CSIRO 'look and feel'

In order to maximise access to information CSIRO needed to introduce consistency into the presentation of its information. The commonly used 'units of information' were identified and a structure and common look for these documents was defined. The aim of this was to present a more coherent image of the Organisation and to improve navigation through the information. An additional aim was to save time by defining the structure of a type of page once rather than have staff in the Divisions duplicating effort.

Introduce efficiency through the reuse of information items in a database

A goal of CSIRO Online was to make data entry and maintenance as streamlined as possible. This meant not expecting staff to enter the same data more than once. Consequently data such as contact details, locations and images could be stored separately and displayed dynamically.

Greater precision in recall through the use of metadata.

CSIRO is a Commonwealth government agency and as such must use metadata that is AGLS compliant. This was actually to CSIRO's advantage.
In addition to the AGLS metadata set, CSIRO specific elements were implemented to manipulate the data with even greater precision.

Improve navigation

How it Works

The site operates on a Pentium II 400 MHz NT 4 server running Microsoft's IIS4. Heavy use is made of the XML tools that were delivered with IE5 - the parser, XSL and the DOM interface. The content is indexed using Microsoft's text search engine, "Index Server". The metadata is stored on Oracle 8, although any SQL/ODBC compliant database would be suitable. CSIRO Online is built around the following elements:
Structure

The 'backbone' of the system is formed by five entities (document types or schemas): Project, Issue, Sector, Program and Division. These in turn reflect the way CSIRO presents itself and its work to the world: by research Division, Program and Research Project; and by Industry Sector, Issue and Research Project. The diagram below illustrates the following:
Presentation

Many documents refer to other documents. Rather than embedding shared information in each document, a document refers to the other document that holds it. For example, contact information is stored in one document type ('Contact') and other document types refer to a specific contact document to get name, address and e-mail information. Similarly, information about images and locations is normalised by storing it once and referring to it from many places.
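The reference mechanism described above can be illustrated with a minimal Python sketch. It is not the CSIRO Online implementation, which used the XML tools delivered with IE5; the element names (`contact`, `contactRef`), the `id` and `ref` attributes and the sample data are all assumptions made for illustration. It shows a project document that holds only a reference to a contact, with the contact's details looked up when the page is built.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: element and attribute names are illustrative only.
CONTACTS_XML = """
<contacts>
  <contact id="c1">
    <name>Example Person</name>
    <email>person@example.org</email>
  </contact>
</contacts>
"""

PROJECT_XML = """
<project id="p1">
  <title>Example project</title>
  <contactRef ref="c1"/>
</project>
"""


def load_contacts(xml_text):
    """Index contact elements by their id attribute."""
    root = ET.fromstring(xml_text)
    return {contact.get("id"): contact for contact in root.findall("contact")}


def resolve_contacts(project_xml, contacts):
    """Return (name, email) pairs for every contactRef in a project document."""
    project = ET.fromstring(project_xml)
    resolved = []
    for ref in project.findall("contactRef"):
        contact = contacts[ref.get("ref")]
        resolved.append((contact.findtext("name"), contact.findtext("email")))
    return resolved


def update_email(contacts, contact_id, new_email):
    """Change a contact's e-mail address in the single place it is stored."""
    contacts[contact_id].find("email").text = new_email
```

Because the project stores only the reference, a call such as `update_email(contacts, "c1", "new@example.org")` changes what every referring document displays the next time `resolve_contacts` is run, without touching the project document itself.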
The diagram of a sector page below shows how a page is built:
Each sector has a homepage, which includes core information about a sector, stored in the sector schema. In addition the page will also refer to information from many other document types. Using XSL the page is then dynamically generated.

Metadata
This facilitates the following:
Metadata is not simply a tool to improve access to information by the search engines crawling our site. For CSIRO it is a far more valuable tool to manipulate the data and improve the contextual presentation and the maintenance of the information.

Identification of units of information

The guiding factors were:
All research divisions have research programs, projects, staff to profile, information sheets, capabilities, achievements, media releases etc. Because one of the goals was to introduce consistency to the site, schemas were designed to define the structure of the document, use a common title and have a common graphical presentation. This assists the user in orientation within the site.

Another goal was to make data entry and maintenance more streamlined. This meant not expecting staff to enter the same data more than once. Consequently data such as contact details, locations, and images are all stored separately and displayed dynamically. The staff member entering data selects the name, location or image from an index when entering data about a project, Division, media release etc.

This has a significant advantage in the maintenance of the data. When a staff member's contact details change the data is altered once (in the contacts schema) and the changes are reflected throughout the system. It also means that if a staff member leaves and an attempt is made to delete their details from the database, the system will flag all the documents on which their name appears, and the details cannot be deleted until those documents have been altered, thus preserving the referential integrity of the site.

Distributed authorship

There is a difference between the design and maintenance of the architecture and content control. The new system makes use of XSL and CSS and uses forms-based data entry. This means that we can control the architecture and 'look and feel' of the pages centrally to achieve consistency, while the experts in the information can control the actual content and its presentation from the Divisions across Australia. Data entry for the schemas that control the structure and presentation is by an online form prompting the author for information.
This form requires authorised staff to allocate an Industry sector and Division, elements of content, images, contact details and any metadata that the system cannot automatically harvest.

Implications for the future

Flexibility is required to accommodate diversity in information needs and allow for future growth and changes in direction. There are political needs to reflect organisational structures in the structure of the web site. We need flexibility to change the presentation and the information whenever there is a change in the structure of the Organisation. For example recently two new research divisions arose from the amalgamation of four old Divisions. This means that the information in the web needs to be rebadged and moved to reflect this new structure. The use of XML and metadata means that CSIRO can easily identify what information belongs to which Division and rename what is to remain very easily through the schemas.

Ongoing structure is required in the database.

While it has been beneficial to identify units of information and develop a consistent style for their presentation, this is not always possible. Some documents do not fit a rigid structure. These fall into two types: the very general corporate documents that describe CSIRO, its history and structure; and material from Divisions that is anomalous, for example a particular feature of a Division. This can be accommodated through blank schemas that still apply some structure in terms of ownership and placement in the site.

Changing the landscape to promote a more holistic view of the web environment by our authors.

Navigational structures have to be developed to allow users to travel between sites without ever being locked into a dead end. This establishes a base from which the presentation of more specialised material can be managed to its advantage. In this environment the use of frames is not appropriate.
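The rebadging scenario mentioned under 'Implications for the future', in which old Divisions are amalgamated and their information must be reassigned, can be sketched in Python as follows. This is a minimal illustration and not the actual CSIRO Online code or database schema: the record fields, the Division names and the `MERGERS` mapping are all placeholders. The point is only that, once ownership is recorded as metadata, identifying and renaming a Division's material becomes a simple lookup.

```python
# Hypothetical metadata records; field names and Division names are placeholders.
documents = [
    {"id": "proj-001", "type": "Project", "division": "Old Division A"},
    {"id": "proj-002", "type": "Project", "division": "Old Division B"},
    {"id": "prog-001", "type": "Program", "division": "Unchanged Division"},
]

# Assumed mapping from superseded Divisions to the Division that replaces them.
MERGERS = {
    "Old Division A": "New Division X",
    "Old Division B": "New Division X",
}


def owned_by(docs, division):
    """List the ids of all documents whose metadata names the given Division."""
    return [doc["id"] for doc in docs if doc["division"] == division]


def rebadge(docs, mergers):
    """Return copies of the records with superseded Division names replaced."""
    return [
        {**doc, "division": mergers.get(doc["division"], doc["division"])}
        for doc in docs
    ]
```

Calling `rebadge(documents, MERGERS)` leaves the original records untouched and returns new ones, so the old and new ownership can be compared with `owned_by` before the change is committed.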
Knowledge management and science portal

In the process of collecting data about CSIRO's research and researchers the basis for both a science portal and a knowledge database had been established. Following development of a CSIRO e-print server scheduled to begin in 2001, links may be made between the research descriptions, the profiles of the scientists and the associated literature. By maintaining a distributed system with a central gateway, hooks can be put into the more specialised information that is held locally. In this respect CSIRO Online acts as a science portal and has the potential for achieving international significance. For this to fully develop progress must be coordinated and this challenge is cultural rather than technical.

E-commerce

Under the direction of Dr Ron Sandland, Deputy Chief Executive, CSIRO established an E-Commerce Working Group in 2000. This is a corporate CSIRO project that is investigating the extent to which e-commerce can be made an integral part of CSIRO's core research business. To achieve this, the group has identified six demonstrator e-commerce projects, located in Divisions across CSIRO, and covering a broad range of activities and challenges. The group plans to bring together the experiences from these demonstrator projects into a set of recommendations, guidelines, draft policies, and a toolkit, that will be available for other CSIRO Divisions who wish to start up e-commerce activities. Links to these activities will be required from CSIRO Online as they become developed.

However there are issues that must be addressed. The initiatives vary considerably in both the subject matter and the target audience; and access to these must be within these contexts as well as from an 'electronic shopfront'. The marketing of products online requires the same principles as the marketing of products in other environments. For example the following are in various stages of development:
As CSIRO's e-commerce initiatives are managed locally by the responsible division or unit they must be attached to the division's pages. There are various facets to each initiative:
Access must be provided through these paths to maximise the visibility of the products to the appropriate markets. A system with the structure of CSIRO Online can do this. The initiatives must also be integrated with information about products for sale via other mechanisms.

Migration of data to new system.

The use of standards and the implementation of a structure has meant that CSIRO has a data set that is easily migrated to new technologies as it implements them. This means that CSIRO is in a position to fully exploit its information resources as new technologies come online.

Conclusion

The CSIRO Online project has introduced structure into a large mass of information about CSIRO and its research. The benefit of this has been the ability to manipulate data to present it in many different ways depending on the navigational path chosen by the user. It has also meant that greater efficiency has been introduced into the creation and re-use of items of information. The consequences of this are:
The system has been in operation for 18 months and a review is currently being undertaken to define both the needs of the Organisation and those of the stakeholder groups, and to produce a strategy for future development. There are several significant factors for consideration.

Any web site is organic and therefore should never be considered complete.

All web sites continuously evolve to respond to their growth in information, changes in technology and in their markets. In this respect web site development is very similar to collection development in a library. Relevance, flexibility and improved access are always paramount.

Usability should guide any development of a web site.

The purpose of a web site is to transfer information and/or products to specific markets. This means that designers of a web site must know the needs, habits and characteristics of these markets in order to complete the transfer effectively. Usability testing involves not only technical testing but also the users' experience and enjoyment of the site. Can they find the information they want? Does the site exceed their expectations? Does it excite them? Do they feel comfortable in the site or are they unsure of their orientation within it and feel frustrated in trying to navigate around it? These are all questions that should be answered in the course of usability testing.

CSIRO's internal information should also be included in a strategy.

Another key stakeholder for CSIRO Online is its own staff. Access to this information must be provided for them as well so that the Organisation can take full advantage of the system as a knowledge management tool. Additionally CSIRO has an enormous intranet that would benefit from the structure that has been applied to its external information. Synergies can be obtained from utilising a single system for the management of all of our web information. A separate project to redevelop CSIRO's intranet commenced in 2000.
As with any development such as this much of the change involves cultural change more than technical development.

A major factor in the development of CSIRO Online was the cultural change that was involved in introducing a co-operative and co-ordinated approach to a network of web sites. This takes a long time to take hold and even after 18 months is still not complete. Rushing to judge the success of an initiative without allowing time for cultural adjustment and extensive consultation and support is a mistake. Running parallel to a system implementation must be an education and training programme to explain the rationale behind the change, the advantages of the changes, any requisite training and, very importantly, to gather feedback in order to embark on continuous improvement. This should never be underestimated. The saying 'Build it and they will come' is only half the story. One must build it, sell it, teach it, offer support in it and improve it. This is what really constitutes quality product fulfilment.

CSIRO is a knowledge rich organisation. This project has applied greater structure and consistency to the organisation of CSIRO's information externally. It has enhanced CSIRO's ability to market itself as a single entity. The underpinning architecture provides flexibility and positions the site to respond to future challenges. Now that this framework is in place, CSIRO can proceed with further enhancements and implement a new graphical design to create a fresh and dynamic image.
http://conferences.alia.org.au/online2001/papers/designing.for.retrieval.iib.html © ALIA