Designing for Retrieval I

Linking: The State of Play Today

Bette Brunelle
Vice president, Ovid Technologies Inc.

'Linking' has become the latest hot topic. Librarians cannot license enough full text from a single source, but realise there are real disadvantages to having full text reside on many different sites. One way to mitigate some of these disadvantages is to enable linking, from bibliographic databases to full text and also from references to remote full text sources. This presentation will look at the two basic technical models for linking, with each of their pros and cons, as well as some of the business issues, and how they are shaping the products and services becoming available.

Introduction

Just as 'full text' was the mantra for 1999, 'linking' has become a hot topic for 2000. As librarians are discovering, it is not possible to license all desired full text from a single source, and yet there are real disadvantages to having full text reside on dozens, or even hundreds of disparate sites. Licensing, managing variable access conditions and terms, setting up access appropriate to each publisher and user, training users, and maintaining links to multiple publishers from a central HomePage, can quickly become an unmanageable task - particularly for large, complex information environments.

Given this reality, ever-practical librarians and information specialists are looking for other ways to provide full text access to end-users. And beyond that, they are additionally looking for ways to inter-link a variety of information sources, and thus maximize investments in databases, OPACs, document delivery services and internal document collections. Because there is a tradition of using indexing and abstracting databases as an entrée to the professional literature, and given the model for linking presented by the World Wide Web, it is unsurprising that products linking bibliographic databases to full text would arise to fit this need.
And arise they have - products, and potential products, are surfacing almost daily, in a variety that rivals the previous few years' full text offerings. Furthermore, key researchers and thinkers such as Herbert Van de Sompel or Clifford Lynch posit systems in which linking from bibliographic databases to full text, or from references to full text, is but one of many services that a comprehensive system should provide.

With all this activity, it is sometimes difficult to determine what systems and technologies can provide today. In order to understand the advantages and disadvantages of the linking solutions actually available, it helps to understand the technologies that underlie them, which product issues arise from the type of technology, and which are caused instead by logistical or contractual matters. And how do standards fit into the linking process? This paper will attempt to present a high-level overview of linking, with a consideration of what can and cannot be accomplished with present-day technology and products, so that you can make informed decisions about what full text and linking product mix to bring to your institution.

Aggregated full text

Before I get any further with linking, I would like to say a brief word about aggregated full text products - products in which many journals are licensed and, in the case of the best products, searched together. There are a couple of aggregated full text products providing searchable text with tightly integrated links between titles. These products are by far the most feature-rich full text products available on the market. In part this is because they do not have to deal with the logistical issues that can make distributed, linked full text difficult. They have the advantage of easy administration on your part, and provide a seamless, satisfying experience to your user.
If you can find such a product with a collection of journals which, based on your institution's needs, provides a good subject subset of titles, you will want to consider the product as part of your full text offering. In the parlance of linking, these products are often described as 'closed', although in point of fact some of them additionally provide open linking to a variety of external resources. The main problem with these products, which is addressed by distributed, linked products, is that at best they will provide only a subset of all the titles your users demand.

Linking standards

The very first thing to understand about the state of current linking products is that there are no standards which comprehensively address the issues of linking, even in the simple case of links from bibliographic databases to full text. There are certainly linking standards of various pedigree - the World Wide Web itself is a spectacularly successful example of a linking standard. The DOI (Digital Object Identifier), with its implementation in the CrossRef initiative, is a system, in very early stages, which posits a way to do some aspects of linking. Slinks (Scholarly Link Specification Framework) is a proposal to normalize a particular aspect of linking (the 'dialog' by which information is exchanged), and SFX is both a software product and a 'framework' which describes a comprehensive linking system. There are in addition a number of standards for describing bibliographic data (or metadata) that can certainly facilitate the process of matching data in two different locations.

To date, however, none of these ideas has actually become a standard for linking. More to the point, none of them addresses all the basic parts of a linking transaction as it relates to the exchange of scholarly information: metadata description, link syntax, access validation, security, link types (what can the user expect from the target?), and error handling - never mind issues such as quality.
The basic problem is that, as with the Z39.50 standard, the devil is in the details. The Z39.50 standard describes how systems can exchange information about search syntax and database structure - but leaves both the syntax and structure wide open, and at the discretion of each site. As a result, various Z39.50 sites have basically incompatible structures and syntax, and the communication between them gets 'dumbed down' to the lowest common denominator. With linking, the various potential players - vendors, publishers, libraries, etc. - all have widely varying agendas, business models, and technologies, making for a very complex system - yet standards, to be successful, need to be simple.

The Web, as a linking standard, succeeds because it ignores all the sticky issues that would make it complicated. A universal, scholarly linking standard, however, is not going to be widely adopted in an environment that ignores licensing rights, copyright, authentication, security, predictability or quality. On the Web, such issues are left to auxiliary programs written at the discretion of each site, and in reality, in today's linking environment, such issues are also left to individual sites.

The way that linking occurs today, even in systems that present themselves as 'standard', is that two organizations talk; probably make a contractual agreement; exchange information about technology; at least one partner in the transaction writes programs; and the other adds information to tables; then linking 'occurs.' It's not magic; it's not standard; and issues such as business model and quality are dealt with contractually, if at all. It is also labor-intensive, in much the same way that creating a HomePage with links to hundreds of journals is labor-intensive. The major advantage of products that have gone to all the trouble it takes to link is that the links can then be made widely available as 'product' - and each individual librarian at each site doesn't have to do the work.
In the absence of standards for the basic parts of a link session, the user's experience of linking will be extremely varied. Much of the control over the user experience is at the target server's end. When a user selects a link, whether it's a direct link from a site to a target, or a link processed through a database, what happens next can vary by publisher or by titles within publisher. The experience can even vary from minute to minute or day to day, depending on the status of the target server or of the linking database, when such a database is in use. The user experience of linking will be more thoroughly considered in a later section, but as you will understand by the end of this paper, the day of a seamless user experience via linking is not yet here.

Preprocessed (static) vs. live (dynamic) links

In one sense, all linking technologies and schemes are the same: at some point the source data (say, a bibliographic citation) has to be compared with the target data (say, a full text article) to find a match. Librarians are certainly familiar with all the ways that such a comparison can go wrong. Much of the profession has been concerned with exactly this issue in terms of library catalogs, union catalogs and databases. If, after years of concentration on this issue, we still have a world in which it's not possible to compare and de-dupe citations in two different bibliographic databases with complete reliability, then imagine what it is like to compare and link citations between dozens of commercial players - some of whom are new both to online and to databases.

The actual point of comparison occurs in one of two basic ways: preprocessed or live. Preprocessed links are computed in a batch process and a link database is built. Records in the database describe the relationship of the source data to the target.
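As a rough illustration of the batch comparison just described, the following Python sketch builds a small link table by matching citations on a normalized key. It is a minimal sketch under stated assumptions, not any vendor's actual method: the `Citation` fields, the normalization rule in `match_key`, the record identifiers and the `publisher.example` address are all hypothetical.

```python
import re
from typing import Dict, NamedTuple, Tuple


class Citation(NamedTuple):
    journal: str
    year: int
    volume: str
    first_page: str


def match_key(c: Citation) -> Tuple[str, int, str, str]:
    """Reduce a citation to a comparison key (hypothetical matching rule)."""
    # Lower-case the journal title and drop punctuation and spaces so that
    # "J. Example Med." and "J Example Med" compare as equal.
    journal = re.sub(r"[^a-z0-9]", "", c.journal.lower())
    return (journal, c.year, c.volume.strip(), c.first_page.strip())


def build_link_table(sources: Dict[str, Citation],
                     targets: Dict[str, Citation]) -> Dict[str, str]:
    """Batch step: map source record IDs to target URLs where the keys match."""
    by_key = {match_key(c): url for url, c in targets.items()}
    return {sid: by_key[match_key(c)]
            for sid, c in sources.items()
            if match_key(c) in by_key}


# Hypothetical bibliographic records (source) and full text articles (target).
sources = {
    "rec-1": Citation("J. Example Med.", 2000, "12", "345"),
    "rec-2": Citation("J. Example Med.", 2000, "12", "400"),
}
targets = {
    "http://publisher.example/12/345": Citation("J Example Med", 2000, "12", "345"),
}

links = build_link_table(sources, targets)
```

Only `rec-1` receives a link; `rec-2` stays unlinked until the batch is run again after the target adds the article. A key of this kind also cannot tell apart two items that begin on the same page, so a match found in the batch is not proof that the link is correct.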
Presumably, preprocessed links are more reliable than live links, because during the preprocessing it is determined that a link should exist, or at least that the item to be linked does in fact exist on the target server. In reality, unless links are individually checked, which is very costly, it is rather easy to end up with a link to almost the right thing - say, to the second letter-to-the-editor on a page full of letters on the same topic, and not to the correct, third letter. A real problem with preprocessed links is that they are time-consuming to build, and the database of links will always be out of date - especially if it is then distributed, as with the Silver Linker database or Ovid's locally-distributed internal links. Either the target or the source may add new records at any time, but until the next time the links are preprocessed and delivered, there will be no link for a user to select.

Dynamic, or live, links make the comparison between source and target data on-the-fly without reference to a database. With dynamic links, the link is always there, and when the user selects the link, it is activated. This overcomes the lag time mentioned with preprocessed links, but may lead to dead-end links, because there is no preprocessing step to verify that the item exists on both servers. That is why, for example, when a publisher has pre-submitted citations to PubMed, the user will sometimes experience a 'dead' link during the lag period between submittal of the citation and actual electronic publication of the article. It is also the case that many publishers do not load letters, editorials, supplements, or meeting abstracts, yet such material is often indexed in bibliographic databases. With live links, a lot depends on the amount of customization work done and the reliability of the source and target information.
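To make the contrast concrete, here is a minimal Python sketch of a live link: the target address is computed from the citation at the moment it is needed, with no link database consulted. The template table, the `examplepress` target name, its URL pattern and the ISSN are hypothetical and stand in for the per-target customization that the two parties would have to agree on.

```python
from urllib.parse import urlencode

# Hypothetical per-target URL patterns; in practice each one reflects an
# arrangement with the target about how its articles are addressed.
TARGET_TEMPLATES = {
    "examplepress": "http://journals.examplepress.example/article?{query}",
}


def live_link(target: str, issn: str, volume: str, first_page: str) -> str:
    """Compute a link on the fly from citation data; nothing is verified."""
    query = urlencode({"issn": issn, "vol": volume, "page": first_page})
    return TARGET_TEMPLATES[target].format(query=query)


url = live_link("examplepress", "1234-5678", "12", "345")
print(url)
# prints http://journals.examplepress.example/article?issn=1234-5678&vol=12&page=345
```

The function returns an address whether or not the article has been loaded on the target server, which is how a dead-end link arises. And because every link to a target is derived from one pattern, a change to that pattern on the target's side affects all of them together.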
Ovid's OpenLinks, which are dynamic links, often compare favorably with preprocessed links for reliability and of course for timeliness - but this is because there is a tremendous amount of labor-intensive customization involved in these links. Finally, live links also depend on a close working relationship between the target and the source - if anything changes on either end, the algorithm used to compute the live links will break. Potentially all links to a particular target could break at once. This is one of the reasons that linking is logistically complex rather than being just a technological problem. Before live linking can even begin, a contractual arrangement between the entities is absolutely necessary to spell out the terms of what is truly a relationship. Contractual relationships are time-consuming and resource-intensive - which is why a good technological prototype is not necessarily a good predictor of a robust linking product.

CrossRef

Sometimes links are both preprocessed and live, as with the CrossRef initiative. At the time that a user selects a CrossRef link, the link is routed to the DOI handle system and the link occurs 'live.' However, whether or not to present a CrossRef link in the first place is determined by means of a preprocessed comparison of the source data (references) to potential targets in the CrossRef database. The original purpose of the CrossRef initiative was to provide a mechanism whereby individual publisher sites could link to one another's full text from references. The CrossRef database allows publishers both to register all their articles with a DOI and to look up DOIs that may correspond to the references in their full text. If such a DOI is found it can be embedded, as part of the editorial process, right into the reference.
Thus links are guaranteed to exist on both ends, as with preprocessed links (although the links are not guaranteed to be correct - that depends on the quality of the data in this cooperative database), and are also executed 'on the fly' for good performance.

Since the DOI Foundation presents itself as if it were indeed 'standard' in spite of the many publishers and vendors who do not participate, it is worthwhile to consider this linking scheme in a little more detail. For many publishers and bibliographic vendors, it is already quite possible to link directly without CrossRef. Indeed, a number of the 'CrossRef publishers', such as Academic and Springer, already have extensive linking agreements with vendors who are linking directly and not using CrossRef. Still other publishers have no technical capability to link directly, and will therefore only be able to link via CrossRef.

Critics of both the DOI Foundation and the CrossRef initiative (CrossRef is, if you will, an 'instance' of the DOI) focus on one major issue - that all CrossRef links lead to the publisher, even though the 'most appropriate' copy of the full text might reside elsewhere. The DOI Foundation, in fact, has responded to this issue by rolling out plans for an architecture that can link to more than one place, and even link to local content - sometime in the future. CrossRef, which is in such early stages that only a handful of the 35 'CrossRef publishers' have actually contributed to the CrossRef database, is for the foreseeable future certainly a one-way street directly to the publisher.

It is also worth noting that each CrossRef publisher determines the link destination, which is not necessarily the full text, but which could be some interim page where advertising or promotional information appears. In the CrossRef database as it exists in October of 2000, the majority of links are not to full text, but to an abstract, even in cases where the record is marked 'full text'.
This makes the CrossRef database somewhat problematic for bibliographic vendors, who already have the bibliographic records and wish to enhance them with full text. For the user, CrossRef links will provide a mixed experience - not necessarily a link to full text.

That said, the beauty of CrossRef, from the standpoint of a vendor like Ovid, is that once CrossRef is fully implemented, it will cut down on the number of relationships and technological implementations that must be maintained in the linking product. Eventually the 35+ participating publishers can all be linked to with one mechanism. Similarly, on the publisher's end, CrossRef has the potential to cut down on the number of vendors that must be individually contracted with and supported.

CrossRef, however, will never be a full solution, both because of its limited deployment and because of its limited mandate. As a point of comparison for its immediate impact, CrossRef went live with 1.3 million potential linkages in the CrossRef database, and plans to expand that to 3 million by year-end. This is a cooperative venture, at present with active participation of fewer than 10 publishers and eventually with active participation of fewer than 50, representing thousands of titles. In comparison, the Ovid Journals@Ovid database, with only 400 journals, but from 70+ publishers, already includes more than 9 million links. Additionally, much of the CrossRef initiative relies on individual publishers to maintain the quality of the database. In the absence of any quality-control process, it remains to be seen how well this database will work. Granted, you have to walk before you can run - but let's be clear that with linking - particularly with big, cooperative linking schemes - we are only walking yet.
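The two-stage CrossRef arrangement described above, a preprocessed lookup that decides whether a link is shown followed by live resolution when the link is used, can be sketched in Python as follows. This is an illustration under stated assumptions only: the in-memory `DOI_DATABASE` stands in for the cooperative lookup database, its key and DOI value are made up, and the resolver address is the present-day public DOI proxy, which the paper itself does not name.

```python
from typing import Dict, Optional, Tuple

# Stand-in for the cooperative lookup database; the entry is hypothetical.
DOI_DATABASE: Dict[Tuple[str, str, str], str] = {
    ("jexamplemed", "12", "345"): "10.5555/example.2000.345",
}

# Assumption: the present-day public DOI proxy address.
RESOLVER = "https://doi.org/"


def lookup_doi(journal_key: str, volume: str, first_page: str) -> Optional[str]:
    """Preprocessed stage: find a registered DOI for a reference, if any."""
    return DOI_DATABASE.get((journal_key, volume, first_page))


def reference_link(journal_key: str, volume: str, first_page: str) -> Optional[str]:
    """Return a resolver link only when a DOI was found; otherwise no link."""
    doi = lookup_doi(journal_key, volume, first_page)
    return RESOLVER + doi if doi is not None else None


found = reference_link("jexamplemed", "12", "345")
missing = reference_link("jexamplemed", "12", "999")
```

Here `found` is a resolver address and `missing` is `None`, so no link would be displayed for the second reference. Where the resolver finally sends the user (full text, an abstract, or an interim page) is decided by whoever registered the DOI, not by the code that builds the link.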
LitLink and SFX

Since we have covered bibliographic vendor linking products as a class (whether live or preprocessed), and DOI/CrossRef, it is probably worth taking a moment to mention two other linking schemes which are proposed as comprehensive linking solutions - LitLink and SFX. Both are products from companies, although SFX began life as a project under Herbert Van de Sompel at the University of Ghent. The thing that is different about both of these schemes is that they purport to be universal solutions to linking. That is, both plan to link all citations from all vendor products to something 'appropriate' - perhaps a link to an OPAC; to document delivery; or to full text at a publisher site, at a vendor or in a private collection. In both prototypes the link goes to an intermediate server which then serves up to the user a selection of links appropriate to that user. The products suggest that they will provide customization similar to Ovid OpenLinks, and distinguish themselves by their offering of links to more than full text.

The claim to more comprehensive linking because of links to document delivery or OPACs is somewhat deceptive. In fact, some of the vendors also offer direct links from articles to document delivery or to OPACs in addition to links to full text. It's really a matter of semantics whether the links are all offered as one product, or, as at Ovid, the non-full text links are offered free as part of normal software capabilities. What potentially does distinguish these products is the promise to provide all this service for every vendor - obviously Ovid does not offer OpenLinks from Dialog databases - or vice versa. I would caution you that, as in the case of CrossRef, these 'comprehensive' products are in very early stages.
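The intermediate-server idea, in which a citation is sent to a server that returns a selection of links appropriate to the user, might be sketched in Python as below. It is a simplified illustration and not a description of how LitLink or SFX is implemented: the `Service` structure, the service labels, the example addresses and the journal keys are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Service:
    label: str
    url_template: str
    # Journals this service covers; None means it applies to any journal.
    journals: Optional[FrozenSet[str]] = None


def link_menu(services: List[Service], journal: str, volume: str,
              page: str) -> List[Tuple[str, str]]:
    """Return (label, URL) pairs for the services that apply to a citation."""
    menu = []
    for service in services:
        if service.journals is None or journal in service.journals:
            url = service.url_template.format(journal=journal, volume=volume, page=page)
            menu.append((service.label, url))
    return menu


# Hypothetical configuration for one institution.
LIBRARY_SERVICES = [
    Service("Full text at publisher",
            "http://publisher.example/{journal}/{volume}/{page}",
            frozenset({"jexamplemed"})),
    Service("Check library catalogue",
            "http://opac.library.example/search?journal={journal}"),
    Service("Order via document delivery",
            "http://docdel.example/order?j={journal}&v={volume}&p={page}"),
]

menu = link_menu(LIBRARY_SERVICES, "jexamplemed", "12", "345")
other = link_menu(LIBRARY_SERVICES, "junlicensed", "3", "10")
```

For the licensed journal the menu has three entries; for the other title only the catalogue and document delivery links remain. The table of services is the part that has to be negotiated and maintained for each institution, vendor and publisher.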
It is possible that over time these 'comprehensive' products will actually work out all the contractual and technical details, but the logistical complications of working with all vendors, all OPACs, all document delivery systems, and all publishers make these implementations very complex. Contracts can take literally years to accomplish. Vendors who already have linking products in place are not likely to simply give up revenue to these companies by LitLink- or SFX-enabling for no compensation - especially since linking is not really the labor-free process which many envision.

The user experience of linking

Remember, what happens behind the scenes when you link from a source document to a target is under the control of the target. This is true with dynamic linking, with preprocessed links unless they are strictly internal, and with initiatives such as CrossRef, LitLink, or SFX. Even with Ovid OpenLinks, which always occur within the context of an Ovid frame to orient the user, it is still the case that some links go directly to full text and some go to an interim page wherein the user selects 'pdf' or 'html'. This behavior is under the control of the target.

With products which are less customized than Ovid's, the links may be to a login screen for titles to which the user has no access, or sometimes to a dead link. Sometimes every new link opens a new browser, and the inattentive or naïve user will end up hopelessly trying the 'back' button, not noticing that there are now multiple browsers to close. If the target server is down - not an uncommon occurrence for servers put up by publishers very new to the online world - a perfectly 'good' link may return 'server not found.'

Some users adapt to this anything-goes environment - after all, that's what the web is like. But some users become disoriented and expect more uniformity from something presented as a product. But this is the reality - linking today is not standard.
You will want to try your linking products and full text products before you buy, and you will want to know what contracts are already in hand and what aspects are already in production. Finally, you will find that if you have an environment of any size and complexity, you will end up with more than one solution - count on it.

Summary

It is very exciting that there are many linking products, with links from bibliographic databases to literally thousands of journals from many different sources. However, it is also easy to attribute some magic to linking where none exists, or to get confused between a standard and a good idea. You will make the best decisions if you maintain a realistic viewpoint and investigate options with a bit of skepticism. Try before you buy, and be sure that, if what you are buying is a prototype, you understand the long haul.
http://conferences.alia.org.au/online2001/papers/designing.for.retrieval.ia.html © ALIA