Focus address

Search Engine Showdown
Greg Notess

This paper compares and contrasts the current global Web search engines, with a particular emphasis on advanced searching issues. It outlines some of the most recent changes in the search engines and the search engine industry, analyzes how those changes affect searchers, and looks at changes in database size, economic models, and search features.

In the past few years, the information content on the web and Internet use in general have grown dramatically. While e-mail is the most prevalent activity on the Internet, the second most common activity is usually considered to be searching. Search boxes are everywhere on the web, and much valuable information content can be found only by using a search function. Many of the most frequently visited Web sites offer a general Web search engine as at least one of their major services. These general Web search engines continue to be available for free to all users, and they index hundreds of millions of Web pages.

SearchEngineShowdown.com attempts to provide an independent comparison of the major search engines in terms of databases, technology, search features, display capabilities, and other issues of importance to searchers. This paper presents a few of the recently observed trends and changes among these search engines, including growth in size, economic underpinnings, and search features.

Database size

The past year has seen significant growth in the size of the search engines' databases. In late 1999, the largest search engines (AltaVista, Fast, and Northern Light) included about 200 million Web pages in their databases. By June of 2000, several search engines had grown their databases to the 500 million mark, and by the end of 2000, both Fast and Google were approaching 600 million records. Yet while the web continues to grow steadily, the search engines that now offer larger databases seem to grow in sudden spurts and then remain at about the same size for some period of time.
In addition, some claimed sizes can be misleading. For example, Google has claimed an index of over one billion records, but half of those are not fully indexed. The extra 500 million are only URLs for pages which Google's spider has not yet indexed.[1] The rest of the database consists of fully indexed pages, where all the words on the web page have been indexed. The vast majority of the 500 million non-indexed URLs do not show up in most searches. Of those that do, some point to redirected pages or dead links.

While this significant growth in size has helped to increase the overlap between the largest Web search engines, they all still fail to find all the web content available. Several types of information are usually not directly accessible to the search engines: information pulled from databases via a search form, pages hidden from the search engine spiders by a robots.txt file or a meta no-robots tag, commercial information available only by subscription, and other sites that require a log-in.

Economic angle

In the early days of the web, search engines considered ways to charge searchers. Infoseek gave the first few records for free but wanted to charge for seeing the rest of the search results. With the advent of banner advertising, search engine companies quickly realized that by giving away Web searching for free, they could attract more users, which meant more income, since banner ad payments were based on the number of users that saw the ads.

Now that the Internet investment frenzy has abated and banner advertising revenues have declined, the long-term survival of the general search engines and the levels of service they provide are very much tied to their economic viability. While many dot-com companies have been able to live on venture capital funding in past years, they now need to start showing the ability to earn a profit. So where does that leave the search engines?
They are looking for increased and new revenue streams and decreased costs. Now, in addition to the ads on their pages, the search engines have shopping partnerships and revenue sharing deals, and many sell their search technology. For example, the search engine technology that AltaVista uses on its general Web search sites is also sold to other companies and used as the search software at sites such as Amazon.com. But the search engines are looking for new revenue models as well, such as Pay for Consideration, Pay for Inclusion, and Pay for Placement.

The directories have introduced the concept of Pay for Consideration. LookSmart and Yahoo! now both require payment from sites before they are even considered for inclusion within the directory, at least for some portions of each directory. And they give no guarantee that those that pay will get included.

In November of 2000, Inktomi introduced a Pay for Inclusion service.[2] With this new system, Web sites can pay, on a per-page basis, to have their pages indexed by Inktomi within 48 hours. At this point, Inktomi continues to offer free submission as well as continued crawling. So the majority of the database is still built for free, but the impact of the Pay for Inclusion program will be seen within the year. Inktomi also states that sites that pay will be guaranteed inclusion but not any specific relevance ranking.

AltaVista, also in November of 2000, took a different approach. Through a partnership with GoTo.com, it added a Pay for Positioning component. Unlike at GoTo's site, where the sites which bid the highest for specific keywords get top billing, AltaVista places the GoTo results under a separate Sponsored Listings tab header.

Economic implications

The rise in new revenue streams for the search engines is both good news and bad news for the searcher community.
On the positive side, the added revenue makes it much more likely that the search engines will continue to be available and remain free for searchers. On the negative side, these various strategies may make it more difficult for sites that do not pay to be found. While there is no targeted campaign to keep out personal Web pages, government sites, academic institution content, and non-profit organizations, it will be imperative to watch over the course of the next few years to see what effects these pay programs have on the availability of non-paid information content. Will those that pay have fresher content, more pages included, or better ranking? Time will tell.

Search features

The search features and advanced search techniques available from the search engines have seen some dramatic changes over the past few years. Initially, most of the search engines defaulted to an OR operation, promoted the + and - symbols for simplified Boolean searching, and used relevance ranking based on term frequency/inverse document frequency. Now, most of the major search engines default to an AND operation, the + does not always work, and ranking has become much more sophisticated.

Some formerly prominent search engines have all but disappeared. The Open Text Index is no more. Both WebCrawler and Magellan were bought by Excite several years ago, are no longer under active development, and have declining use. Excite itself made an effort at increasing its size, but has fallen behind the other major search engines. Go (formerly Infoseek) has had problems on several corporate levels, and its search engine database has remained small.

Meanwhile, AltaVista, Fast, Google, Inktomi, and Northern Light have continued growing the sizes of their databases. AltaVista, Fast, Google, and HotBot have added customization capabilities. Google has added the OR operator and site clustering.
iWon has rapidly gained an audience with its cash sweepstakes attractant, but it has also been one of the few Inktomi partners to utilize the entire Inktomi GEN3 database of 500 million Web pages.

Conclusion

Search technology has become big business in the Internet economy and has seen many changes in the past few years. The pace of change is likely to continue. In part, it will provide searchers with greater abilities to find and manage Internet information resources. And yet it is quite likely that it will not be able to fully keep up with the information explosion on the Internet. Continued careful analysis of how the search engines work, what features they offer, how they build and maintain their databases, and what financial incentives drive them will be key to a full understanding of how these information retrieval tools operate.

References
http://conferences.alia.org.au/online2001/papers/focus.address.html © ALIA