
Keeping up with search engines

Belinda Weaver belinda@journoz.com

http://journoz.com/

Where are we at with search engines? Last year, AllTheWeb (http://alltheweb.com/) briefly pipped Google (http://www.google.com/), pushing well past the two billion page mark and edging slightly ahead of the former search engine leader. It didn't last. Google has now passed three billion pages and added new types of files to search, such as files in Microsoft Word format.

New tools came along. Teoma (http://www.teoma.com/), a Google-lookalike engine, is still quite new, while older engines such as AltaVista (http://www.altavista.com/) have reinvented themselves. AltaVista has launched Prisma, a clustering service for results that supposedly makes searching easier. Many search engine databases are being refreshed more quickly. I created a web log (or blog) for Australian journalists on a Friday and found it in Google the following Monday.

There are more and more specialty engines popping up. It's important to know about these, as many do a better job of mining the information they index than a general search engine does. For example, there are search tools for images, for multimedia, and for downloadable files and programs that you can FTP to your PC or Mac. There are tools for searching news headlines or for tracking down PDF files. There are tools such as Google Groups (http://groups.google.com/) for reading newsgroup postings and tools such as BoardReader (http://www.boardreader.com/) for finding postings to Internet message boards.

Some of these may not be relevant to the work you do, but it's good to know they exist, should you or your customers ever need to use them.

So there's a lot happening.

A few years ago, I was driving towards the university where I work and I felt a distinct knot of anxiety in my stomach. At the time, I was working primarily as an Internet trainer and I was thinking about a class I was preparing, and I remember, quite vividly, thinking: I wish I'd chosen to work at something easier - something that didn't keep changing all the time. You may know that feeling.

Keeping on top of the Internet seemed impossible then. It's still impossible, but one thing has changed, for me at least: I no longer worry about it. The Net is just too big. It changes all the time and much too quickly. Nobody is on top of it. So stop worrying!

Online skills are like parenthood - you don't have to be perfect - you just have to be good enough. Good enough is good enough. But how do you achieve that feeling of being good enough when information overload is a reality? In a world where search engines change all the time, where new features are unveiled constantly, where the number of pages indexed grows seemingly exponentially - how on earth do you get a grip?

Interestingly, the very tool that makes us feel overloaded - the Internet - also puts you in touch with something that helps you cope with information overload: filtered news that is relevant to the work you have to do, and the people who write that news. Whether you just read a newsletter by one of those people, or develop a more personal one-on-one relationship with him or her, isn't important. Many, many people online can help you keep up to date. There will be news that is right for you, and there will be people doing similar jobs to you who have taken on the role of keeping like-minded people informed. I know because I'm one of them, but also because I use the work of those people to do what I do.

So use them - the good thing about the Net is the wheel has to be invented only once - once that's done, the communication structure on the Net makes sure everybody else knows about it.

One of the best people to get acquainted with is Washington's Gary Price. Price has recently published a book called The Invisible Web, co-authored by Chris Sherman, who is speaking at this conference. Price has a number of useful websites, including DirectSearch (http://www.freepint.com/gary/direct.htm), a gateway to the invisible Web, and others such as the List of Lists (http://www.specialissues.com/lol), which provides best-of lists and rankings, and NewsCenter (http://www.freepint.com/gary/newscenter.htm), which provides links to all kinds of news services such as wire, cable, TV, print and radio, as well as streaming media from many of them. Where he is most useful for search engine news is his site, the Virtual Acquisition Shelf and News Desk - now more commonly known as the Resource Shelf (http://www.resourceshelf.com/). The site is really a web log for librarians. Price announces all kinds of new tools and services in this daily blog, many of which will be useful to librarians. He includes a lot of search engine news and announcements. A lot of what is there is too narrowly US-focused for Australian use, but you can skim it for the gems. You can get weekly highlights by e-mail by signing up at the site. I use the weekly e-mail more as a reminder to visit the site, as I find it easier to skim the blog online and click to where I need to go than to make notes about which e-mail items sound interesting and then visit the site later. But it's up to you.

Regular newsletters such as the monthly Internet Resources Newsletter (http://www.hw.ac.uk/libWWW/irn/), the fortnightly FreePint (http://www.freepint.com/), or the weekly Scout Report (http://scout.cs.wisc.edu/) are also useful sources of news about new tools and services. The Scout Report has a section at the end of each newsletter about nifty new software, so look here for handy .exe files, many of which bolt on to your browser to enhance the way you surf.

The key site for keeping on top of search engine news is probably Search Engine Watch (http://www.searchenginewatch.com/). This site ranks engines, and also has tables and charts of features. Each year they run polls that declare the best overall tool, the best image searcher, the best meta search tool and so on. Google has won twice in a row, and for good reason - it's the biggest, the best and keeps adding new features. At the site, you can also subscribe to a daily newsletter called SearchDay (http://www.searchenginewatch.com/searchday), written by Chris Sherman. I find daily search engine news a little too much to cope with, so I got off the SearchDay list not long after joining, but for people who need to be in the loop about searching, this newsletter is a godsend.

Greg Notess's site, Search Engine Showdown (http://www.notess.com/), is also a good source of news, tips and information about search tools, particularly things like search syntax. Greg also runs an e-mail list for people who want to receive announcements from him about search engine news.

Tara Calishain, weekly publisher of the ResearchBuzz (http://www.researchbuzz.com/) newsletter, is another net know-a-lot who is worth eavesdropping on. Her newsletter is occasionally a waste of time as the resources are too American or simply too left-field, but you can skim it for the cream, and she discusses tools from the point of view of the user trying to make sense of them, so her take on things is often more helpful than a straight description of features and options would be.

Sites like CNet (http://www.cnet.com/) that announce technology news are also places to watch. Even the mainstream press is starting to announce interesting items of search engine news. Last year, the Boston Globe interviewed Eric Schmidt, CEO of Google (http://www.google.com/), who talked up the company's plans for Google search. Basically, Schmidt's - or rather Google's - aim is to index everything - to include within Google search not just websites, PDFs, image files, newsgroups and so on, but also the content of proprietary databases such as LexisNexis. Obviously that kind of premium content would still be chargeable, but to be able to search within such databases with one search tool would be extremely handy and quick. ProFusion (http://www.profusion.com/) is one tool that already works along these lines - it allows you to search within many invisible Web databases such as patents and news sites. It is not comprehensive, but it is worth a look.

You do not necessarily have to go to newspapers to find out what's happening - many of the engines are pretty good at selling themselves. If you want to know what's happening at your favourite engine - whether it's Google, AllTheWeb or AltaVista - visit the tool itself and look for news or what's new sections. The search engine sites tend to get over-excited about whatever it is they're up to, so discount a lot of what they say, as they're in the business of selling themselves. To see exactly what they can do, visit the advanced search features section of each site. Here you will often find improvements - PDF or image searching, Boolean operators, language tools - that actually make searching a lot easier.
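
As a rough illustration of what those advanced features look like once they are typed into the search box, here is a minimal Python sketch that assembles a query from a few common operators. The operators shown (quoted phrases, OR, a minus sign to exclude a word, filetype: and site:) are the ones Google documents; other engines use different syntax, so check each engine's own advanced search page. The function names, the example terms and the /search address with its q parameter are assumptions made for illustration and are not taken from the talk.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query(terms, exact_phrase=None, any_of=(), exclude=(),
                filetype=None, site=None):
    """Join plain terms and optional operators into one search string."""
    parts = list(terms)
    if exact_phrase:
        # Quotation marks ask for the words as an exact phrase.
        parts.append(f'"{exact_phrase}"')
    if any_of:
        # Boolean OR: a page may match any one of these words.
        parts.append("(" + " OR ".join(any_of) + ")")
    # A leading minus sign excludes pages containing the word.
    parts.extend(f"-{word}" for word in exclude)
    if filetype:
        # Restrict results to one file format, e.g. pdf or doc.
        parts.append(f"filetype:{filetype}")
    if site:
        # Restrict results to one site or domain.
        parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query, base="http://www.google.com/search"):
    """Percent-encode the query into a URL (base address is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    # Hypothetical example: PDF files about eprints on Australian university sites.
    query = build_query(
        ["eprints"],
        exact_phrase="open access",
        any_of=["archive", "repository"],
        exclude=["jobs"],
        filetype="pdf",
        site="edu.au",
    )
    url = search_url(query)
    print(query)
    print(url)
    # Decoding the URL again should give back the same query string.
    print(parse_qs(urlsplit(url).query)["q"][0] == query)
```

Building the string in one place makes it easy to swap a single operator in or out and see how the results change, which is a quick way to learn what a newly announced feature actually does.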

Some are a surprise. I did not actually know Google was indexing Word documents until I was searching for ePrint sites and found the minutes of a meeting that, among other things, listed 'to do' tasks for me! I almost wish I hadn't found it as it made more work for me to do, but it was interesting to see the new Google service.

In its capacity as search tool frontrunner, Google has a section specifically devoted to innovations still in beta test. Google Labs (http://labs.google.com/) is where you can try out new features, such as the recent viewer that replaced the traditional list display of results with a moving slide show of sites that matched search criteria. They also have Web quotes, which lets you access third-party descriptions of websites, and Google Research, a fee-based online Q&A service, which could help put librarians out of a job.

If you are interested in search tools that reside on your computer, such as WebFerret and Copernic, sites like AgentLand (http://www.agentland.com/) and BotSpot (http://botspot.com/) are the places to keep tabs on them. You can download software from these sites and then customise the tools to match your needs. Both sites include programmable agents that you can design yourself, but that may be too complicated for most people.

Where else can you get news? At the risk of sounding low-tech, I think you will find journals in your discipline a fruitful source of Web news. Maybe they won't deal a lot with search engines, but they may still come up with other useful tools that are very relevant to the work you have to do.

Which brings me to subject pages. These are also called directories, and many people can't distinguish them from search engines - most people think Yahoo! is a search engine when it's really a directory. But these sites are often more useful than search engines because they do not respond blindly to words typed in, but gather and organise material on given topics. New ones emerge all the time, but a good watching point is Pinakes (http://www.hw.ac.uk/libWWW/irn/pinakes/pinakes.html). It lists the top 40 subject-specific pages online - in areas as diverse as social sciences, maritime studies and biotechnology - and also points to directories that cover multiple subjects, as well as directories of specific types of materials such as eprints and full text theses. Since new directories pop up all the time, Pinakes lets you keep tabs on a whole range of new search services that may deliver more than your search engine can.

Keep tabs on any tools that promise to open up the invisible Web. The most interesting and useful information is in there, rather than on the open Web indexed by search engines. Big as they are, the search engine databases only cover about 17 per cent of the material actually online, which is why good invisible Web tools are so crucial.

It's important to keep up with search, but don't get too hung up about it. Use some of the services I have mentioned today to stay informed. They will definitely do the job. Maybe I am more relaxed about searching than I should be because I hardly ever search - I use a different approach to finding information online and find it works extremely well. I wrote my book, Catch the Wave, to explain in detail the approach I take. But that's another story.

Thank you for listening.

URLs used in the talk

Gary Price's websites

Newsletters

Search engine news

Build your own ...

Subject pages