Google Search Appliance
January 11, 2007
Pilot a search capable of returning valid relevance results across all library assets, including those “hidden” in databases. Google offers a product called the Google Search Appliance, which enables detailed indexing and searching of content according to the organization’s needs. Currently, Google’s public search engine indexes only what is directly reachable within a few “clicks” of the organization’s homepage (e.g. www.byu.edu). Additionally, Google does not search any database that requires search terms to be submitted through a search form. The Google Search Appliance can be used to direct internal searches of an organization’s content, including content that is not visible to the public crawler, either because it is behind a firewall or because it can only be reached through a search interface. The value of Google’s approach is that there is a single, consistent index. Our current federated search sends queries to a variety of sources and returns inconsistent results, which causes some of the problems we see in Webfeat. A single, consistent index permits better relevance ranking and faster response times. The objective will be to determine whether this product is worth the expense.
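A rough sketch of what a “single consistent index” means, in Python: records from the website, the catalog and a database all go into one inverted index and are scored with one formula (a simple TF-IDF variant), so their scores can be compared directly. This is an illustration only, not how the Google Search Appliance works internally; the class name, source labels and document IDs are made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class SingleIndex:
    """Toy inverted index that holds documents from every source together.

    Hypothetical illustration; assumes each doc_id is added only once.
    """

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.sources = {}                  # doc_id -> source label

    def add(self, doc_id, source, text):
        self.sources[doc_id] = source
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        n_docs = len(self.sources)
        scores = defaultdict(float)
        for term in tokenize(query):
            postings = self.postings.get(term)
            if not postings:
                continue
            # One IDF formula for the whole collection, whatever the source.
            idf = math.log(n_docs / len(postings)) + 1.0
            for doc_id, tf in postings.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by doc_id for a stable order.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    index = SingleIndex()
    index.add("web-1", "website", "Library hours and study rooms")
    index.add("cat-1", "catalog", "Pioneer history: a study")
    index.add("db-1", "database", "Pioneer trails and pioneer diaries")
    for doc_id, score in index.search("pioneer trails"):
        print(index.sources[doc_id], doc_id, round(score, 2))
```

Because every document is scored by the same function against the same collection statistics, a catalog record and a database article can be ranked against each other without any after-the-fact score merging.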





Save your pennies for another day…
Try these on for size instead of the Google Search Appliance:
http://swish-e.org/
http://www.htdig.org/
Thanks. We’ll look into these. Have you used either of them? Any advice?
The Google Search Appliance is only as good as the data you feed it. I set up a Google Mini, which is a stripped-down version of the Google Search Appliance, and it does have some nice tools.
Good luck getting a Google Search Appliance to parse MARC records and extract meaningful data.
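To give a sense of what parsing MARC and extracting meaningful data involves, here is a minimal Python sketch of reading a MARC 21 (ISO 2709) record: a 24-byte leader, a directory of 12-byte entries (tag, field length, starting position), then the variable fields, separated by field terminators (0x1E) and subfield delimiters (0x1F). It assumes a well-formed record whose lengths are byte counts, ignores MARC-8 character encoding (it decodes as UTF-8), and the function names are hypothetical; a real project would use an established MARC library.

```python
# Hypothetical helper names; the byte layout follows MARC 21 / ISO 2709.
FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"
RECORD_TERMINATOR = b"\x1d"


def parse_marc(record):
    """Split one MARC record (bytes) into {tag: [raw field bytes, ...]}."""
    # Leader positions 12-16 hold the base address of data (where fields start).
    base = int(record[12:17])
    # The directory runs from byte 24 up to the field terminator just before base.
    directory = record[24:base - 1]
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])   # field length, including its terminator
        start = int(entry[7:12])   # offset relative to the base address
        raw = record[base + start:base + start + length]
        fields.setdefault(tag, []).append(raw.rstrip(FIELD_TERMINATOR))
    return fields


def subfields(field):
    """Return [(code, value), ...] for a data field (tag 010 and above)."""
    # Skip the two indicator bytes; each subfield starts with 0x1F + a code byte.
    parts = field[2:].split(SUBFIELD_DELIMITER)
    return [(p[:1].decode("ascii"), p[1:].decode("utf-8")) for p in parts if p]


def title(record):
    """Return the 245 $a title proper, or None if it is missing."""
    for raw in parse_marc(record).get("245", []):
        for code, value in subfields(raw):
            if code == "a":
                return value
    return None


def build_marc(fields):
    """Assemble a minimal record from [(tag, field bytes), ...] for testing."""
    directory, data = b"", b""
    for tag, value in fields:
        value += FIELD_TERMINATOR
        directory += tag.encode("ascii") + b"%04d%05d" % (len(value), len(data))
        data += value
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)
    total = base + len(data) + len(RECORD_TERMINATOR)
    # Record length, status/type codes, base address, then "4500" entry map.
    leader = b"%05dnam a22%05d   4500" % (total, base)
    return leader + directory + data + RECORD_TERMINATOR
```

Even with the title pulled out, nothing in the record says how “relevant” it is next to a web page, which is the harder problem raised below.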
What kind of searching are you trying to get it to consolidate? How do you expect it to score relevancy? Do you expect it to put item records from your library right up alongside web resources? How do you determine which book is more relevant?
The Google appliance does a good job of providing an easy-to-customize search box (via XSLT tinkering) that will catalog all your stuff. I know that on the Mini it wasn’t even possible to search anything protected by more than HTTP authentication in the header – but I know the Search Appliance has more features for that kind of thing. Don’t expect that part to be easy, though.
The biggest downfall I see in the Search Appliance is that the relevancy score is weighted so heavily on incoming links. When most of your bottom-level pages have the same number of incoming links, it falls back on metadata and page analysis.
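To make the point about link-weighted relevancy concrete, here is a toy ranking function in Python. The weights and the formula are invented for illustration and are not the Search Appliance’s actual algorithm: inbound links are multiplied by a large weight, so body text and keywords mostly decide the order only between pages with the same link count. Page URLs and fields are hypothetical, and query terms are assumed to be lowercase.

```python
def toy_score(page, query_terms, link_weight=10.0):
    """Hypothetical link-heavy relevancy score (not the appliance's real formula)."""
    words = page["text"].lower().split()
    # Page analysis: how often each query term appears in the body text.
    text_score = sum(words.count(term) for term in query_terms)
    # Metadata: one point per query term listed in the page's keywords.
    meta_score = sum(1 for term in query_terms if term in page.get("keywords", []))
    # Inbound links dominate; with a large link_weight, text and metadata
    # mostly matter only between pages whose link counts are equal.
    return link_weight * page["inlinks"] + text_score + meta_score


def rank(pages, query_terms):
    """Return pages that mention a query term, best score first."""
    hits = [p for p in pages
            if any(term in p["text"].lower().split() for term in query_terms)]
    return sorted(hits, key=lambda p: toy_score(p, query_terms), reverse=True)


if __name__ == "__main__":
    pages = [
        {"url": "/", "inlinks": 50, "text": "library home page with trails link"},
        {"url": "/trails-guide", "inlinks": 1, "text": "trails guide",
         "keywords": []},
        {"url": "/trails-history", "inlinks": 1,
         "text": "history of trails trails map", "keywords": ["trails"]},
        {"url": "/hours", "inlinks": 5, "text": "opening hours"},
    ]
    for page in rank(pages, ["trails"]):
        print(page["url"], toy_score(page, ["trails"]))
```

With these sample pages, the home page that merely links to “trails” outranks the page that is actually about trails, while the two deep pages (one inbound link each) are separated only by text and keyword matches.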
It doesn’t have a magic wand to search your content for you. There is no such thing as a “magic relevancy” score that can compare results from a wide variety of sources. Inevitably, the search engine will treat one of those sources as “more relevant” than another just because of how its content is presented to the crawler, and that source’s results will come up much higher.
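The same problem shows up when merging result lists from separate sources. In this Python sketch (source names, document IDs and scores are hypothetical), one source scores from 0 to 1 and another in larger, arbitrary units, so a naive merge by raw score puts every result from the second source first. Per-source min-max normalization, one common workaround and not something the appliance is claimed to do, interleaves the lists, but a normalized 1.0 from the catalog still doesn’t mean the same thing as a 1.0 from the database.

```python
def min_max(results):
    """Rescale one source's scores to the range 0..1."""
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in results]


def merge(result_lists, normalize=False):
    """Merge {source: [(doc, score), ...]} into one ranked list of (source, doc)."""
    merged = []
    for source, results in result_lists.items():
        if normalize and results:
            results = min_max(results)
        merged.extend((score, source, doc) for doc, score in results)
    # Stable sort: equal scores keep their original source order.
    merged.sort(key=lambda item: item[0], reverse=True)
    return [(source, doc) for _, source, doc in merged]


if __name__ == "__main__":
    results = {
        "catalog": [("book-1", 0.92), ("book-2", 0.40)],  # scores on a 0..1 scale
        "database": [("art-1", 12.0), ("art-2", 3.5)],   # scores in other units
    }
    print(merge(results))                  # every database hit comes first
    print(merge(results, normalize=True))  # the two sources interleave
```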
I finished up at BYU last year, but unless things have changed you still have a large and powerful Computer Science department full of talented students. Most of them are probably willing to work for pennies. Have them build you a Lucene-based search box that will do your dirty work for you. Heck – you could probably even get a professor to have students work on it for a class or something.
Buying a Google appliance is a good decision for someone for whom it is cheaper than developing a search system internally – or if you really want a cool-looking server to put in your datacenter. I don’t know if the Google appliances come with T-shirts, but the Mini sure does.
I guess the other question I have is – who provides your library software? If they can’t deliver a search interface that handles all your assets, why are you using them?
Can you imagine a world without search engines? Can you imagine doing any thesis work without being able to search online? I can, because I’m 56, but it was really difficult to research anything 20 or 30 years ago. And the time we used to waste going to the library and looking things up in the index! It blows my mind when I think of how it used to be. And now, of course, search engines are even more powerful – maybe too powerful. The power that Google has scares me. Off topic – but I’m talking about the new 2D barcodes on my site, and now they’re interesting: put up a tiny QR Code anywhere, scan it with an enabled cellphone and voilà! – a hyperlink to a site. Will this be the new graffiti? See you soon, Lambe, Paris.