Search and research

February 9, 2009

Conference attendance generally leaves me with things to think about (that’d be a pretty positive outcome, I think).  Sometimes just disagreeing with people provides some clarity to one’s thinking and constructing a counter-argument helps you to think through your own ideas.

I was at a session on the “discovery layer” last week …  to be frank, I find that term objectionable to begin with, simply because it makes something relatively simple sound like it must involve a room full of NASA technicians.  In this case, the “discovery layer” is just library talk for a search interface that allows one to search through a number of databases simultaneously (so, you type something into a box, and a search is performed on, say, some journal indexes, some library catalogues, and some other stuff, but you never see all that and just get results).  You could call it “search” …

But this is not where I was going.  One of the things that one has to think about when designing the “discovery layer” is what it’s going to look like.  Will it just be a single box like Google, or do you want to add all kinds of extra search options to slice your results up in more ways (like by material type, or publication date range, or any criteria you can think of)?  To be frank, this is not a difficult decision for most designers of search.  Generally, one attempts to make the search process uncomplicated and hopes that the results returned will be relevant and reasonably comprehensive.  So, at Amazon, I usually just type some word in the box provided and press “Go”, or whatever.  A bunch of results come back, they are pretty much always at least close to what I was looking for, and I go from there (clicking links of recommended things, etc).  I might choose to drop down a menu and specify that I am searching for a DVD or something like that, but for the most part, that’s totally unnecessary.  I know that I’ll get close or right on with the search box, so that’s all I worry about.

Libraries are, and since the introduction of the computer-based catalogue have always been, different.  No simple search box awaits you here, and even if you do encounter one, you are expected to specify what index you wish to search through (title, author, etc).  Then we get even crazier when you enter “advanced” search and we begin to offer you all kinds of things, like publication dates, formats, locations, etc etc etc.  Get into a journal index (which is entirely, and securely, separate from the rest of our indexes), and things get even more complicated.  Librarians have proudly learned how to use all of these tools, and spend a great deal of time guiding people through them to get what we call “precise” and “comprehensive” results.  That means that ideally we come up with a search result that contains almost entirely relevant material, as well as every single related thing that there is to find (that’s ideal … never happens, of course).
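For anyone who wants to see what “precise” and “comprehensive” amount to when you count them up, here is a minimal sketch in Python. It assumes you somehow already know which records are relevant (you never really do), and the record ids are made up. What I call “comprehensive” here is usually called “recall” in the information retrieval world.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision: share of the retrieved records that are relevant.
    recall:    share of the relevant records that were retrieved
               (the "comprehensive" half of the pair).
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record ids, purely for illustration.
relevant = {"a", "b", "c", "d"}   # everything worth finding
retrieved = ["a", "b", "x"]       # what the search actually returned

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy search, two of the three results are relevant, but only half of the good material was found. The librarian's ideal is both numbers at 1.0 at once.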

This seemed to make a great deal of sense to me about 20 years ago when I first started learning online searching.  At that time we’d dial up some place in California and pay a great deal of money to search a big mainframe there and get results.  We’d actually sit down and plan our search so as not to waste too much time online.  I quite seriously believe that this is why we worry so much now about being precise and comprehensive.  Somewhere in the back of our minds we believe that if we miss something, or spend too long searching, our online charge will be astronomical.  Of course, in the meantime the world has passed us by and we can search as long as we want at our leisure and just find stuff.  We can browse a few pages of Google results looking for that one perfect thing that will springboard us into the mother lode of good stuff … but, I digress a whole lot.

Something happened at the “discovery layer” session that set me off on this tangent.  While discussing whether the mother of all search boxes should be just a simple box for putting words into, or a series of menus and choices of a hundred ways to slice the search, one of the classic librarian discussions suddenly reared its head.  People started suggesting that if we offered a single search box with a really spiffy mechanism behind it to automatically come up with your results without you making choices, we wouldn’t be offering the user “everything”.  There seemed to be an implied idea that we’d be “dumbing it down” or something (if I misrepresent someone here, I’m sorry, but you should consider this a somewhat fictional account).  For a moment, I went along with the idea that we should, indeed, offer all kinds of search options and ways of refining searches, and those searches should bring back “everything” and give people “control”.  Then I stopped to think about it and raised my hand.

Me: “Excuse me, but research isn’t about finding everything.  That’s not really how it works at all … this idea of a single search that brings back every relevant item is not like the process of research as it is actually practiced.  Research involves finding a few, or even one, good thing, then checking what that one thing references, and checking those sources … then branching out rapidly from there.  A literature is not made up of a collection of items connected by having common keywords … a literature is composed of a collection of items connected by common ideas and a community of thinkers who are influenced by each other’s works.  It’s about following the network of citations, really.”

Crowd: (nodding)

Me: (now all excited) “So, what search should do is get your foot in the door, not hand you ‘everything’ … Google works because it gets you some stuff, not everything … and yes, it’s very true that its precision and comprehensiveness are limited, but it does what people need it to do, and does it easily.”  And I might have added … “this whole business is summed up with the old nugget, ‘it’s about the finding, not the searching (stupid)’”.

And that’s the deal.  Research, long before the advent of digital stuff, was about following trails of influence through the literature.  Heck, it’s even about finding the one call number on the shelf and browsing the books around that area.  It might even be so primitive as looking at the books on the shelf and noticing that some have been heavily thumbed, and some never touched.  The types of searches that we’ve been doing in libraries since the advent of the electronic catalogue have been strictly un-social and very much based on matching strings of characters.  Research is very much a social activity and not actually concerned at all with matching strings of characters.  Now that systems have matured to include social aspects of search (and Google’s PageRank is most certainly a social algorithm), all of this precision and comprehensiveness seems kind of silly.  Give me a single search box, and get my foot in the door with some key stuff in the area … get me started on introducing myself to the social circle made up of the people working on my topic … and I’ll work the room, thank you very much.
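If you wanted to sketch the “find one good thing, then follow its references” routine in code, it might look something like the Python below. This is only an illustration under loud assumptions: the `references` dictionary and every title in it are invented, and a real system would have to get that citation data from somewhere. The walk itself is a plain breadth-first search, which is my choice for the sketch, not a description of how any library tool works.

```python
from collections import deque


def follow_citations(start, references, max_hops=2):
    """Walk outward from one good item along its references.

    `references` maps a work to the works it cites (hypothetical data).
    Returns a dict of work -> number of hops from the starting point.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        work = queue.popleft()
        hops = seen[work]
        if hops == max_hops:
            continue  # far enough from the starting point; stop branching
        for cited in references.get(work, []):
            if cited not in seen:
                seen[cited] = hops + 1
                queue.append(cited)
    return seen


# Invented citation data, purely for illustration.
references = {
    "one good article": ["classic study", "methods paper"],
    "classic study": ["founding monograph"],
    "methods paper": ["founding monograph", "statistics text"],
    "founding monograph": ["older treatise"],
}

trail = follow_citations("one good article", references, max_hops=2)
for work, hops in trail.items():
    print(hops, work)
```

Notice that the search box only has to supply the starting point. Everything after that comes from the literature's own connections, not from matching keywords.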

(You know, I thought about it this way … my father, as an academic trained in the 50’s, would find the concept very odd that one would do a search and be handed the list of all of the good stuff … that would be entirely foreign to how he finds material … he reads, looks at the shelf, and gets to know the community of scholars … a very different process).



  1. I hear you. A single search box to ‘start’ is ideal, make it quick and dirty. However, I really do think there need to be alternatives for those who need more precision and comprehensiveness. The advanced features ought to be available, if not necessarily in your face.

    Because, in my mind, searching the web for ‘stuff’ is not the same as researching information for a scholarly work. Sometimes you need precision and comprehensiveness.

    • Well, I’m sure we’ll continue to have advanced search options and for certain tasks that will be handy. My real point, though, is about the nature of research, which follows a very different path from that type of searching that involves a lot of “advanced” options. If I were to design the ultimate advanced search it wouldn’t be about matching keywords, dates, formats, or descriptors. It would be about following lines of influence between researchers and their works. Citation analysis is a sort of blunt instrument for this type of work, but I’d want something with even more human factors. (I’d say ‘social’ but that would likely conjure up Facebook in people’s minds). That being said, the ultimate source for anything is talking to an expert and asking them to recommend what’s good … I’d take that over any search that I can imagine. Being an anti-social geek, I’d like the system to do the equivalent.

      • Another way of looking at it: Can you imagine the algorithm behind the single search box being good enough that precision and comprehensiveness were no longer a concern? So, not quick and dirty, but powerful and efficient? And I don’t mean do you think that you can build this, but rather can you conceive of that scenario? I (perhaps transparently) ask because I’m wondering if the principle is that you want control, or that you need control. I never look for the advanced search on Google because it is good enough that it always gives me plenty of what I want, and I can’t recall ever wanting to refine that search … could a library-type research tool be that good? (and once again I mean this as a thought experiment)

  2. Yeah, I see that. Google works so well because they built it, right from the start, to work that well. Everything within it is connected. Google uses these connections to produce results that work – it is popularity in action, and in a great many cases it works great.

    If you mean: can we harness this kind of linking for research, then I would agree with you. Can we read into papers and theses and titles and find ways that these things link together, and make good search? Sure. I think it can and should be done.

    But even Google has its limitations, and when I’m using Google Scholar (which I do quite a lot of), I still end up creating rather complicated search strings, and probing more advanced features like: journal selection, date refinement, boolean search strings, etc. Perhaps this is because Google Scholar doesn’t have as advanced ‘algorithms’ behind it: I look forward to a simpler system.

    Until scholarly material (books, journals, articles, and everything else) gains the kind of social linking that the web has, I don’t think the single search box will work.

    • Ah, well, you agree with me then … my real point is that the inherently social process of research is not well served by our very non-social, very literal string-matching tools in libraries. Discussing the merits (or lack thereof) of Google is another post that I don’t really want to write. I said the only thing that I really want to say about Google … that the PageRank algorithm allows Google to claim to have a significantly social element to its search, as it claims that linking is based on a value judgment (unless, I suppose, one links to other documents randomly, one links to things that one is suggesting someone else should look at). I would go out on a limb to say that the main reason that scholarly literature requires advanced search is because it has been structured in such a god-awful way in the world of licensing and under the control of vendors. The really awful thing about that is that there is a basic linking material already built into scholarly material that is not utilized … citations, and bibliographies. Well, they are utilized, but only by searchers reading them, and then finding that they often aren’t even directly linked to the source article (and my conspiracy-minded brain suggests that is the case because that would require linking to material owned by, gasp, another vendor).

      Anyway, all I’m saying is that our search tools are big, blunt machines for sorting like-shaped objects into buckets, while research is like going to a cocktail party and chatting with scholars who have come before you about your favourite subject. At this point, the two don’t match up very well.

  3. You bet. It’s about the finding. Then what do you do once you have found it? Keep a copy, of course. Sites come and go and data changes, so you should copy stuff. Search is so important I wrote my own. I use it to archive MPEG video, pictures and music. Favorite web sites and Google search strings are retrieved and pasted into the browser or web search.

    Random sampling of my data keeps it fresh.
