This week’s reading: Morville & Rosenfeld, Chapters 3 and 4
In Reference 101 – known to USF’s SLIS students as “Introduction to Information Sources and Services” – I was taught that a patron’s inquiry could be broken down into elements called givens, wanteds, and modifiers. Givens were delimiters of the domain of interest: for example, a query might concern one-armed baseball players. Wanteds were what the patron sought: the name of such a baseball player. And modifiers were restrictions on the form of the output: it must be a webpage in English.
This week’s reading convinced me that the given-wanted-modifier model cannot cope with much real-world information seeking, at least not in a single application. The idea that all or even most patrons come to the reference desk with a fully realized question that can be answered completely and concisely – “his name was Pete Gray” – is a chimera. A great deal of information seeking cannot be phrased in the form of a question, and the information need is not “answered” so much as iteratively fed and refined until the seeker achieves a subjective satisfaction with the outcome.
There is something deeply secret and human about this vision of a subjective, iterated, fuzzy search for information. I’m instantly put in mind of Bayesian probability, which finds its most widely recognized use in email spam filters. Using logic believed to be humanlike, these filters read incoming email and assign a probability that the email is spam based on whether its characteristics – its words, formatting, origin, and so on – resemble those of known spam messages. Critically, a Bayesian filter’s output is fed back into its training: users and administrators identify the filter’s blown calls, and the filter adjusts its notion of “what spam looks like” based on its mistakes. After enough iterations, the filter arrives at heuristics of satisfactory accuracy. (My email filter has correctly classified each of the last five hundred messages I’ve received as of this writing.)
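The feedback loop described above can be sketched in a few lines. This is a minimal, illustrative naive Bayes filter – whitespace tokenization, add-one smoothing, and the assumption that at least one example of each class has been seen – not a depiction of any real mail client’s far more elaborate machinery:

```python
import math
from collections import Counter

class NaiveBayesFilter:
    """A toy Bayesian spam filter. Assumes train() has been called at
    least once with each label before spam_probability() is used."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # The feedback step: flagging a "blown call" means calling
        # train() with the correct label, shifting the word statistics.
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        # Bayes' rule with the "naive" independence assumption and
        # Laplace (add-one) smoothing, computed in log space.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(self.message_counts.values())
        log_post = {}
        for label in ("spam", "ham"):
            log_p = math.log(self.message_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(vocab)
            for word in text.lower().split():
                log_p += math.log((self.word_counts[label][word] + 1) / denom)
            log_post[label] = log_p
        # Normalize the two log-posteriors into a probability.
        m = max(log_post.values())
        odds = {k: math.exp(v - m) for k, v in log_post.items()}
        return odds["spam"] / (odds["spam"] + odds["ham"])

f = NaiveBayesFilter()
f.train("win money now", "spam")
f.train("meeting agenda attached", "ham")
```

After this training, `f.spam_probability("win money")` leans spam and `f.spam_probability("meeting agenda")` leans ham; each correction nudges the counts, which is all the “learning” amounts to.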
Why this digression? Because one way of viewing an information seeker, a way I believe the text supports, is to see her as a well-developed Bayesian machine for identifying relevance. A middle school student who needs to write a two-page biography of George Washington may not be able to identify exactly what information she’s looking for, but given a choice among webpages entitled The George Washington University, George Washington’s Mount Vernon Estate, and The Life of George Washington, she will immediately identify the third as the most likely to be relevant. A more advanced searcher might gravitate towards websites with names like EnchantedLearning.com, which are likely to present highly relevant information in a format designed for students’ ready comprehension, or AmesLab.gov, whose .gov domain connotes cognitive authority. A good traditional search engine should support searching and browsing of results based on these characteristics. Analogously, the information architecture of a website should play to the Bayesian heuristics of the human mind; whether presented as one-word taxonomic labels or paragraph-long synopses, metadata needs to help the user take a quick glance and accurately judge whether the data will be relevant.
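The quick-glance judgment our student makes might be caricatured as a scoring function over metadata cues. Everything here – the signal words, the weights, the example domains other than those named above – is an invented illustration, not a claim about how any search engine ranks results:

```python
def relevance_score(title, domain, query_terms):
    """Score a page by shallow metadata cues alone, never its full text --
    a caricature of a seeker's at-a-glance relevance judgment."""
    score = 0.0
    title_words = set(title.lower().split())
    # Overlap between title and query is the strongest quick cue.
    score += 2.0 * len(title_words & {t.lower() for t in query_terms})
    # Words suggesting a biography rather than an institution.
    if title_words & {"life", "biography", "story"}:
        score += 3.0
    # A .gov or .edu suffix connotes cognitive authority.
    if domain.endswith((".gov", ".edu")):
        score += 1.0
    return score

pages = [
    ("The George Washington University", "gwu.edu"),
    ("George Washington's Mount Vernon Estate", "mountvernon.org"),
    ("The Life of George Washington", "example.com"),
]
query = ["George", "Washington", "life"]
ranked = sorted(pages, key=lambda p: relevance_score(p[0], p[1], query),
                reverse=True)
```

With these made-up weights, The Life of George Washington sorts to the top – the same snap judgment the student makes without being able to articulate why.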
What is more, information architecture should support the iterated refinement of a user’s understanding of her own objective. When our middle schooler finds information about George Washington’s service in the French and Indian War, she will need to contextualize this new knowledge to determine its significance and its likely role in her paper. Her ideal history website might, for example, make a tooltip available that defined the French and Indian War in a single sentence – enough to assure her that this conflict must have been significant – as well as a hyperlink to more information, which would help her place the event chronologically in Washington’s life and outline his involvement in more detail. The tooltip pushes the information over the Bayesian significance threshold, and the hyperlink provides a natural avenue for continuing the search with her newly refined understanding. If one or the other is missing, the student may discard the information as irrelevant (in the first case) or obscure (in the second). Good architecture not only helps the user find “what she’s looking for,” but also helps her identify it.