I’m really, really tired of people saying that taxonomies are dead, rigid, inflexible, etc, etc, insert your pejorative adjective here. I’ll go as far as to say that I believe that 90% of the people who say these things don’t understand what taxonomies are or how to effectively use them.
The typical premise is that tags make taxonomy obsolete. This is hooey. Sure, tags are really useful. Their particular power is that they are easy for an end-user to apply: just tag your resource with a few tags that make sense to you, and you’re done. This makes them a very cheap way to apply some loose structure to huge data sets. And they support browsing in interesting and really effective ways. Search for “street light” on Flickr, and you get a great collection of photos. It doesn’t really matter that some of them are of lamp posts and some of traffic lights. Or that some of the pictures tagged as “lamp post” may work equally well for your needs. Wanting to see all the lamp post pictures in one place and under one name feels kind of old-fashioned and silly. It takes the fun out of it.
But if you’re looking for, say, all children’s services organizations in New York, it’s no longer cute and fun to have a third of these organizations categorized as “children’s services”, a third as “child services” and a third as god knows what. Not to mention all those organizations tagged with more detailed children’s services – “children’s legal advocacy” or “foster care” – which can’t be identified or rolled up into children’s services. This is the power of taxonomy (or more technically, a “controlled vocabulary”) – it functions to map similar terms – synonyms, misspellings, etc – together and to allow people to roll terms up to more general ones when needed. Want to actually understand what a controlled vocabulary is? Check out Amy Warner’s great primer.
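To make the mapping and roll-up idea concrete, here’s a minimal Python sketch. Everything in it is made up for illustration: the `PREFERRED` and `BROADER` tables, the organization names, and the helper functions are my assumptions, not part of any real system.

```python
# Hypothetical controlled vocabulary: variant term -> preferred term.
# Synonyms and misspellings all point at one agreed-upon label.
PREFERRED = {
    "children's services": "children's services",
    "child services": "children's services",
    "childrens services": "children's services",
    "children's legal advocacy": "children's legal advocacy",
    "foster care": "foster care",
}

# Hypothetical hierarchy: preferred term -> its broader (parent) term.
BROADER = {
    "children's legal advocacy": "children's services",
    "foster care": "children's services",
}

# Made-up organizations, each categorized however its author saw fit.
ORGS = {
    "Org A": "Child Services",
    "Org B": "children's services",
    "Org C": "Foster Care",
    "Org D": "children's legal advocacy",
    "Org E": "senior services",
}


def normalize(term):
    """Return the preferred term for a variant, or None if it is unknown."""
    return PREFERRED.get(term.strip().lower())


def rolls_up_to(term, ancestor):
    """True if the term is the ancestor itself or sits anywhere beneath it."""
    current = normalize(term)
    while current is not None:
        if current == ancestor:
            return True
        current = BROADER.get(current)  # climb one level up the hierarchy
    return False


def find_orgs(category):
    """All organizations whose term rolls up to the given preferred term."""
    return sorted(name for name, term in ORGS.items() if rolls_up_to(term, category))


print(find_orgs("children's services"))  # ['Org A', 'Org B', 'Org C', 'Org D']
print(find_orgs("foster care"))          # ['Org C']
```

The broad search picks up the variant spellings and the more specific services in one go, while the narrow search stays narrow. A real vocabulary would need far more terms and probably more than one level of hierarchy, but the mechanics are the same.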
Somehow “taxonomy” has recently become synonymous with “rigid” and “controlling”. But in fact when a taxonomy (or let’s just call it a “controlled vocabulary” to escape the pejoratives) is used well, it supports a tremendously flexible and user friendly environment. Check out www.gettyimages.com. Try searching on “mom, happy, without dad.” If you’re really serious about finding the best picture for something, Getty supports your search in a way that a tagging system never can.
To get more geeky about it, in information science, the accuracy of a set of results is composed of two pieces: recall and precision. Recall means finding everything that applies to your search term. For instance, if you’re looking for legal precedents, recall is critical – you need to find everything that applies. Tags are by definition bad at recall, as different users will use different terms to describe the same thing. Precision is the flip side of the equation: finding only the most relevant stuff without a bunch of extraneous crap. Tags, well, they’re not great at this either. Likely other users have applied tags in a way that you wouldn’t use them, and you’ll need to look through a lot of irrelevant stuff to find the gold. A controlled vocabulary allows you to weigh precision vs. recall (generally accepted as opposite ends of a spectrum) and tailor the results to meet your users’ needs.
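If the two measures feel abstract, here’s a small Python sketch of how they’re usually computed: precision is the share of returned results that are relevant, and recall is the share of relevant items that got returned. The function name and the toy result sets are invented for illustration, not measurements from any real system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: the items the search returned
    relevant:  the items that actually apply to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented toy data: four documents truly apply to the query.
relevant = {"a", "b", "c", "d"}

# A search that misses half of them and returns two irrelevant items.
print(precision_recall({"a", "b", "x", "y"}, relevant))       # (0.5, 0.5)

# A search that finds all four, plus one irrelevant item.
print(precision_recall({"a", "b", "c", "d", "x"}, relevant))  # (0.8, 1.0)
```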
If you want to allow users to find both all organizations that offer any services for children and all children’s legal advocacy organizations, a well-constructed controlled vocabulary will support this in a way that user-defined tags never can. Is a controlled vocabulary more complicated to create? Sure. Does solid categorization through taxonomy likely involve a human administrator and an update process over time? Yes. But greater cost and effort don’t invalidate the concept. While tags support interesting opportunistic browsing, taxonomies allow serious searchers to know they’ve found what they need.
And tags and taxonomies aren’t at each other’s throats, fighting to the death. They’re not conflicting opposites – there’s in fact huge power in using them together. What if end-users tag things, and the tags are mapped into a controlled vocabulary which is used to facilitate search and/or browse for people who want completeness in their searches? In most domains, I would expect that 80-90% of tags could be predicted in advance, and could be designed into the taxonomy. A human could then map the orphan tags into the vocabulary scheme in an ongoing process, or update the vocabulary to accommodate them. In addition to making the tags much more useful to serious searchers, this would be a great way to test and refine the vocabulary over time.
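Here’s roughly what that hybrid could look like in Python. This is a sketch under my own assumptions: the `TAG_TO_TERM` table, the sample tags, and the `map_tags` helper are all hypothetical, and a real system would need a proper review workflow for the orphans.

```python
from collections import Counter

# Hypothetical mapping designed in advance: free-form tag -> vocabulary term.
TAG_TO_TERM = {
    "foster care": "foster care",
    "fostercare": "foster care",
    "kids": "children's services",
    "child services": "children's services",
    "children's services": "children's services",
}


def map_tags(tags):
    """Split user tags into mapped vocabulary terms and unmapped orphans.

    Returns (terms, orphans), where orphans counts how often each unknown
    tag appeared so a human can review the most common ones first.
    """
    terms = set()
    orphans = Counter()
    for tag in tags:
        key = tag.strip().lower()
        term = TAG_TO_TERM.get(key)
        if term is None:
            orphans[key] += 1  # queue for a human to map or add to the vocabulary
        else:
            terms.add(term)
    return terms, orphans


terms, orphans = map_tags(["Foster Care", "kids", "afterschool", "Afterschool "])
print(sorted(terms))           # ['children's services', 'foster care']
print(orphans.most_common())   # [('afterschool', 2)]
```

Once a human decides where “afterschool” belongs, adding it to the mapping table means every resource already carrying that tag becomes findable through the vocabulary, with no re-tagging needed.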
Of course, a model like this doesn’t make any sense for sites like Flickr or Del.icio.us, which are intended to support exploration rather than serious searching, and at a scale that makes human intervention infeasible. As Clay Shirky points out in his article Ontology is Overrated (excellent, but can we make a law that you can’t cite it unless you’ve read it through?), a controlled vocabulary scales poorly to a collection of resources as big and unstructured as, say, the internet.
Sometimes budget or resource limitations will mean that a controlled vocabulary isn’t feasible. But there are a whole lot of collections of resources that are way, way, way smaller than the entire internet or Flickr. For most of these sites, often filled with important and complex resources, a controlled vocabulary model would support users’ needs far better than a Flickr model.
As with most things, there are valid reasons to use either a tagging scheme or a controlled vocabulary. But jumping on the tag bandwagon for your site without carefully thinking about how a controlled vocabulary could help your users is either ignorant or lazy. It’s irresponsible design.