The Shape of Search and Our Analysis of the AOL Data Release

Posted in Online Marketing by: Randy Woods on Wednesday July 4, 2007 at 9:30 am

Earlier this month, we presented at the Gilbane Conference on Content Technologies for Government in Washington, DC, and I want to thank those who attended. The Gilbane Group and Tony Byrne of CMS Watch organized the event and deserve plaudits for their efforts. I found the speakers and attendees knowledgeable and engaging.

On the second day of the conference, I spoke about the intricacies of deploying web content management solutions that raise your profile on the major search engines (see www.nonlinearcreations.com/seo-cms for our whitepaper on the topic).

What drove my presentation was our recent analysis of the shape of search.

Many of you may remember that in late 2006, AOL released three months of data on the behaviour of people using its search engine. This remarkable release gives us an opportunity to truly understand how people use search engines. (It turned out that the data AOL thought was anonymous wasn’t, but that doesn’t diminish its value.)

Search Has a Shape

Twenty million searches performed by 650,000 people is a lot of data. Making sense of it is a challenge, and the effort is ongoing, but our initial analysis is revealing. The following graph shows the topography of the search landscape.

[Figure: The shape of search]

To keep the data manageable, I’ve grouped searches into segments of one-tenth of 1%. The vertical axis of this chart is the number of times a given phrase was sought; the horizontal axis ranks searches, in intervals of 1/10 of 1%, from most popular to least popular. At a glance, it’s clear that a small number of popular searches account for most of the search volume.

Some mathematically gifted people are able to think in terms of exponents and power laws. The rest of us have to make use of Excel. Replotting this graph with both axes on a logarithmic scale produces an almost straight line:

[Figure: The shape of search, replotted on log-log axes]

This is a clear indication that searches follow a power law. In fact, Excel shows that a power-law trendline fits the data to a high degree of accuracy (as measured by r², where perfect agreement is 1). As I’ll discuss in a future post, I expect agreement would be even higher if we included all data points rather than grouping them in tenths of a percent.
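The grouping and fit described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code behind our charts: the function names are hypothetical, phrases are assumed to be normalized by trimming whitespace and lowercasing, each segment is assumed to hold 0.1% of the distinct phrases ranked by popularity, and the plotted value is assumed to be the total volume in that segment. The fit regresses ln(y) on ln(x) by ordinary least squares, which we believe mirrors how Excel’s power trendline works, so r² here is measured on the log-transformed data.

```python
import math
from collections import Counter


def rank_phrases(queries):
    """Count each distinct phrase; return counts sorted from most to least popular."""
    # Assumption: phrases are normalized by trimming whitespace and lowercasing.
    counts = Counter(q.strip().lower() for q in queries)
    return sorted(counts.values(), reverse=True)


def bucket_volumes(ranked_counts, n_buckets=1000):
    """Split ranked phrases into n_buckets equal-rank segments (1000 -> 0.1% each)."""
    # Assumption: the plotted value is the summed search volume of each segment.
    n = len(ranked_counts)
    volumes = [0] * n_buckets
    for rank, count in enumerate(ranked_counts):
        volumes[rank * n_buckets // n] += count
    return volumes


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y).

    Returns (a, b, r_squared); r_squared is measured in log space
    (assumed to match what a spreadsheet power trendline reports).
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx            # slope of the straight line on log-log axes
    ln_a = my - b * mx       # intercept in log space
    ss_res = sum((v - (ln_a + b * u)) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - my) ** 2 for v in ly)
    return math.exp(ln_a), b, 1 - ss_res / ss_tot
```

To reproduce a log-log chart with this sketch, use the segment index starting at 1 as x and drop empty segments, since the logarithm of zero is undefined.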

The Take Away

The shape of search is a crazily steep curve. The first 1/10 of 1% of searches accounts for more than 20% of search volume. As we will discuss in a future post on how people search, many of the remaining searches occur only once in the three-month sample.
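Both figures in this paragraph can be computed from a list of phrase counts. This is another hypothetical sketch: it assumes "searches" means distinct phrases ranked by how often they were sought, and that the input is already sorted from most to least popular (for example, the values of a phrase-count table sorted in descending order).

```python
def top_tenth_percent_share(ranked_counts):
    """Share of total search volume from the most popular 0.1% of distinct phrases."""
    # Assumption: ranked_counts is sorted from most to least popular.
    k = max(1, len(ranked_counts) // 1000)  # 0.1% of phrases, at least one
    return sum(ranked_counts[:k]) / sum(ranked_counts)


def singleton_share(ranked_counts):
    """Fraction of distinct phrases that were searched exactly once."""
    return sum(1 for c in ranked_counts if c == 1) / len(ranked_counts)
```

Under this interpretation, the "more than 20%" figure corresponds to top_tenth_percent_share returning a value above 0.2 on the full data set.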
