How to Tune Microsoft Search Server Express, 2008, Etc.

Posted in Search by: Randy Woods on Wednesday February 27, 2008 at 6:24 pm

I had the opportunity to participate in Microsoft Ottawa’s ECM Days event yesterday. I gave a short presentation on non-linear creations’ approach to tuning Microsoft search technologies – Microsoft Search Server 2008 Express, Microsoft Search Server 2008 and Microsoft Office SharePoint Server 2007 (MOSS).

I won’t repeat the entire presentation here, but a handful of observations are worth sharing. The graph below may look familiar. We found a similar curve when we examined the shape of search revealed by AOL’s ill-considered data release.

Shape of Search intranet

Short-tail (blue) – fewer than 100 search terms account for 36% of search volume
Long-tail (green) – 60% of search terms occur only once in the three-month period
Mid-tail (red) – the range of searches between most and least common

Not surprisingly, the shape of search inside the firewall does not differ dramatically from that outside the firewall. This graph looks at search behaviour on an intranet over a three-month period – the vertical axis is the number of times a given term was sought, the horizontal axis is an ordinal ranking of terms from most popular to least popular. So, the most popular term was sought about 1000 times; the least popular terms (the green line to the right) were each sought only once.
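For readers who want to draw the same curve from their own query logs, here is a minimal Python sketch. It assumes the log has already been reduced to a plain list of query strings; the function name shape_of_search and the head_size parameter are illustrative only and are not part of any Microsoft tool.

```python
from collections import Counter


def shape_of_search(queries, head_size=100):
    """Summarise a query log into the numbers behind the curve.

    queries   -- iterable of raw query strings (assumed input format)
    head_size -- how many top-ranked terms count as the short tail
    """
    # Normalise case and whitespace so "Payroll " and "payroll" count together.
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    ranked = counts.most_common()  # most popular term first
    total = sum(counts.values())
    head = ranked[:head_size]
    singletons = [term for term, n in ranked if n == 1]
    return {
        "total_searches": total,
        "distinct_terms": len(ranked),
        # Share of all searches accounted for by the short tail.
        "head_share": sum(n for _, n in head) / total if total else 0.0,
        # Share of distinct terms that were searched exactly once.
        "singleton_share": len(singletons) / len(ranked) if ranked else 0.0,
        "ranked": ranked,
    }
```

Run over three months of queries with head_size=100, head_share and singleton_share are the counterparts of the 36% and 60% figures quoted above.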

Why Does this Matter to Microsoft Search Tuning?

Good question. It matters because Microsoft provides search-tuning tools that allow you to deliver superior results for each part of the curve.

Addressing the Short Tail: Using Best Bets

Microsoft search technologies allow you to define “Best Bets.” These are comparable to the KeyMatch feature in the Google Search Appliance. In essence, they are a way to manually supplant the top search results returned with a recommended link. This is powerful. By grouping the most common search terms and defining best bet links for each of these terms, you can very quickly – and dramatically – improve user satisfaction with search results.

In the example above, one of 15 best bet links appears when any of the 100 most common searches is performed. Since those terms account for 36% of search volume, roughly a third of searches are likely to surface what the user needs.

Addressing the Mid Tail: Leveraging Authority

Microsoft allows you to identify pages that you feel are particularly authoritative – and rank them as primary, secondary or tertiary. (You can also demote pages that you believe should not be considered authoritative.) There appears to be a halo effect associated with this definition of authority. Documents or pages “close” to authoritative pages seem to be granted higher relevance in search results than more distantly associated documents or pages. By adjusting authority assignment you can change the broad landscape of search results and, with experimentation, significantly improve the relevance of search results for the broad mid-tail of searchers. (Subsequent posts will describe ideal sources of authority and ways of proving that you’re making search progress.)

Addressing the Long-Tail: Zero-result pages and Synonyms

The rare searches that make up the long tail tend to fall into one of two categories:

  • Deeply detailed searches with four or more terms entered by people who know specifically what they are seeking
  • What might charitably be described as idiosyncratic spellings of more common search terms

You can safely ignore the first group – these searchers know what they want and will probably find it if it exists. But you should certainly help out the spelling-challenged in your company.

The Microsoft thesaurus is both powerful and a little intimidating. Casual users should probably keep hands well away from the keyboard while viewing it. But it does lend itself to programmatic updating (the topic of a future post).

To add synonyms, you need to edit an XML file, usually named tsenu.xml (for the English thesaurus). The following is a snippet showing misspellings of Thibideau mapped to the correct spelling. If a user enters any of these terms, a search is run against all of the terms.
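As a rough illustration of the programmatic route, the Python sketch below builds one such entry. It assumes the thesaurus file groups equivalent terms as sub entries inside an expansion element, which is how the sample entries are laid out as far as we know; check your own tsenu.xml before relying on it. The misspellings are invented for illustration, and expansion_set and render are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def expansion_set(terms):
    """Build one <expansion> element; each term becomes a <sub> child.

    Assumption: a search for any one <sub> term is expanded to all of them.
    """
    expansion = ET.Element("expansion")
    for term in terms:
        ET.SubElement(expansion, "sub").text = term
    return expansion


def render(element):
    """Serialise an element to a string, without an XML declaration."""
    return ET.tostring(element, encoding="unicode")


# Hypothetical misspellings, for illustration only.
entry = expansion_set(["Thibideau", "Thibidau", "Thibadeau"])
print(render(entry))
```

The printed element would then be pasted, or inserted by script, alongside the existing entries in the thesaurus file. How and when the search service picks up the change depends on your installation.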

Our advice? Start with the standard report that shows the most common terms for which zero results are returned. Take the misspellings or mixed-up acronyms and begin adding them to the thesaurus. This should drive a significantly improved search experience for the long tail of search.
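One way to shortlist candidates from that report is to compare each zero-result term against a list of terms you know are correct (staff names, product names, department names). The sketch below uses Python’s standard difflib module for the comparison. The input lists, the suggest_synonyms name and the 0.75 cutoff are all assumptions to be tuned against your own data, and a person should review the output before anything goes into the thesaurus.

```python
import difflib


def suggest_synonyms(zero_result_terms, known_terms, cutoff=0.75):
    """Group zero-result terms under the known term they most resemble.

    Returns a dict mapping each known term to its likely misspellings.
    Terms with no match above the cutoff are left out.
    """
    # Compare in lower case, but report the known term as originally spelled.
    lookup = {term.lower(): term for term in known_terms}
    suggestions = {}
    for term in zero_result_terms:
        matches = difflib.get_close_matches(
            term.lower(), list(lookup), n=1, cutoff=cutoff
        )
        if matches:
            suggestions.setdefault(lookup[matches[0]], []).append(term)
    return suggestions
```

Each key and its list of misspellings is a candidate synonym group for the thesaurus.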

Questions or comments? Or real world experience wrestling with enterprise search? That’s what the comment fields are for.

Three Case Studies on the Importance of Monitoring Social Media (Blogs, Wikipedia, etc.)

Posted by: Randy Woods on Tuesday February 19, 2008 at 1:05 pm

As these three examples illustrate, you have a choice: take steps to know what people are saying about you online, or prepare yourself for a sickening sort of freefall when the blogosphere insists that you pay attention. Because sooner or later you, your company or your brand will become the subject of online conversation.

Three Reputation-Damaging Examples

In a previous post, I described how you might build a social media monitoring dashboard, simply and inexpensively. These three examples underscore the importance of this intelligence.

Uhm, You Know We’re a Client, Right?

We are very proud of the team at NLC – they’ve chosen to join us and we know they have many options in today’s environment. NLC has grown very quickly over the last three years and to support this growth we have made periodic use of recruiters. Imagine our surprise when an online job posting containing the following was flagged by our social media monitoring dashboard:

….a strong experience with web technologies. Ideally, we would like t hire some people who have worked for the same type of company as Non Linear Creations….

It was posted by the same company that we employ to recruit employees for us. To see them target our employees made us less than thrilled. Because we knew about this, a handful of angry phone calls resulted in the removal of the ad. Without our social media monitoring dashboard, we would never have found out.

Slander is a Term Worth Noting

NLC has had considerable dealings with RedDot CMS over the last eight years. We monitor online commentary on all the technologies with which we’re involved. While all products have their weaknesses, this blog – RedDot CMS & LiveServer: Reviews and Tips – is simply slanderous. The rants of this individual take fair observations about RedDot weaknesses and build them into slanderous, even silly, commentary. Comments titled “RedDot workflow doesn’t work” and “RedDot Upgrades from Hell” are inflammatory and unhelpful. Presumably RedDot (now OpenText, at least temporarily) is unaware of this blogger. An effective social media monitoring dashboard would:

  • Make them aware of the negative comments being made about RedDot CMS and LiveServer
  • Give them an opportunity to either react in the comments of the blog or take legal action to have it pulled off the web.

Wikipedia and the Flying Burrito

Wikipedia is a glorious resource and a shining example of the wisdom of crowds. Except when it isn’t. Export Development Canada – EDC – is a Canadian institution providing support to Canadian companies that export. Their Wikipedia entry is an important resource for potential customers assessing their services. But periodically over the last year, these potential customers have been informed that a Flying Burrito forms a key pillar of EDC offerings:

“In 1456 I rode a flying burrito with EDC’s services and deal structuring capabilities helped to facilitate $60.6 billion in transactions to eat a pig on a stick like Christopher Columbus did in exploration. I ate with nearly 7,000 Canadian companies.”

A more politically-charged change was made earlier in the year:

“….EDC has maintained a partnership with French people are meaningless….”

Google returns Wikipedia as the fourth entry when a search for Export Development Corporation is performed. Knowing about these entries and correcting them is absolutely critical.

Bottom Line

If you don’t monitor the blogosphere, you will not know about negative comments until they turn into a tide too large for you to deal with quickly or easily. Thoughts or experiences? Add a comment or drop me a line.