Today, data is considered a strategic asset. Companies are rushing to generate meaningful content from data.

Search has evolved as customers increasingly need to find relevant content quickly.

As websites are built, providing a search feature has become the norm. Customers are focused on a goal; they want relevant information within a fraction of a second.

The most common example of search is Google Search. You can read the basics of search, crawling and indexing here: https://www.google.com/search/howsearchworks/crawling-indexing/

Let’s move away from generic search and focus on Sitecore as a Content Management System (CMS) and its search capabilities.

If you are using Sitecore, you have the option of using Lucene or SOLR as the indexing mechanism for site search.

Sitecore comes with a default set of search indexes that, along with your search and indexing provider, help improve the performance of your website search.

When you configure Sitecore servers in a scalable environment, you must first decide whether to use Lucene or SOLR as your search and indexing provider. You then configure the indexes you need on each server.

At a high level:

  • Lucene is file-based. Index sharing is not supported, so each server must maintain its own Lucene indexes.
  • If you use SOLR, the index storage is centralized and can be shared across multiple servers. 

The basic configuration of Lucene and SOLR can be obtained through the Sitecore knowledge base: 

Sitecore 8.1

https://doc.sitecore.net/sitecore_experience_platform/81/setting_up_and_maintaining/search_and_indexing

Sitecore 8.2

https://doc.sitecore.net/sitecore_experience_platform/setting_up_and_maintaining/search_and_indexing

Indexes required in a scalable Sitecore environment

https://doc.sitecore.net/sitecore_experience_platform/81/setting_up_and_maintaining/search_and_indexing/indexing/search_indexes_required_in_a_scalable_environment

You can also integrate Coveo, another search product, with Sitecore. Coveo for Sitecore: http://www.coveo.com/en/solutions/coveo-for-sitecore

We will skip over this material and assume you already know the basics of connecting SOLR or Lucene to Sitecore.

The purpose of this two-part blog is to look beyond the basics and list some Sitecore strategies for obtaining meaningful search results with SOLR or Lucene.

Strategy 1: Crawlers and Crawler root – Configuration approach

You can customize the Sitecore index configuration and update the crawlers to crawl individual sections of the site in order to generate relevant search results.

For the purpose of this blog, we use sitecore_web_index as the example.

SOLR

File: Sitecore.ContentSearch.Solr.Index.Web.config
<locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <Root>/sitecore</Root>
              </crawler>
            </locations>

Lucene

File: Sitecore.ContentSearch.Lucene.Index.Web.config
<locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <Root>/sitecore</Root>
              </crawler>
            </locations>
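
Both snippets above use /sitecore as the root, which covers the entire tree. As a minimal sketch of narrowing it, assuming your site content lives under /sitecore/content/Home (an illustrative path; substitute your own site root), the same crawler could be pointed at that section only:

```xml
<locations hint="list:AddCrawler">
  <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
    <Database>web</Database>
    <!-- Assumed path for illustration: crawl only items under the site's Home item -->
    <Root>/sitecore/content/Home</Root>
  </crawler>
</locations>
```

With a root like this, items outside that branch should no longer be crawled into the index, so test any Sitecore features that depend on the index before adopting it.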

You can also create multiple crawlers for the same index, each pointing to a different location in the Sitecore content tree, so that only the content types you care about are indexed.

Example for sitecore_web_index (applies to either SOLR or Lucene):

<locations hint="list:AddCrawler">
              <TreeCrawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <Root>/sitecore/Content/Home</Root>
              </TreeCrawler>
              <MediaCrawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <Root>/sitecore/Media Library</Root>
              </MediaCrawler>
            </locations>

Having multiple crawlers in the same index has the following advantages:

  1. Improves Sitecore/application performance.
  2. Lets you apply business rules, such as pagination and sorting, directly in the query.
  3. Makes the index easier to maintain.

Whether to use one crawler or several depends on your business and technical requirements. Keep performance and maintenance in mind when you decide.

Strategy 2: Selective indexing – Configuration approach

You can include or exclude templates and fields from indexing. The idea is to index only the items that are relevant, so the search provider has less to filter when returning results. One way to achieve this is to exclude the templates and fields that you do not want to index.

Example: Sitecore.ContentSearch.Solr.Index.Web.config

SOLR

<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          …
            <configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration">
              <documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
                <indexAllFields>true</indexAllFields>
                <!-- GLOBALLY EXCLUDE TEMPLATES FROM BEING INDEXED
                     This setting allows you to exclude items that are based on specific templates from the index.
                -->
                <exclude hint="list:AddExcludedTemplate">
                  <js>{72B84DB6-F483-4F97-815F-D561E3AEC704}</js>
                  <css>{FAF0DED3-4F58-4C4B-B301-AEA0CF8CC5F1}</css>
                  <MicroSitePage>{915B3E93-D01D-4C06-8BB4-79FCE4D760F5}</MicroSitePage>
                  <Services_Page>{01D8C754-52ED-44EE-B36C-0E744A06E068}</Services_Page>
                </exclude>
                <!-- Exclude specific fields from being indexed -->
                <exclude hint="list:AddExcludedField">
                  <__Created>{25BED78C-4957-4165-998A-CA1B52F67497}</__Created>
                </exclude>
              </documentOptions>
            </configuration>
            <strategies hint="list:AddStrategy">…
            </strategies>
            <locations hint="list:AddCrawler">…
            </locations>
          …
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
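
The file above only excludes. The opposite approach is to turn off indexAllFields and list what you do want indexed. The fragment below is a sketch of that, assuming the AddIncludedTemplate and AddIncludedField hints that sit alongside the exclude hints in Sitecore 8.1 and later (check the commented samples in your own default index configuration). The ArticlePage and Title names and the all-zero GUIDs are hypothetical placeholders; replace them with real names and IDs from your Sitecore instance.

```xml
<documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
  <!-- Index only the fields listed below instead of every field -->
  <indexAllFields>false</indexAllFields>
  <!-- Hypothetical template entry: element name = template name, value = template ID -->
  <include hint="list:AddIncludedTemplate">
    <ArticlePage>{00000000-0000-0000-0000-000000000000}</ArticlePage>
  </include>
  <!-- Hypothetical field entry: element name = field name, value = field ID -->
  <include hint="list:AddIncludedField">
    <Title>{00000000-0000-0000-0000-000000000000}</Title>
  </include>
</documentOptions>
```

An include list tends to suit indexes that serve a small, well-defined set of page types, while an exclude list is usually less work when most content should remain searchable.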

Lucene

You can also include and exclude templates/fields for Lucene.

Example: Sitecore.ContentSearch.Lucene.Index.Web.config

The format for inclusion and exclusion is similar to the SOLR file above.

Note: For each entry, use the template name (or field name) from Sitecore as the element name and the template ID (or field ID) from Sitecore as its value.
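
As a sketch of what the Lucene version might look like, assuming the default Lucene index configuration reference and document options type that mirror the Solr ones (verify both names against the Lucene default index configuration file in your own installation), the same kind of exclusions would be written as:

```xml
<configuration ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration">
  <!-- Assumed Lucene counterpart of SolrDocumentBuilderOptions -->
  <documentOptions type="Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilderOptions, Sitecore.ContentSearch.LuceneProvider">
    <indexAllFields>true</indexAllFields>
    <!-- Template to leave out of the index: element name = template name, value = template ID -->
    <exclude hint="list:AddExcludedTemplate">
      <MicroSitePage>{915B3E93-D01D-4C06-8BB4-79FCE4D760F5}</MicroSitePage>
    </exclude>
    <!-- Field to leave out of the index: element name = field name, value = field ID -->
    <exclude hint="list:AddExcludedField">
      <__Created>{25BED78C-4957-4165-998A-CA1B52F67497}</__Created>
    </exclude>
  </documentOptions>
</configuration>
```

Only the configuration reference and the documentOptions type differ from the SOLR file; the exclude lists keep the same shape.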


Read part 2: Strategies for creating a meaningful site search experience using Sitecore
