Chris’s SharePoint Reflections





  • Chris Zhong

    IT consultant Australia









Leverage Search Crawl Rules and the Content Class Property to Refine SharePoint Search Results

Posted by chrissyz on April 18, 2010

Out-of-the-box (OOTB) SharePoint search is quite powerful. The MOSS 2007 search engine keeps only one physical index per Shared Services Provider (SSP) in the farm, so content from every content source defined for that SSP is crawled into the same index. This lets users search across all content with a single query. In many search scenarios, however, you also need a way to automatically narrow user queries to a logical group of content within that physical index. Today I will introduce two methods to achieve that.

Use Search Crawl rules

Search crawl rules are a mechanism for influencing the behaviour of the crawler when it crawls specific sites. Each crawl rule consists of a URL wildcard pattern that matches those sites, plus a set of options that control how the crawler behaves for them (for example, whether matching URLs are included in or excluded from the crawl).


For example, if you want all the document views and properties pages to be excluded, you can achieve it by configuring crawl rules:

1. Go to the Crawl rules section of the search settings in the SSP.

2. Add crawl rules to exclude the following paths:

*://*webfldr.aspx*

*://*allitems.aspx*

*://*dispform.aspx*

You can also test a specific URL against the crawl rules to see whether they will include or exclude it during a crawl. This feature was not available in SharePoint Portal Server 2003.
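The URL test boils down to one question: does this URL match a rule, and if so, does that rule include or exclude it? Below is a minimal C# sketch of that check. It assumes rules are evaluated in the order they are listed with the first match winning, that `*` matches any run of characters, that matching is case-insensitive, and that a URL matching no rule is crawled. `CrawlRule` and `CrawlRuleTester` are hypothetical names for illustration, not part of the SharePoint object model.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical model of a crawl rule: a "*" wildcard path plus an include/exclude flag.
public sealed record CrawlRule(string Pattern, bool Exclude);

public static class CrawlRuleTester
{
    // Turn a "*"-only wildcard into an anchored regex: every character is
    // escaped first, then each escaped "*" becomes ".*" (any run of characters).
    // Case-insensitive matching is an assumption of this sketch.
    static Regex ToRegex(string wildcard) =>
        new Regex("^" + Regex.Escape(wildcard).Replace(@"\*", ".*") + "$",
                  RegexOptions.IgnoreCase);

    // Assumed semantics: rules are checked in order, the first matching
    // rule decides, and a URL that matches no rule is crawled.
    public static bool IsIncluded(string url, IEnumerable<CrawlRule> rules)
    {
        foreach (CrawlRule rule in rules)
        {
            if (ToRegex(rule.Pattern).IsMatch(url))
            {
                return !rule.Exclude;
            }
        }
        return true;
    }
}
```

With the three exclusion rules from the steps above, `http://intranet/Shared%20Documents/Forms/AllItems.aspx` comes back as excluded, while `http://intranet/Shared%20Documents/report.docx` is still crawled. Escaping the whole pattern before expanding `*` keeps characters such as `.` literal, which mirrors the idea that `*` is the only operator.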

In SharePoint 2007, the wildcard operator “*” is the only operator supported in crawl rules, and it matches everything. Because it matches any sequence of characters, it lacks the flexibility to, for example, recognize and omit URLs that contain a mobile phone number.

SharePoint 2010 adds a new capability in this area: crawl rules can use regular expressions in the URL.
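To see why a regular expression helps where “*” cannot, here is a small C# sketch using .NET's `Regex`. The pattern `04[0-9]{8}` (an Australian-style mobile number, chosen purely for illustration) needs a character class and a counted repeat; the closest wildcard, `*://*04*`, can only say "contains 04" and so also matches a page such as a 2004 archive. The regular-expression syntax SharePoint 2010 accepts in crawl rules is narrower than .NET's, so treat this as an illustration of the idea rather than a rule to paste in. `PhoneNumberRule` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class PhoneNumberRule
{
    // Hypothetical exclusion pattern: "04" followed by exactly eight digits,
    // found anywhere in the URL (IsMatch searches; it does not anchor).
    static readonly Regex MobileNumber = new Regex("04[0-9]{8}");

    // The best a "*"-only wildcard can do, "*://*04*", written as a regex.
    static readonly Regex WildcardEquivalent = new Regex("^.*://.*04.*$");

    public static bool ExcludedByRegex(string url) => MobileNumber.IsMatch(url);

    public static bool ExcludedByWildcard(string url) => WildcardEquivalent.IsMatch(url);
}
```

For `http://intranet/archive/2004/report.aspx`, the wildcard version reports a match while the regex version does not, which is exactly the over-matching the 2007 rules cannot avoid.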

Check the Microsoft Enterprise Search blog for more details.

Use Content Class in scope

Most of you already know how to use a custom SharePoint scope to fine-tune your search results. For those who don’t, there are plenty of materials on the internet. :) Here I only want to draw your attention to one managed property: contentclass. Essentially every piece of SharePoint content appears to be tagged with this property, and as long as you know the internal value and what it maps to, you can configure your SharePoint search scopes quite efficiently. For example, if you want a scope that returns all documents and pages, you can set it up with rules that match the content class values for document library items and pages.

Below is a list of content class values and their mappings, prepared by Dan Attis on his blog. It should give you enough information to get started.

        // Friendly names for contentclass values, based on Dan Attis's list.
        public static class ContentClassNames
        {
            public static string Describe(string contentClass)
            {
                switch (contentClass)
                {
                    case "STS_Web":                              return "Site";
                    case "STS_List_850":                         return "Page Library";
                    case "STS_ListItem_850":                     return "Page";
                    case "STS_List_DocumentLibrary":             return "Document Library";
                    case "STS_ListItem_DocumentLibrary":         return "Document Library Item";
                    case "STS_List":                             return "Custom List";
                    case "STS_ListItem":                         return "Custom List Item";
                    case "STS_List_Links":                       return "Links List";
                    case "STS_ListItem_Links":                   return "Links List Item";
                    case "STS_List_Tasks":                       return "Tasks List";
                    case "STS_ListItem_Tasks":                   return "Tasks List Item";
                    case "STS_List_Events":                      return "Events List";
                    case "STS_ListItem_Events":                  return "Events List Item";
                    case "STS_List_Announcements":               return "Announcements List";
                    case "STS_List_Contacts":                    return "Contacts List";
                    case "STS_ListItem_Contacts":                return "Contacts List Item";
                    case "STS_List_DiscussionBoard":             return "Discussion List";
                    case "STS_ListItem_DiscussionBoard":         return "Discussion List Item";
                    case "STS_List_IssueTracking":               return "Issue Tracking List";
                    case "STS_ListItem_IssueTracking":           return "Issue Tracking List Item";
                    case "STS_List_GanttTasks":                  return "Project Tasks List";
                    case "STS_ListItem_GanttTasks":              return "Project Tasks List Item";
                    case "STS_List_Survey":                      return "Survey List";
                    case "STS_ListItem_Survey":                  return "Survey List Item";
                    case "STS_List_PictureLibrary":              return "Picture Library";
                    case "STS_ListItem_PictureLibrary":          return "Picture Library Item";
                    case "STS_List_WebPageLibrary":              return "Web Page Library";
                    case "STS_ListItem_WebPageLibrary":          return "Web Page Library Item";
                    case "STS_List_XMLForm":                     return "Form Library";
                    case "STS_ListItem_XMLForm":                 return "Form Library Item";
                    case "urn:content-class:SPSSearchQuery":     return "Search Query";
                    case "urn:content-class:SPSListing:News":    return "News Listing";
                    case "urn:content-class:SPSPeople":          return "People";
                    case "urn:content-classes:SPSCategory":      return "Category";
                    case "urn:content-classes:SPSListing":       return "Listing";
                    case "urn:content-classes:SPSPersonListing": return "Person Listing";
                    case "urn:content-classes:SPSTextListing":   return "Text Listing";
                    case "urn:content-classes:SPSSiteListing":   return "Site Listing";
                    case "urn:content-classes:SPSSiteRegistry":  return "Site Registry Listing";
                    default:                                     return "Unknown";
                }
            }
        }
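To make the documents-and-pages example concrete, the C# sketch below models a scope built from property query rules on contentclass, using two values from the list above: `STS_ListItem_DocumentLibrary` and `STS_ListItem_850`. It assumes the scope rule behaviours work as I understand them: an item matching any Include rule is in scope unless an Exclude rule also matches it (Require rules are left out to keep it short), and it compares values case-insensitively as a further assumption. `RuleBehavior`, `ScopeRule` and `ContentClassScope` are hypothetical names; in SharePoint itself you configure these rules in the SSP rather than in code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of scope rules on the contentclass managed property.
public enum RuleBehavior { Include, Exclude }

public sealed record ScopeRule(string ContentClass, RuleBehavior Behavior);

public static class ContentClassScope
{
    // Assumed semantics: an item is in scope if it matches at least one
    // Include rule and no Exclude rule. Comparison ignores case (assumption).
    public static bool InScope(string contentClass, IReadOnlyList<ScopeRule> rules)
    {
        bool Matches(ScopeRule rule) =>
            string.Equals(rule.ContentClass, contentClass, StringComparison.OrdinalIgnoreCase);

        bool included = rules.Any(r => r.Behavior == RuleBehavior.Include && Matches(r));
        bool excluded = rules.Any(r => r.Behavior == RuleBehavior.Exclude && Matches(r));
        return included && !excluded;
    }
}
```

Note that the item classes (`STS_ListItem_*`) and the container classes (`STS_List_*`) are separate values, so a scope that includes only the item classes returns the documents and pages themselves, not the libraries that hold them.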

Posted in Search