Out-of-the-box (OOTB) SharePoint search is quite powerful. The MOSS 2007 search engine has only one physical index for each SSP in the farm, which means that content from every content source defined for the SSP is crawled into the same index. This gives users the ability to search across all the content with one query. But many search scenarios also call for a mechanism that automatically narrows user queries to a logical group of content within the physical index. I will introduce two methods for achieving that here.
Use Search Crawl rules
Search crawl rules are a mechanism for influencing the behaviour of the crawler when it crawls specific sites. A single crawl rule is created by specifying a URL pattern, with wildcards, that matches the sites, plus a set of options that set the crawler’s behaviour for those sites.
For example, if you would like all the document views and properties pages to be excluded, you can achieve it by configuring crawl rules:
1. Go to the crawl rules section of the search settings in the SSP
2. Add crawl rules to exclude the following paths:
*://*webfldr.aspx*
*://*allitems.aspx*
*://*dispform.aspx*
You can also test a specific URL against the crawl rules to determine whether the rules will include or exclude the URL during a crawl. This feature is not available in SharePoint Server 2003.
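To get a feel for how exclusion paths like these behave, here is a minimal Python sketch that imitates the wildcard matching outside SharePoint. It is not SharePoint's own matcher: the function names `wildcard_to_regex` and `is_excluded`, the sample URLs, and the case-insensitive comparison are all assumptions made for illustration, and only two of the paths are included.

```python
import re

# Two of the exclusion paths from the steps above.
EXCLUDE_RULES = [
    "*://*allitems.aspx*",
    "*://*dispform.aspx*",
]

def wildcard_to_regex(rule):
    """Translate a rule in which "*" is the only wildcard into a regex."""
    # Escape every literal piece, then let each "*" match any run of characters.
    literal_parts = [re.escape(part) for part in rule.split("*")]
    # Assumption: matching ignores case, so AllItems.aspx matches allitems.aspx.
    return re.compile(".*".join(literal_parts), re.IGNORECASE)

def is_excluded(url, rules=EXCLUDE_RULES):
    """Return True if any exclusion rule matches the whole URL."""
    return any(wildcard_to_regex(rule).fullmatch(url) for rule in rules)

if __name__ == "__main__":
    # Hypothetical URLs, not taken from a real farm.
    sample_urls = [
        "http://portal/sites/hr/Shared Documents/Forms/AllItems.aspx",
        "http://portal/Lists/Tasks/DispForm.aspx?ID=3",
        "http://portal/sites/hr/Shared Documents/policy.docx",
    ]
    for url in sample_urls:
        print("excluded" if is_excluded(url) else "crawled ", url)
```

The built-in test feature in the SSP remains the authoritative check; a sketch like this is only handy for reasoning about a pattern before you add it.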
In SharePoint 2007, the wildcard operator “*” is the only operator supported in crawl rules. Because it simply matches everything, it does not have the flexibility to, for example, recognize and omit URLs that contain a mobile phone number.
SharePoint 2010 adds a new capability in this area: support for regular expressions in the URL.
Check the Microsoft Enterprise Search blog for more details.
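To illustrate the kind of rule a lone “*” cannot express, here is a small sketch written in Python's `re` syntax. Two things are assumptions: the phone number format (ten digits starting with 04) is just a hypothetical example, and the regular expression syntax that SharePoint 2010 accepts in crawl rules may differ from Python's, so treat this as a way to see the idea rather than a rule to paste in.

```python
import re

# Assumption: a "mobile number" is exactly ten digits starting with "04".
# The lookarounds reject digits on either side, so longer IDs do not match.
MOBILE_IN_URL = re.compile(r"(?<!\d)04\d{8}(?!\d)")

def contains_mobile_number(url):
    """Return True if the URL contains something shaped like a mobile number."""
    return MOBILE_IN_URL.search(url) is not None

if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    for url in [
        "http://portal/contacts/dispform.aspx?mobile=0412345678",
        "http://portal/contacts/dispform.aspx?ID=42",
    ]:
        print(contains_mobile_number(url), url)
```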
Use Content Class in scope
Most of you already know how to use SharePoint custom scopes to fine-tune your search results. For those who don’t, there are plenty of materials on the internet. :) I only want to draw your attention to one managed property, contentclass. Essentially, every piece of SharePoint content seems to be tagged with this property, and as long as you know the internal values and what they map to, you should be able to configure your SharePoint search scopes quite efficiently. For example, if you want to return all the documents and pages, you can set up a scope whose rules match the contentclass values for document library items and pages (STS_ListItem_DocumentLibrary and STS_ListItem_850 in the list below).
Below is a list of content classes and their mappings, prepared by Dan Attis on his blog. It should give you enough information to get started.
case "STS_Web": // Site
case "STS_List_850": // Page Library
case "STS_ListItem_850": // Page
case "STS_List_DocumentLibrary": // Document Library
case "STS_ListItem_DocumentLibrary": // Document Library Items
case "STS_List": // Custom List
case "STS_ListItem": // Custom List Item
case "STS_List_Links": // Links List
case "STS_ListItem_Links": // Links List Item
case "STS_List_Tasks": // Tasks List
case "STS_ListItem_Tasks": // Tasks List Item
case "STS_List_Events": // Events List
case "STS_ListItem_Events": // Events List Item
case "STS_List_Announcements": // Announcements List
case "STS_List_Contacts": // Contacts List
case "STS_ListItem_Contacts": // Contacts List Item
case "STS_List_DiscussionBoard": // Discussion List
case "STS_ListItem_DiscussionBoard": // Discussion List Item
case "STS_List_IssueTracking": // Issue Tracking List
case "STS_ListItem_IssueTracking": // Issue Tracking List Item
case "STS_List_GanttTasks": // Project Tasks List
case "STS_ListItem_GanttTasks": // Project Tasks List Item
case "STS_List_Survey": // Survey List
case "STS_ListItem_Survey": // Survey List Item
case "STS_List_PictureLibrary": // Picture Library
case "STS_ListItem_PictureLibrary": // Picture Library Item
case "STS_List_WebPageLibrary": // Web Page Library
case "STS_ListItem_WebPageLibrary": // Web Page Library Item
case "STS_List_XMLForm": // Form Library
case "STS_ListItem_XMLForm": // Form Library Item
case "urn:content-class:SPSSearchQuery": // Search Query
case "urn:content-class:SPSListing:News": // News Listing
case "urn:content-class:SPSPeople": // People
case "urn:content-classes:SPSCategory": // Category
case "urn:content-classes:SPSListing": // Listing
case "urn:content-classes:SPSPersonListing": // Person Listing
case "urn:content-classes:SPSTextListing": // Text Listing
case "urn:content-classes:SPSSiteListing": // Site Listing
case "urn:content-classes:SPSSiteRegistry": // Site Registry Listing
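If it helps to see the idea behind a contentclass-based scope in code, here is a rough Python sketch. It does not call any SharePoint API: the dictionary holds only a handful of entries copied from the list above, and `in_scope`, `describe`, and the sample results are hypothetical names and data made up for illustration. It simply shows how a "documents and pages" scope amounts to keeping items whose contentclass is in a known set.

```python
# A few entries copied from the list above; extend as needed.
CONTENT_CLASS_LABELS = {
    "STS_Web": "Site",
    "STS_List_850": "Page Library",
    "STS_ListItem_850": "Page",
    "STS_List_DocumentLibrary": "Document Library",
    "STS_ListItem_DocumentLibrary": "Document Library Items",
    "STS_ListItem_Tasks": "Tasks List Item",
}

# The contentclass values a "documents and pages" scope would include.
DOCUMENTS_AND_PAGES = {"STS_ListItem_DocumentLibrary", "STS_ListItem_850"}

def describe(content_class):
    """Return the friendly name for a contentclass value, if it is known."""
    return CONTENT_CLASS_LABELS.get(content_class, "Unknown")

def in_scope(result, allowed=DOCUMENTS_AND_PAGES):
    """Return True if the result's contentclass belongs to the scope."""
    return result.get("contentclass") in allowed

if __name__ == "__main__":
    # Hypothetical search results; a real result carries many more properties.
    results = [
        {"title": "Leave policy", "contentclass": "STS_ListItem_DocumentLibrary"},
        {"title": "Welcome", "contentclass": "STS_ListItem_850"},
        {"title": "Book venue", "contentclass": "STS_ListItem_Tasks"},
    ]
    for item in results:
        if in_scope(item):
            print(item["title"], "-", describe(item["contentclass"]))
```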