Crawling content is the process by which a system accesses and parses content and its properties, sometimes called metadata, to build a content index from which search queries can be served.
When content is successfully crawled, the crawler accesses and reads the individual files or pieces of content that you want to make available to search queries. The keywords and metadata for those files are stored in the content index, sometimes called simply the index. The index consists of the keywords, which are stored in the file system of the index server, and the metadata, which is stored in the search database.
The system maintains a mapping between the keywords, the metadata associated with the individual pieces of content, and the URL of the source from which the content was crawled.
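The mapping described above can be pictured as a small inverted index. The following Python sketch is purely illustrative, not the API of any actual search product: keywords map to source URLs, and each URL's metadata is stored separately, mirroring the split between the index server's keyword store and the search database.

```python
from collections import defaultdict


class ContentIndex:
    """Toy model of a content index: keywords -> URLs, plus per-URL metadata."""

    def __init__(self):
        self.keyword_index = defaultdict(set)  # keyword -> set of source URLs
        self.metadata_store = {}               # URL -> metadata dict

    def index_item(self, url, text, metadata):
        # A crawl is read-only: the source file is untouched, and only its
        # extracted keywords and metadata are recorded here.
        for word in text.lower().split():
            self.keyword_index[word].add(url)
        self.metadata_store[url] = metadata

    def query(self, keyword):
        """Return (url, metadata) pairs for items containing the keyword."""
        return [(url, self.metadata_store[url])
                for url in self.keyword_index.get(keyword.lower(), set())]


# Hypothetical crawled item, for demonstration only.
index = ContentIndex()
index.index_item("http://server/docs/plan.docx",
                 "Quarterly sales plan",
                 {"author": "jsmith", "modified": "2024-01-15"})
results = index.query("Plan")  # keyword lookup is case-insensitive here
```

A query resolves a keyword to the URLs it was crawled from, together with the stored metadata for each source.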
Note: The crawler does not change the files on the host servers. Instead, the files on the host servers are accessed and read, and the text and metadata for those files are sent to the index server to be indexed. However, because the crawler reads the content on the host server, some servers that host certain sources of content might update the last accessed date on files that have been crawled.
Determining when to crawl content
After a server farm has been deployed and has been running for some time, a search services administrator might need to change the crawl schedule for the following reasons:
- To accommodate changes in downtimes and periods of peak usage.
- To accommodate changes in the frequency at which content is updated on the servers hosting the content.
- To schedule crawls so that:
- Content that is hosted on slower host servers is crawled separately from content that is hosted on faster host servers.
- New content sources are crawled.
- Crawls occur as often as targeted content is updated. For example, you might want to perform daily crawls on repositories that are updated each day and crawl repositories that are updated infrequently less often.
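The scheduling guidance above can be sketched in a few lines of Python. The source list, host-speed labels, and the daily floor are assumptions made for the example, not values taken from any product:

```python
def crawl_interval_hours(update_interval_hours, min_hours=24):
    """Crawl roughly as often as the content is updated, but at most daily."""
    return max(update_interval_hours, min_hours)


# Hypothetical content sources with their update cadence and host speed.
sources = [
    {"name": "Team wiki", "updates_every_h": 1,   "host": "fast"},
    {"name": "HR site",   "updates_every_h": 24,  "host": "fast"},
    {"name": "Archive",   "updates_every_h": 720, "host": "slow"},
]

# Crawl content on slow host servers separately from content on fast hosts,
# so a slow repository does not delay the crawl of the fast ones.
fast_sources = [s["name"] for s in sources if s["host"] == "fast"]
slow_sources = [s["name"] for s in sources if s["host"] == "slow"]

# Match crawl frequency to update frequency per source.
schedule = {s["name"]: crawl_interval_hours(s["updates_every_h"])
            for s in sources}
```

Under this rule, a repository updated hourly is still crawled only daily, a daily-updated site is crawled daily, and a rarely updated archive is crawled at its own slower cadence.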
To start a full crawl of a particular content source
- In Central Administration, on the Quick Launch, in the Shared Services Administration group, click a shared service.
- On the Shared Services Administration page, in the Search section, click Search administration.
- On the Search Administration page, on the Quick Launch, in the Crawling section, click Content sources.
- On the Manage Content Sources page, point to the content source you want to crawl, click the arrow that appears, and then click Start Full Crawl on the menu that appears.
The value in the Status column changes to Crawling Full for the content source you selected in this step.
Note: The value in the Status column does not automatically change when the crawl is complete. To update the Status column, refresh the Manage Content Sources page by clicking Refresh on the toolbar.
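The start-then-refresh pattern in the note above can be sketched as follows. This is a toy model, not a real administration API: `start_full_crawl` and `get_status` stand in for starting a crawl from the menu and refreshing the Manage Content Sources page, and the crawl duration is a made-up constant.

```python
import time


class ContentSource:
    """Toy content source whose status only updates when it is re-fetched."""

    def __init__(self, name):
        self.name = name
        self._status = "Idle"
        self._finish_at = 0.0

    def start_full_crawl(self):
        self._status = "Crawling Full"
        self._finish_at = time.monotonic() + 0.01  # pretend crawl duration

    def get_status(self):
        # Like clicking Refresh on the toolbar: the status is only
        # recomputed when the page is re-fetched, not automatically.
        if self._status == "Crawling Full" and time.monotonic() >= self._finish_at:
            self._status = "Idle"
        return self._status


source = ContentSource("Example content source")
source.start_full_crawl()
status_after_start = source.get_status()   # shows the crawl in progress
while source.get_status() == "Crawling Full":
    time.sleep(0.005)                       # keep refreshing until done
final_status = source.get_status()
```

The loop mirrors what an administrator does manually: the Status column will keep reading Crawling Full until the page is refreshed after the crawl completes.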