Configure the Product to Crawl an Entire SharePoint Online Tenancy
Typically, SharePoint environments are crawled on a per-site-collection basis. Sometimes, however, an entire SharePoint Online tenancy needs to be crawled. This guide gives step-by-step instructions for configuring a whole tenancy for collection.
-
Add a SharePoint Online source as described in the SharePoint Online section.
NOTE: If this option is not available in the source type selection, the source type is probably not currently licensed. Contact support for more details.
-
The source is configured at the tenancy level, so we recommend specifying the root site collection URL as the URL. This is not a requirement, however, if you do not have a root site collection.
-
Specify an account with tenancy administration rights. Accounts can be specified either in the default AD format, DOMAIN\USERNAME, or as the user's email address, USERNAME@DOMAIN.
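The two account formats above can be told apart with a simple pre-check before the value is entered. The sketch below is illustrative only: the patterns and the account_format helper are assumptions made for this example and do not describe how the product itself validates accounts.

```python
import re

# Assumed patterns for the two formats named in this step.
# They only check the overall shape, not whether the account exists.
AD_FORMAT = re.compile(r"^[^\\@\s]+\\[^\\@\s]+$")     # DOMAIN\USERNAME
EMAIL_FORMAT = re.compile(r"^[^\\@\s]+@[^\\@\s]+$")   # USERNAME@DOMAIN


def account_format(account: str) -> str:
    """Return which of the two documented formats the string resembles."""
    if AD_FORMAT.match(account):
        return "DOMAIN\\USERNAME"
    if EMAIL_FORMAT.match(account):
        return "USERNAME@DOMAIN"
    return "unrecognised"


# Hypothetical account names, for illustration only.
for candidate in ["CONTOSO\\jdoe", "jdoe@contoso.com", "jdoe"]:
    print(candidate, "->", account_format(candidate))
```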
-
Match Rules are an important configuration option: they define which site collections are crawled. Here are some example match rules that may be required:
.*\/Personal\/.* - Matches "/personal/" within the URL. This is the correct configuration for crawling end users' OneDrive site collections (OneDrive for Business).
.* - Matches any site collection, ensuring that all site collections are crawled.
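To see what each example rule would select, the rules can be tried against sample URLs before they are saved. The sketch below uses Python's re module as a stand-in. It assumes a rule must match the whole URL and that matching ignores case. Neither assumption is confirmed for the product, so treat the output as a rough guide. The URLs and the matches_rule helper are hypothetical.

```python
import re

# The two example rules from this step, copied as written.
ONEDRIVE_RULE = r".*\/Personal\/.*"
MATCH_ALL_RULE = r".*"


def matches_rule(rule: str, url: str) -> bool:
    """Assumption: the rule must match the entire URL, ignoring case."""
    return re.fullmatch(rule, url, flags=re.IGNORECASE) is not None


# Hypothetical site collection URLs, for illustration only.
urls = [
    "https://contoso-my.sharepoint.com/personal/jane_doe_contoso_com",
    "https://contoso.sharepoint.com/sites/finance",
]

for url in urls:
    print(url)
    print("  OneDrive rule: ", matches_rule(ONEDRIVE_RULE, url))
    print("  Match-all rule:", matches_rule(MATCH_ALL_RULE, url))
```

Under these assumptions the OneDrive rule selects only the first URL, while the match-all rule selects both. If the product matches case-sensitively, the capital "P" in the rule would matter, so it is worth confirming against a real OneDrive URL in your tenancy.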
-
Define the required Classification Template and the Detection Period, which sets how often new site collections are detected.
-
Select Save.