
Google Drive Source

The Google Drive source configuration screen allows you to enable the crawling and classification of content stored in both G-Suite repositories and Google Drive personal accounts.

IMPORTANT! Make sure you have created an app for Google Drive crawling before you start adding the source. See Configure G Suite and Google Drive for Crawling for more information.


Complete the following fields:

Option | Description
Basic settings
Drive Type | Select Business.
User Email(s) | When adding a G-Suite source, enter the email address of the user whose drive you want to crawl (via impersonation).
Crawl Shared Items | Select to crawl all files shared with the specified user in addition to any team drives shared with the user.
Crawl Shared Items | Select to enable crawling of any types of documents shared with the specified user.
JSON Import | Drag the JSON connection file that you downloaded while creating the Google service account into the form.
Project ID | Open the JSON connection file and copy the file contents to the Project ID field.
Write Classifications | Select to enable the writing of classifications back to the Google Drive repository. NOTE: Any classifications written to Google Drive are stored in custom properties, which are not visible to end users; they are only accessible via the Google Drive APIs.
OCR Processing Mode | Select the processing mode for images in documents:
  • Disabled: images in documents will not be processed.
  • Default: defaults to the source settings if configuring a path or the global setting if configured on a source.
  • Normal: images are processed with normal quality settings.
  • Enhanced: upscale images further to allow more.
Advanced Settings | Click the "wrench" icon in the Settings area at the bottom of the screen to expand the following advanced settings:
  • Re-Index Period: specifies how often the source should be checked for changes. The number specifies the period in days.
  • Priority: specifies the priority of content source processing in the service queues.
  • Document Type: specifies a value that can be used to restrict queries when utilizing the core search index.
Source Group | Netwrix recommends creating a dedicated source group for Google Drive.
Pause source on creation | Select if you want to make other configuration changes before collection of the source occurs.
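
Before dragging the JSON connection file into the form, it can help to confirm that the file is a service account key and to see which project and account it refers to. The following is a minimal sketch rather than part of the product. It assumes the usual layout of a Google Cloud service account key (the `type`, `project_id`, `client_email` and `private_key` fields); the helper name `summarize_key` and the sample values are hypothetical.

```python
import json

# Fields that a Google Cloud service account key file normally contains.
REQUIRED_FIELDS = ("type", "project_id", "client_email", "private_key")


def summarize_key(key_text: str) -> dict:
    """Parse key file text and return its non-secret identifying fields."""
    key = json.loads(key_text)
    missing = [name for name in REQUIRED_FIELDS if name not in key]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    # The private key is deliberately left out of the summary.
    return {"project_id": key["project_id"], "client_email": key["client_email"]}


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the downloaded file instead.
    sample = json.dumps({
        "type": "service_account",
        "project_id": "example-project",
        "client_email": "crawler@example-project.iam.gserviceaccount.com",
        "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    })
    print(summarize_key(sample))
```

A file that fails this check is unlikely to be the key downloaded for the service account, so it is worth downloading it again before configuring the source.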

Google Drive

This section contains information on how to configure exclusions and use tagging for a Google Drive source.

Configure Tagging

You can instruct the program to write classification attributes back to the document properties in the Google Drive repository. Each taxonomy can be mapped to a single property.

NOTE: Custom properties are not exposed to end users and are only available to other applications using the API.
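
Because the properties are reachable only through the API, another application has to request them explicitly. The sketch below only builds the request URL and sends nothing. It assumes the Google Drive API v3 `files.get` endpoint with its `properties` and `appProperties` fields; which of the two the solution writes to is not stated here, so both are requested. The helper name `properties_request_url` and the file ID are hypothetical, and a real call also needs an OAuth 2.0 access token.

```python
from urllib.parse import quote, urlencode

# Google Drive API v3 endpoint for file metadata.
DRIVE_FILES_ENDPOINT = "https://www.googleapis.com/drive/v3/files"


def properties_request_url(file_id: str) -> str:
    """Build a files.get URL that asks for a file's custom properties."""
    query = urlencode({
        # Custom properties are returned only when requested via "fields".
        "fields": "id,name,properties,appProperties",
        # Needed when the file is stored on a shared (team) drive.
        "supportsAllDrives": "true",
    })
    return f"{DRIVE_FILES_ENDPOINT}/{quote(file_id, safe='')}?{query}"


if __name__ == "__main__":
    print(properties_request_url("FILE_ID_PLACEHOLDER"))
```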

By design, Google Drive supports custom properties with the following limitations:

  • Maximum of 100 custom properties per file, totalled from all sources
  • Maximum of 30 public properties per file, totalled from all sources
  • Maximum of 124 bytes (UTF-8) for the property name and the list of classifications combined

NOTE: See this article for details.

To overcome these limitations, the Google Drive tagging implemented in the solution supports appending a counter to the field name, so classifications can be split across multiple fields when a text limit is hit in the source system. For example, you may have classifications written to the fields “Agriculture” and “Agriculture_1”.
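
The counter-based splitting can be pictured with a short sketch. This is an illustration of the idea, not the solution's actual algorithm: the function name `split_into_properties`, the `;` delimiter, the greedy fill order and the default of three fields are all assumptions. The size limit is applied to the property name plus its value.

```python
MAX_PROPERTY_BYTES = 124  # limit for property name + value, in UTF-8 bytes


def split_into_properties(field_name, classifications, delimiter=";", max_fields=3):
    """Spread classifications over field_name, field_name_1, ... within the limit."""
    # The first field has no counter; later fields get _1, _2, and so on.
    names = [field_name] + [f"{field_name}_{i}" for i in range(1, max_fields)]
    properties = {}
    index = 0
    for label in classifications:
        while True:
            if index >= len(names):
                raise ValueError("classifications do not fit into the available fields")
            name = names[index]
            existing = properties.get(name)
            candidate = label if existing is None else existing + delimiter + label
            if len((name + candidate).encode("utf-8")) <= MAX_PROPERTY_BYTES:
                properties[name] = candidate
                break
            if existing is None:
                # Even an empty field cannot hold this label.
                raise ValueError(f"classification too long for one property: {label!r}")
            index += 1  # current field is full, move on to the next counter
    return properties


if __name__ == "__main__":
    labels = ["Crops", "Livestock", "Irrigation"]
    print(split_into_properties("Agriculture", labels))
```

Short lists stay in a single property; the counter fields are used only once the first property would exceed the limit.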

NOTE: Due to the way Google Drive manages document audit information, writing classifications to a document (i.e. tagging) in this source will affect additional document metadata such as modified date and/or modified user:

  • modified date information will be changed to the time of tagging
  • modified user will be changed to the account that was configured for crawling this source

Related content source settings can be configured at a global level (default), or at a source level.

To configure tagging on a global level

  1. In the management console, click Sources > Google Drive, then in the left pane click Write Configuration.
  2. Select the taxonomy you need and click the Edit link for it.
  3. In the taxonomy properties, enable writing classification attributes (tags) and specify other settings:
Setting | Description | Note
Enabled | Enables or disables the writing of classifications for the selected taxonomy. | Cleared by default
Field Name | Defines the attribute name to be used when persisting the classifications (metadata property name). |
Single Value Field | If selected, only the highest scoring classification is written to the field. |
Maximum Field | Specifies the maximum number of properties that can be used to write classifications. Property names will be in the format 'FieldName_X'. | This allows more classifications to be written for sources where there is a limit on field length, by writing classifications across multiple properties.
Format | How the classifications should be formatted. | You can create a custom delimited combination of the labels / GUIDs.
Name/ID or Class | Depending on the format, takes the term labels, IDs, or a combination of both. | The corresponding Delimiter must be a string or array type with a maximum length of 3.
Prefix/Suffix | Added before (prefix) or after (suffix) the formatted string of classifications. |
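
To see how these settings combine, the sketch below builds the string that would end up in a property. It is a simplified illustration, not the product's formatter: the `Classification` type, the `use` parameter and its values, and the `|` separator between label and ID are hypothetical, and the three-character delimiter check simply mirrors the note in the table.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    label: str    # term label
    term_id: str  # term ID / GUID
    score: float  # classification score


def format_classifications(items, use="label", delimiter=";",
                           prefix="", suffix="", single_value=False):
    """Return the formatted classification string for one taxonomy."""
    if len(delimiter) > 3:
        raise ValueError("delimiter must be at most 3 characters long")
    items = list(items)
    if single_value and items:
        # Single Value Field: keep only the highest scoring classification.
        items = [max(items, key=lambda c: c.score)]
    if use == "label":
        parts = [c.label for c in items]
    elif use == "id":
        parts = [c.term_id for c in items]
    elif use == "both":
        parts = [f"{c.label}|{c.term_id}" for c in items]
    else:
        raise ValueError(f"unknown format: {use!r}")
    return prefix + delimiter.join(parts) + suffix


if __name__ == "__main__":
    found = [Classification("Wheat", "T1", 80.0), Classification("Barley", "T2", 95.0)]
    print(format_classifications(found))
    print(format_classifications(found, single_value=True))
    print(format_classifications(found, use="id", prefix="[", suffix="]"))
```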


Configure Exclusions

In the Collection Exclusions window you can set up the following:

  • List of file locations that will be ignored when indexing files from the Google Drive source
  • Excluding conditions based on the metadata of a document

In the management console, click Sources > Google Drive, then in the left pane click Collection Exclusions.

  1. Click the Filter tab and, in the Filter field, specify the file locations to exclude from crawling.

    Wildcards can be used anywhere in the exclusion pattern definition as follows:
    • The asterisk character (*) matches any sequence of characters
    • The question mark character (?) matches any single character

    NOTE: Exclusions are case-insensitive.

    For example, to exclude all Excel files stored in the corp/Year2020 folder, enter gdrive://corp/Year2020/*.xlsx

  2. To verify the exclusion location, enter its path in the Test Path field and click Test.

  3. If needed, you can use metadata conditions to restrict when an exclusion filter should be applied. To do so, click the Condition tab and click Add. Then select how the exclusion condition will work: it can check whether a metadata field of the document has any value, is not specified, or matches a specific metadata value.

    Criteria | Condition
    Comparison | Compares a value in the document metadata field with the value set by the condition. With this criterion selected, specify:
      • Field name: document metadata field to check
      • Comparison: operator to use (for example, "does not contain")
      • Value: value to compare against
      For example, to exclude documents tagged with year 2018, set the condition as follows:
      • Field Name: DocYear
      • Comparison: equals
      • Value: 2018
    Has any value | Excludes the document if its metadata field has any value. With this criterion selected, specify Field Name.
    Has no values | Excludes the document if the metadata field value is not specified. With this criterion selected, specify Field Name.

    When finished, click Add.

  4. To verify the settings, click Test.

  5. Finally, click Save and close the window.

Any item that matches the excluding filter will be ignored.
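
The way a filter and a condition work together can be sketched as follows. This is a rough model for reasoning about patterns before testing them in the console, not the product's matching code: the function names and the criteria and operator keywords are hypothetical, and treating a missing field as empty text in a comparison is an assumption. The wildcard handling follows the rules described above (`*` for any sequence of characters, `?` for a single character, case-insensitive).

```python
import re


def wildcard_to_regex(pattern: str):
    """Translate an exclusion pattern into a case-insensitive regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any sequence of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def condition_holds(metadata, criteria, field, operator=None, value=None):
    """Evaluate one metadata condition against a document's metadata."""
    actual = metadata.get(field)
    has_value = actual not in (None, "")
    if criteria == "has any value":
        return has_value
    if criteria == "has no values":
        return not has_value
    if criteria == "comparison":
        text = str(actual) if has_value else ""
        if operator == "equals":
            return text == value
        if operator == "contains":
            return value in text
        if operator == "does not contain":
            return value not in text
    raise ValueError(f"unsupported condition: {criteria!r} / {operator!r}")


def is_excluded(path, metadata, pattern, condition=None):
    """A document is excluded when the path matches and the condition, if any, holds."""
    if not wildcard_to_regex(pattern).fullmatch(path):
        return False
    return condition is None or condition_holds(metadata, **condition)


if __name__ == "__main__":
    pattern = "gdrive://corp/Year2020/*.xlsx"
    only_2018 = {"criteria": "comparison", "field": "DocYear",
                 "operator": "equals", "value": "2018"}
    print(is_excluded("gdrive://corp/Year2020/Budget.XLSX", {}, pattern))
    print(is_excluded("gdrive://corp/Year2020/Budget.xlsx", {"DocYear": "2019"},
                      pattern, only_2018))
```

In this model a condition narrows an exclusion: a document whose path matches the filter is still crawled when the condition does not hold.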