Connecting an external website to Data Cloud is a critical skill for unlocking the full potential of your public-facing content. In the age of AI, your website is a goldmine of unstructured data. This guide provides a comprehensive, step-by-step process for configuring this connection using the Web Content Crawler.
By following these steps, you will learn how to ingest website data to create a robust knowledge base for Einstein Copilot, enhance customer profiles with rich content, and power semantic search capabilities, transforming your public content into a strategic asset for Salesforce AI.
Step 1: Identify Your Target Website URL
First, choose the public website you want to connect. The URL must be accessible without a login. For this example, we will use the Salesforce Trailhead website.
- Example URL: https://trailhead.salesforce.com/
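
Before moving on, it can be worth confirming that the URL really is reachable without credentials. Here is a minimal sketch using Python's standard library; the function name and the generic browser-style user agent are illustrative, not anything Data Cloud requires:

```python
import urllib.request
import urllib.error

def check_public_url(url: str) -> bool:
    """Return True if the URL responds successfully without any credentials."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as error:
        print(f"Request failed with HTTP {error.code}; the page may require a login.")
        return False
    except urllib.error.URLError as error:
        print(f"Could not reach {url}: {error.reason}")
        return False

print(check_public_url("https://trailhead.salesforce.com/"))
```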

Step 2: Create the Web Content Crawler Connector in Data Cloud
Log in to your Data Cloud org to set up the connection. This Data Cloud connector is specifically designed for ingesting website data.
a) Go to Data Cloud Setup.

b) In the Quick Find box, type Other Connectors, select it, and click New.

c) Select the Web Content Crawler from the list and click Next.

Step 3: Configure and Save Your Website Connection
Now, you will configure the specific details for your website connection.
a) Connection Name: Provide a unique and descriptive name, like “Trailhead Website Connection”.
b) URL: Paste the website URL from Step 1.
c) User Agent: Leave this field as the default. It is auto-populated to identify the crawler to the website (e.g., as Mozilla, Safari, etc.).

d) Click Test Connection. A success message will appear if Data Cloud can reach the URL.
e) Click Save.
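
Separately from the built-in connection test, you may also want to confirm that the site's robots.txt permits crawling of the pages you care about. Here is a minimal sketch with Python's standard library, using a generic browser-style user agent as a stand-in for whatever value the User Agent field is populated with:

```python
from urllib.robotparser import RobotFileParser

# Check whether the site's robots.txt allows a crawler to fetch a given page.
# "Mozilla/5.0" is a generic placeholder user agent, not the exact string
# that Data Cloud auto-populates in the User Agent field.
parser = RobotFileParser()
parser.set_url("https://trailhead.salesforce.com/robots.txt")
parser.read()

allowed = parser.can_fetch("Mozilla/5.0", "https://trailhead.salesforce.com/")
print("Crawling allowed:", allowed)
```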

Step 4: Create an Unstructured Data Lake Object (DLO) to Store Your Website Content
Next, create a Data Lake Object (DLO). This object acts as a container to receive and store the unstructured data crawled from your website.
a) From the App Launcher, go to Data Cloud, then open the Data Lake Objects tab.
b) Click New to start the DLO creation wizard.
c) To choose a method for creating your data lake object, select From External Files, and click Next.

d) Choose the Web Content (Crawler) connector, and click Next.

e) From the Connection dropdown, select the name you provided in Step 3 (e.g., Trailhead Website Connection). Then select the crawl depth for the website, a level from 0 to 4.

This critical setting determines how many levels of links the crawler will follow from your starting URL.
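To make the depth setting concrete, here is a toy, in-memory sketch of a depth-limited crawl. It is purely conceptual (the link graph and function are made up for illustration) and does not reflect how Data Cloud's crawler is implemented:

```python
from collections import deque

# Toy link graph standing in for a website: each page lists the pages it links to.
SITE = {
    "/": ["/modules", "/trails"],
    "/modules": ["/modules/data-cloud", "/modules/einstein"],
    "/trails": ["/trails/admin"],
    "/modules/data-cloud": [],
    "/modules/einstein": [],
    "/trails/admin": [],
}

def pages_within_depth(start: str, max_depth: int) -> list[str]:
    """Breadth-first walk that stops following links beyond max_depth."""
    visited = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if depth == max_depth:
            continue  # do not follow links any deeper
        for link in SITE.get(page, []):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order

print(pages_within_depth("/", 0))  # only the starting URL
print(pages_within_depth("/", 1))  # the start page plus pages it links to directly
print(pages_within_depth("/", 2))  # everything reachable within two link hops
```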
Step 5: Name the DLO and Assign a Data Space
Finalize the details for your new Data Lake Object.
a) Object Label & API Name: Give your DLO a clear name, such as Trailhead Web Content.
b) Data Space: Select the appropriate Data Space for this data.
c) Click Next.

Step 6: Review, Save, and Verify the Data Model Objects
Data Cloud automatically creates three Data Model Objects (DMOs) to process your unstructured web content. This flow is designed to make the text usable for AI applications.
Here is what each object does:
- Transcribe: Scrapes and extracts the raw text from each webpage.
- Chunk: Breaks the extracted text into smaller, coherent sections. This is vital for AI models to understand the context.
- Index: Stores these chunks in a vector database, making them searchable via semantic search.
This entire process is part of preparing a knowledge base for features like Einstein Copilot.
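As a rough illustration of the Chunk step, here is a minimal sketch of splitting extracted page text into overlapping, fixed-size sections. The chunk size, overlap, and function are assumptions for illustration; Data Cloud's actual chunking strategy is not specified here:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split extracted page text into overlapping, roughly fixed-size chunks."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# Stand-in for text scraped from a webpage by the Transcribe step.
page_text = "Data Cloud ingests unstructured web content for AI use cases. " * 40
for i, chunk in enumerate(chunk_text(page_text)):
    print(f"Chunk {i}: {len(chunk)} characters")
```

Each chunk is then embedded and stored by the Index object, which is what allows semantic search to retrieve the most relevant sections later.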
After reviewing the auto-created DMOs, click Save. To verify, navigate to the Data Lake Objects tab and check that your new DLO has a “Success” status and is ready to use in Data Cloud.

You have successfully learned how to connect an external website with Data Cloud. Your web content is now ready to be ingested and used.

Can I access any website, for example the abc.com website?
Hi,
Could you please clarify if you are asking whether it is possible to access any arbitrary external website, such as ABC.com, using the Web Content Crawler or other connectors in Salesforce Data Cloud?
Thanks, Rajesh Bandaru