Connecting an external website to Data Cloud lets you put your public-facing content to work. Your website holds a large amount of unstructured data that AI features can draw on. This guide walks through configuring that connection, step by step, using the Web Content Crawler.

By following these steps, you will learn how to ingest website data so that it can serve as a knowledge base for Einstein Copilot, enrich customer profiles with website content, and power semantic search.

Step 1: Identify Your Target Website URL

First, choose the public website you want to connect. The URL must be accessible without a login. For this example, we will use the Salesforce Trailhead website.

  • Example URL: https://trailhead.salesforce.com/
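
Before pasting the URL into Data Cloud, it can help to sanity-check that it is an absolute web address with no embedded credentials. The sketch below is one way to do that in Python. `is_crawlable_url` is a hypothetical helper name, and the checks are assumptions about what a reasonable pre-check looks like, not rules published for the Web Content Crawler. It only inspects the URL string, so it cannot tell whether a page sits behind a login.

```python
from urllib.parse import urlparse


def is_crawlable_url(url: str) -> bool:
    """Return True for an absolute http(s) URL with a host and no embedded credentials.

    Hypothetical pre-check: it inspects the string only and makes no network call.
    """
    parts = urlparse(url)
    has_web_scheme = parts.scheme in ("http", "https")
    has_host = bool(parts.hostname)
    has_credentials = parts.username is not None or parts.password is not None
    return has_web_scheme and has_host and not has_credentials


if __name__ == "__main__":
    for candidate in ("https://trailhead.salesforce.com/", "trailhead.salesforce.com"):
        print(candidate, is_crawlable_url(candidate))
```

The second candidate fails because it has no scheme, which is a common copy-and-paste slip.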


Step 2: Create the Web Content Crawler Connector in Data Cloud

Log in to your Data Cloud org to set up the connection. The Web Content Crawler connector is designed specifically for ingesting website data.

a) Go to Data Cloud Setup.


b) In the Quick Find box, type Other Connectors, select it, and click New.

c) Select the Web Content Crawler from the list and click Next.


Step 3: Configure and Save Your Website Connection

Now, you will configure the specific details for your website connection.

a) Connection Name: Provide a unique and descriptive name, like “Trailhead Website Connection”.
b) URL: Paste the website URL from Step 1.
c) User Agent: Leave this field at its default value. It is auto-populated and identifies the crawler to the website (for example, as Mozilla or Safari).


d) Click Test Connection. A success message will appear if Data Cloud can reach the URL.
e) Click Save.
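
To get a feel for what a connection test of this kind involves, here is a minimal Python sketch that sends an anonymous request carrying a User-Agent header and reports whether the site answers with a 2xx status. It is an illustration under stated assumptions, not Data Cloud's actual Test Connection logic: `build_probe_request`, `probe`, and the `DEFAULT_USER_AGENT` string are hypothetical names and values chosen for this example.

```python
import urllib.request

# Assumption: a generic browser-style User-Agent string, for illustration only.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; example-crawler-check)"


def build_probe_request(url: str, user_agent: str = DEFAULT_USER_AGENT) -> urllib.request.Request:
    """Build an anonymous GET request that identifies itself via the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent}, method="GET")


def probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers the anonymous request with a 2xx status."""
    try:
        with urllib.request.urlopen(build_probe_request(url), timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and timeouts; ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    # Performs a real network request when run directly.
    print(probe("https://trailhead.salesforce.com/"))
```

Note that `urlopen` follows redirects, so a site that redirects anonymous visitors to a login page can still report success here. Opening the URL in a private browser window remains a useful manual check.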


Step 4: Create an Unstructured Data Lake Object (DLO) to Store Your Website Content

Next, create a Data Lake Object (DLO). This object acts as a container to receive and store the unstructured data crawled from your website.

a) From the App Launcher, go to Data Cloud, then open the Data Lake Objects tab.
b) Click New to start the DLO creation wizard.
c) For the creation method, select From External Files, and click Next.


d) Choose the Web Content (Crawler) connector, and click Next.



e) From the Connection dropdown, select the name you provided in Step 3 (e.g., Trailhead Website Connection). Then select a crawl depth from 0 to 4.


This critical setting determines how many levels of links the crawler will follow from your starting URL.
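
To make the effect of crawl depth concrete, the following Python sketch walks a small, made-up link graph breadth-first and stops following links once the depth limit is reached. The `LINKS` site map and the `pages_within_depth` function are hypothetical, and the sketch assumes that depth 0 means "the starting page only"; the exact semantics of the Data Cloud setting should be confirmed in the product documentation.

```python
from collections import deque

# Hypothetical site map: each page maps to the pages it links to.
LINKS = {
    "/": ["/modules", "/trails"],
    "/modules": ["/modules/data-cloud"],
    "/trails": ["/trails/admin", "/"],
    "/modules/data-cloud": ["/modules/data-cloud/unit-1"],
    "/trails/admin": [],
    "/modules/data-cloud/unit-1": [],
}


def pages_within_depth(links: dict, start: str, max_depth: int) -> dict:
    """Breadth-first walk that returns {page: depth} for pages within max_depth links of start."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth_of[page] == max_depth:
            continue  # Depth limit reached: do not follow this page's links.
        for target in links.get(page, []):
            if target not in depth_of:  # Skip pages that were already discovered.
                depth_of[target] = depth_of[page] + 1
                queue.append(target)
    return depth_of


if __name__ == "__main__":
    for depth in range(4):
        print(depth, sorted(pages_within_depth(LINKS, "/", depth)))
```

In this toy graph each extra level adds one or two pages, but on a real site the page count can grow very quickly with depth, so it is reasonable to start with a low value and increase it only if content is missing.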

Step 5: Name the DLO and Assign a Data Space

Finalize the details for your new Data Lake Object.

a) Object Label & API Name: Give your DLO a clear name, such as Trailhead Web Content.

b) Data Space: Select the appropriate Data Space for this data.

c) Click Next.


Step 6: Review, Save, and Verify the Data Model Objects

Data Cloud automatically creates three Data Model Objects (DMOs) to process your unstructured web content. This flow is designed to make the text usable for AI applications.

Here is what each object does:

  • Transcribe: Scrapes and extracts the raw text from each webpage.
  • Chunk: Breaks the extracted text into smaller, coherent sections. This is vital for AI models to understand the context.
  • Index: Stores these chunks in a vector database, making them searchable via semantic search.

This entire process is part of preparing a knowledge base for features like Einstein Copilot.
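
The three stages above can be sketched in a few lines of Python to show how they fit together. This is a toy illustration, not how Data Cloud implements them: all function names are hypothetical, the chunker splits on a fixed word count, and the "index" ranks chunks by simple word-overlap cosine similarity as a stand-in for the embedding model and vector database a real semantic search would use.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Transcribe stand-in: collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def transcribe(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk(text: str, size: int = 8, overlap: int = 2) -> list:
    """Chunk stand-in: fixed-size word windows that share `overlap` words with their neighbor."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def vectorize(text: str) -> Counter:
    # Assumption: word counts stand in for a learned embedding vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks: list) -> list:
    """Index stand-in: keep each chunk next to its vector."""
    return [(text, vectorize(text)) for text in chunks]


def search(index: list, query: str, top_k: int = 1) -> list:
    query_vector = vectorize(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

A typical use would be `search(build_index(chunk(transcribe(page_html))), "crawl depth")`, where `page_html` is the HTML of one crawled page. Because this stand-in only matches shared words, it will miss paraphrases that a true semantic index is meant to catch.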

After reviewing the auto-created DMOs, click Save. To verify, navigate to the Data Lake Objects tab and check that your new DLO has a “Success” status and is ready to use in your Data Cloud org.



You have successfully learned how to connect an external website with Data Cloud. Your web content is now ready to be ingested and used.


Author

  • Salesforce Hours

    Salesforcehour is a platform built on a simple idea: "The best way to grow is to learn together". We request seasoned professionals from across the globe to share their hard-won expertise, giving you the in-depth tutorials and practical insights needed to accelerate your journey. Our mission is to empower you to solve complex challenges and become an invaluable member of the Ohana.



2 thoughts on “How to connect an external website with Data Cloud ?”

  1. Hi,

    Could you please clarify if you are asking whether it is possible to access any arbitrary external website, such as ABC.com, using the Web Content Crawler or other connectors in Salesforce Data Cloud?

    Thanks, Rajesh Bandaru
