Transform Data with Matillion Data Loader: Streamline Your Data Integration Process

Data that yields timely, reliable insights can mean the difference between leading and lagging the market in today's fast-paced world. Amazon Redshift Serverless makes it easier to run and scale analytics without the hassle of managing infrastructure.

With Matillion Data Loader, you can extract data from source systems and load it into Amazon Redshift Serverless without having to worry about scripting or managing infrastructure. Matillion Data Loader suits organizations that have data flowing in from a growing number of sources and need faster, easier access to insights and analytics.

In this article, we will show how to use Matillion Data Loader to quickly and easily load data into Amazon Redshift Serverless. We'll walk through a scenario where Salesforce is the source and Amazon Redshift Serverless is the destination.

With a focus on Amazon Redshift, Matillion is an AWS Data and Analytics Competency Partner. As an AWS Marketplace Seller, Matillion offers a cloud-native data integration solution that addresses pressing business data concerns.

Matillion Data Loader Overview


Matillion Data Loader is a software-as-a-service (SaaS) data loading tool that extracts data from popular sources and loads it into destinations on cloud data platforms. It is designed to require minimal setup and pipeline build time so you can start importing data as quickly as possible.

Matillion Data Loader supports both Amazon Redshift and Amazon Redshift Serverless as the destination. To get started right away, you can sign up and use Matillion Data Loader for free at dataloader.matillion.com or through the Matillion Hub.

Matillion Data Loader's key features include:

  • No-code pipeline development that empowers users across the enterprise and speeds up pipeline building and loading.
  • A platform that supports both batch pipelines and change data capture (CDC) pipelines.
  • Automatic schema drift propagation, which saves time and reduces job failures and delays caused by source schema changes, with no need for IT support or custom code.
  • Pre-built data transformation integration with Matillion ETL.

Matillion Data Loader provides two methods to load data:

  • Batch loading
  • Change data capture (CDC) loading

Let’s go over the specifics of how to use Matillion Data Loader to load data into Amazon Redshift.

Prerequisites


The following prerequisites must be met before using Matillion Data Loader:

  • A Matillion Hub account. Read the Matillion Hub overview for additional details.
  • A Salesforce account as the data source.
  • Amazon Redshift as the destination, configured as described in the Matillion documentation.
  • Credentials for connecting Matillion Data Loader to the source and destination systems.

Matillion Data Loader batch loading lets users build new pipelines with minimal configuration, complexity, and scripting. Data is loaded in batches at regular intervals, such as once per hour or once per day. Self-service, serverless data loading saves time and lets enterprises move large amounts of data into the cloud to analyze it and gain insights quickly.

Depending on your source data models, incremental batch loading is also possible, using a high-water mark (such as a timestamp, version, or status indicator) to load only data that has changed since the initial load.
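The high-water-mark approach can be sketched in a few lines of Python. This is an illustrative sketch rather than Matillion's actual implementation: `incremental_extract` is a hypothetical helper, and the default `SystemModstamp` field name is borrowed from Salesforce's standard modification-timestamp column.

```python
def incremental_extract(records, last_watermark, ts_field="SystemModstamp"):
    """Return only the records modified after the high-water mark, plus the
    new watermark to persist for the next batch run.

    `records` is a list of dicts from the source system; `ts_field` is the
    column used as the high-water mark (name modeled on Salesforce's
    SystemModstamp, but any monotonically increasing field works).
    """
    new_rows = [r for r in records if r[ts_field] > last_watermark]
    # If nothing changed, keep the previous watermark unchanged.
    new_watermark = max((r[ts_field] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark
```

On each scheduled run, only rows past the stored watermark are extracted, which keeps incremental batches small compared with a full reload.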

Data is staged in Amazon Simple Storage Service (Amazon S3) before being loaded into Amazon Redshift. To aid traceability, the load process automatically adds audit columns (a batch ID and a timestamp) before the staged data is published to the final target tables in Amazon Redshift. The staged data is then removed from Amazon S3.
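The staging step can be illustrated with a short Python sketch that appends a batch ID and load timestamp to each row before the file is written to Amazon S3. The helper name and audit-value layout here are invented for illustration; Matillion's actual staging format and column names may differ.

```python
import csv
import io
import uuid
from datetime import datetime, timezone

def stage_with_audit_columns(rows, batch_id=None, loaded_at=None):
    """Append audit values (batch ID and load timestamp) to each row and
    serialize the result to CSV, mimicking the staging step described above.
    The resulting file would then be uploaded to S3 and published to the
    target table (e.g. via Redshift's COPY command)."""
    batch_id = batch_id or str(uuid.uuid4())
    loaded_at = loaded_at or datetime.now(timezone.utc).isoformat()
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        # Every row in a batch shares the same batch ID and timestamp,
        # so a bad batch can later be traced or rolled back as a unit.
        writer.writerow(list(row) + [batch_id, loaded_at])
    return buf.getvalue()
```

Sharing one batch ID across all rows of a run is what makes the audit columns useful: a single predicate on that column identifies everything a given run loaded.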

Use Case: Ingest Data from Salesforce by Setting Up a Batch Pipeline


Let's examine a real-world example. A company wants to better understand the personas purchasing its products so that it can run more targeted sales and marketing campaigns. It sets up a batch pipeline to regularly ingest data from Salesforce's Account and Contact tables.

With Matillion Data Loader, selecting the source data, the destination, and the frequency of batch loading runs can all be completed quickly and easily.

Step 1: Choose Your Data Source, Tables, and Columns

The first step is to choose your data; Matillion Data Loader connects directly to most popular data sources.

When you first log in to Matillion Data Loader, a welcome screen asks you to click Add pipeline to start building your first pipeline. Be sure to choose your region using the region selector in the lower-right corner of the page. To follow along, consult the MDL Pipeline UI documentation.

For this example, we want to load data from Salesforce. Either scroll down the list or start typing "Salesforce" into the search box to narrow the results, then choose Salesforce.

Next, enter your Open Authorization (OAuth) credentials to connect to Salesforce. OAuth is a secure authorization protocol that lets you authorize one application to interact with another on your behalf without disclosing your password.

Click the OAuth drop-down and select Add new credential to add a new OAuth credential. Enter a distinct name for your OAuth credential and click Authorize to open the Salesforce login page. After returning to the Matillion Data Loader page, press Continue.

After you click Continue, Matillion Data Loader connects to Salesforce and displays a list of tables. Choose any tables you want to include in the pipeline; for this example, select the Contact and Account tables.

If you'd like, you can also opt to sync deleted records, which removes from the target table any data that has been deleted from Salesforce in the previous 30 days.

You can be even more selective while batch loading by deselecting particular columns from your tables (all columns are loaded by default).

To do this, review your dataset by choosing the source table. This opens a window where you can deselect particular columns. For instance, you can deselect columns containing personally identifiable information (PII), such as name and phone number, if you want to omit them.

When you are finished with this step, click Continue to choose your data destination.

Step 2: Choose Your Destination

In the second step, you can use either Amazon Redshift Serverless or Amazon Redshift as your data destination. To follow along, see Set up Amazon Redshift in the Matillion documentation.

You must have an active AWS account and instance, allow Matillion's IP addresses, review your database information, and have privileges that let you create users and grant privileges. After choosing Amazon Redshift, provide your AWS access information and any additional details required to connect to Amazon Redshift. Click Test to verify your settings, then continue.

Step 3: Set the Frequency of Your Batch Runs

The last step is to set the frequency of your batch runs. You can choose anywhere from once every five minutes to once every seven days. Simply choose the frequency you want, then click Create pipeline.

Matillion Data Loader generates all the code needed to securely access and extract the chosen data from your source and load it into your destination. Connections are verified and validated throughout the three-step procedure described above, giving you a reliable batch data pipeline in a matter of minutes.

In the screenshot below, the rs_salesforce pipeline can be seen syncing its initial data load. Once the initial loads have finished, each pipeline shows as "active".

After a successful pipeline run, take the following steps:

  • Log in to the AWS console and open the Amazon Redshift console.
  • Select Query editor v2 from the navigation menu, then connect to a database in your cluster.
  • Query the Amazon Redshift database.
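You can also query the loaded tables programmatically instead of through Query editor v2. Below is a minimal sketch using the AWS SDK for Python (boto3) and the Redshift Data API, assuming a Redshift Serverless workgroup; the workgroup, database, and table names are placeholders to adapt to your own setup.

```python
def query_redshift_serverless(workgroup: str, database: str, sql: str) -> str:
    """Submit a SQL statement to Amazon Redshift Serverless through the
    Redshift Data API and return the statement ID, which you can poll
    with describe_statement / get_statement_result."""
    # Deferred import so the snippet can be read/loaded without AWS set up.
    import boto3

    client = boto3.client("redshift-data")
    resp = client.execute_statement(
        WorkgroupName=workgroup,  # your Serverless workgroup (placeholder)
        Database=database,        # e.g. "dev" (placeholder)
        Sql=sql,
    )
    return resp["Id"]

# Example: count the rows Matillion loaded from the Salesforce Account table.
# The table name assumes default naming; adjust to your target schema.
sql = "SELECT COUNT(*) FROM public.account;"
```

The Data API is convenient here because it needs no JDBC/ODBC driver or open connection; calls are authenticated with your standard AWS credentials.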

 
