Extract, Transform, Load (ETL) jobs are a way to extract data from one source, then transform and load it into Interaction Studio by way of the SFTP site. ETL data feeds can update the catalog, import and export segments, and load historical data.
ETLs require an input file for record extraction. This file contains entries such as users, products, subscription list members, promotions, orders, and more. These files must be in CSV format and can be encrypted or compressed. To be processed, they must follow the specifications outlined in the articles in the CSV Feed Requirements section.
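As a concrete illustration of a compressed CSV input file, the sketch below writes a few user records to a gzipped CSV. The column names (`userId`, `emailAddress`, `firstName`) are hypothetical placeholders, not the official Interaction Studio schema; check the CSV Feed Requirements articles for the exact headers your feed expects.

```python
import csv
import gzip

# Hypothetical user records; column names are illustrative only --
# consult the CSV Feed Requirements articles for the real schema.
rows = [
    {"userId": "u-1001", "emailAddress": "ada@example.com", "firstName": "Ada"},
    {"userId": "u-1002", "emailAddress": "alan@example.com", "firstName": "Alan"},
]

def write_gzipped_csv(path, records):
    """Write records to a gzip-compressed CSV file with a header row."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)

write_gzipped_csv("users.csv.gz", rows)
```

Gzip is one common choice for the compression the feed accepts; if you encrypt instead, the same CSV content is simply wrapped in your encryption tool's output.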
- ETL - a type of data integration that uses three steps (extract, transform, load) to blend data from multiple sources. It's often used to build a data warehouse.
- CSV - a Comma Separated Values (CSV) file is a plain text file containing a list of data, used to exchange data between systems.
- Feed - a way to send structured, up-to-date information from one system to another. Feeds are the pathway for sending files. Once a feed is configured, you can view and manage it from the Interaction Studio Feeds Dashboard.
- File - the data sent during an ETL job. Files are sent by feeds. After a file has been sent, you can view changed records and load the changes to Interaction Studio directly from the Feeds Dashboard.
- SFTP - a file protocol for transferring large files securely over the web using encryption and authentication.
- Commit - overwriting existing data in Interaction Studio with new data from a file. Commits happen automatically with a feed, but must be triggered manually for a manual file upload.
Overview of ETL Process
At a high level, this is how the ETL process works for automatic uploads:
- You have data you want to get into Interaction Studio. For example:
- You want to send Interaction Studio information about all of the people in your various systems
- You want to upload your catalog to Interaction Studio
- You extract data from your systems and create a file in the format outlined in the CSV Feed Requirements article in this knowledge base, directed toward the appropriate feed
- While not required, you can test your CSV file using the instructions in the Test a File section in this article
- Once you are confident that your file is error free and in the format Interaction Studio requires, you create an automatic job that uploads the generated file to the Inbound folder of your configured Interaction Studio SFTP location
- You activate the feed so it will continue to automatically pull files from the SFTP site
- Your job runs automatically and it moves from Inbound to Processing to Processed
- You can monitor behavior on the Feeds Dashboard
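The Inbound → Processing → Processed movement described above can be sketched as a local simulation. The folder names come from this article; the move logic below only mimics the server-side lifecycle so you can see the shape of it, and is not the actual Interaction Studio service.

```python
import shutil
import tempfile
from pathlib import Path

def simulate_lifecycle(root: Path, filename: str) -> Path:
    """Move a dropped file through Inbound -> Processing -> Processed."""
    for stage in ("Inbound", "Processing", "Processed"):
        (root / stage).mkdir(parents=True, exist_ok=True)
    inbound = root / "Inbound" / filename
    processing = root / "Processing" / filename
    processed = root / "Processed" / filename
    shutil.move(str(inbound), str(processing))    # the feed picks up the file
    # ... extract/transform/load happens while the file sits in Processing ...
    shutil.move(str(processing), str(processed))  # job finished; file archived
    return processed

# Local demonstration: drop a file into Inbound and run it through.
root = Path(tempfile.mkdtemp())
(root / "Inbound").mkdir(parents=True, exist_ok=True)
(root / "Inbound" / "users.csv.gz").write_bytes(b"")
final = simulate_lifecycle(root, "users.csv.gz")
```

In the real flow, you only write to Inbound; Interaction Studio performs the moves, and the Feeds Dashboard is where you watch them happen.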
You can instead upload a file manually from the Feeds Dashboard. You might do this to periodically confirm your CSV format, because you don't have frequent data to add to Interaction Studio, or because you don't want to use an automatic feed to an SFTP site. The manual process works like this:
- Follow the first two steps above: extract your data and create a correctly formatted CSV file
- After your file has finished processing, it is available in the Testing folder
- You can review file changes on the Feeds Dashboard before committing those changes to Interaction Studio
Types of ETL Jobs
- ETL jobs follow a predictable workflow that depends on an input file for processing. There are two ways an ETL job can be processed: Testing (dryRun) and Live (automated).
- Automated jobs are kicked off when a new file matching the file naming convention specified for the corresponding job is detected in the dataset's virtual filesystem.
- Automated jobs can be reviewed after they have completed; all changes noted in the job are committed to the dataset.
- It is also possible to manually test an ETL job. This is considered a 'dryRun'. A dryRun ETL job completes every phase except the Commit, so the staged data can be reviewed before either enabling the ETL feed for automation or committing the test manually to the dataset.
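Because automated jobs are triggered by filename matching, a feed effectively applies a pattern test to each new file. The sketch below shows one way such a check could look; the `<feedname>-YYYYMMDDHHMMSS.csv[.gz]` convention here is a hypothetical example, not Interaction Studio's actual rule, which is whatever you configured for the job.

```python
import re

# Hypothetical naming convention: "<feedname>-YYYYMMDDHHMMSS.csv" with an
# optional ".gz" suffix. The real convention is configured per job.
FEED_FILE = re.compile(r"^(?P<feed>[A-Za-z0-9_]+)-(?P<ts>\d{14})\.csv(?:\.gz)?$")

def matches_feed(filename: str, feed_name: str) -> bool:
    """Return True if filename follows the convention for the given feed."""
    m = FEED_FILE.match(filename)
    return m is not None and m.group("feed") == feed_name
```

A file like `users-20240115093000.csv.gz` would trigger the `users` job under this convention, while a file that misnames the feed or omits the timestamp would sit in Inbound unprocessed.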
The results of each instance of the ETL can be viewed on the Feeds Dashboard screen. Individual runs are shown with progress, run time, and success and error rates for the job. Users can also select Review Execution to view the instance itself, review any errors, and see some sample rows. The dashboard shows jobs from the last 30 days.