Jeff Prom's SQL Server Blog

Sharing knowledge and tips on SQL Server, Business Intelligence and Data Management.

Cleansing Data Through SSIS with DQS

Posted by jeffprom on January 13, 2015

While Data Quality Services (DQS) takes some time to build up an effective knowledge base through manual data review, there are times when it sure would be handy to use SSIS. This is only recommended when you need quick results and are feeling pretty good about your knowledge base or are perhaps using an external knowledge base through reference data. Should you decide to cleanse data through SSIS using DQS, it’s important to note that it can export some quality control metrics such as Status and Confidence score with the final data which can later be used to review the results and make final decisions. So let’s take a look at using the DQS Cleansing transform in SSIS.

In this example, I created a basic SSIS package with one Data Flow. I selected an Excel Source and pointed it to a file that has some data to be cleansed. In my example I am going to send in some bad addresses and attempt to cleanse them using Melissa Data. To learn more about the Melissa Data Address Check add-in and reference data in general, view my other blog post here.

Create a new package, add a Data Flow Task, and then add a DQS Cleansing transform object.

dqs cleansing transform setup

Click on the New button next to the Data quality connection manager to setup a new connection to an instance of DQS and then select a knowledge base.

dqs cleansing transform setup with connection

Click on the mappings tab. Here you will map input columns from your data source to the domains on the knowledge base.

dqs cleansing transform with mappings

If you would like to see additional columns on the final results such as the Confidence score and the Reason then click on the Advanced tab.

dql cleansing transform advanced tab

When finished, click OK and test it out. In my example I am just going to a Union All object as my destination for demo purposes but I can still see the resulting cleansed data through the Data Viewer.

dqs cleansing results from transform

As you can see from the results, two addresses were corrected, two had suggestions, and one failed to be cleansed. We could get fancy and split up the results based on the Status values. Corrected ones may be considered ready to ship. Auto Suggest records may want to be reviewed, and the New ones would need to be reviewed since it was not found in the knowledge base.

Using the DQS Cleansing transform in SSIS is useful but doesn’t eliminate the need to review the final cleansed results. Cleansing data through SSIS is dependent on the quality of the underlying knowledge base but does provide a quick and efficient way to do an initial data cleanse. The resulting cleansed data can then be evaluated using the Status, Reason and Confidence score values. Once the data has been processed and reviewed, only then should it be considered cleansed and ‘ready for use’.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: