Friday, June 4, 2010

What we can do using SSIS..

Integration Services is a platform for building high performance data integration and workflow solutions, including extraction, transformation, and loading (ETL) operations for data warehousing.

Integration Services includes graphical tools and wizards for
building and debugging packages;
tasks for performing workflow functions such as
1. FTP operations,
2. SQL statement execution.
3. E-Mail messaging.
4. Data sources and destinations for extracting and loading data.
5. Transformations for cleaning.
6. Aggregating.
7. Merging.
8. Copying data.
9. Management service (like Backup & restore using Maintaince Plan)
10. Administering Integration Services packages.

Typical uses of SSIS packages:

Merging Data from Heterogeneous Data Stores:Data is typically stored in many different data storage systems, and extracting data from all sources and merging the data into a single, consistent dataset is challenging. This situation can occur for a number of reasons. For example:

Many organizations archive information that is stored in legacy data storage systems. This data may not be important to daily operations, but it may be valuable for trend analysis that requires data collected over a long period of time.
Branches of an organization may use different data storage technologies to store the operational data. The package may need to extract data from spreadsheets as well as relational databases before it can merge the data.

Data may be stored in databases that use different schemas for the same data. The package may need to change the data type of a column or combine data from multiple columns into one column before it can merge the data.

Integration Services can connect to a wide variety of data sources, including multiple sources in a single package. A package can connect to relational databases by using .NET and OLE DB providers, and to many legacy databases by using ODBC drivers. It can also connect to flat files, Excel files, and Analysis Services projects.

Integration Services includes source components that perform the work of extracting data from flat files, Excel spreadsheets, XML documents, and tables and views in relational databases from the data source to which the package connects.

Next, the data is typically transformed by using the transformations that Integration Services includes. After the data is transformed to compatible formats, it can be merged physically into one dataset.

After the data is merged successfully and transformations are applied to data, the data is usually loaded into one or more destinations. Integration Services includes destination for loading data into flat files, raw files, and relational databases. The data can also be loaded into an in-memory recordset and accessed by other package elements.

Populating Data Warehouses and Data Marts :The data in data warehouses and data marts is usually updated frequently, and the data loads are typically very large.

SSIS includes a task that bulk loads data directly from a flat file into SQL Server tables and views, and a destination component that bulk loads data into a SQL Server database as the last step in a data transformation process.

An SSIS package can be configured to be restartable. This means you can rerun the package from a predetermined checkpoint, either a task or container in the package. The ability to restart a package can save a lot of time, especially if the package processes data from a large number of sources.

You can use SSIS packages to load the dimension and fact tables in the database. If the source data for a dimension table is stored in multiple data sources, the package can merge the data into one dataset and load the dimension table in a single process, instead of using a separate process for each data source.

Updating data in data warehouses and data marts can be complex, because both types of data stores typically include slowly changing dimensions that can be difficult to manage through a data transformation process. The Slowly Changing Dimension Wizard automates support for slowly changing dimensions by dynamically creating the SQL statements that insert and update records, update related records, and add new columns to tables.

Additionally, tasks and transformations in Integration Services packages can process Analysis Services cubes and dimensions. When the package updates tables in the database that a cube is built on, you can use Integration Services tasks and transformations to automatically process the cube and to process dimensions as well. Processing the cubes and dimensions automatically helps keep the data current for users in both environments; users who access information in the cubes and dimensions, and users who access data in a relational database.

Integration Services can also compute functions before the data is loaded into its destination. If your data warehouses and data marts store aggregated information, the SSIS package can compute functions such as SUM, AVERAGE, and COUNT. An SSIS transformation can also pivot relational data and transform it into a less-normalized format that is more compatible with the table structure in the data warehouse.

Cleaning and Standardizing Data:Whether data is loaded into an online transaction processing (OLTP) or online analytic processing (OLAP) database, an Excel spreadsheet, or a file, it needs to be cleaned and standardized before it is loaded. Data may need to be updated for the following reasons:

Data is contributed from multiple branches of an organization, each using different conventions and standards. Before the data can be used, it may need to be formatted differently. For example, you may need to combine the first name and the last name into one column.

Data is rented or purchased. Before it can be used, the data may need to be standardized and cleaned to meet business standards. For example, an organization wants to verify that all the records use the same set of state abbreviations or the same set of product names.

Data is locale-specific. For example, the data may use varied date/time and numeric formats. If data from different locales is merged, it must be converted to one locale before it is loaded to avoid corruption of data.

Integration Services includes built-in transformations that you can add to packages to clean and standardize data, change the case of data, convert data to a different type or format, or create new column values based on expressions. For example, the package could concatenate first and last name columns into a single full name column, and then change the characters to uppercase.

An Integration Services package can also clean data by replacing the values in columns with values from a reference table, using either an exact lookup or fuzzy lookup to locate values in a reference table. Frequently, a package applies the exact lookup first, and if the lookup fails, it applies the fuzzy lookup. For example, the package first attempts to look up a product name in the reference table by using the primary key value of the product. When this search fails to return the product name, the package attempts the search again, this time using fuzzy matching on the product name.

Another transformation cleans data by grouping values in a dataset that are similar. This is useful for identifying records that may be duplicates and therefore should not be inserted into your database without further evaluation. For example, by comparing addresses in customer records you may identify a number of duplicate customers.

Building Business Intelligence into a Data Transformation Process:
A data transformation process requires built-in logic to respond dynamically to the data it accesses and processes.

The data may need to be summarized, converted, and distributed based on data values. The process may even need to reject data, based on an assessment of column values.

To address this requirement, the logic in the SSIS package may need to perform the following types of tasks:

>>>Merging data from multiple data sources.
>>>Evaluating data and applying data conversions.
>>>Splitting a dataset into multiple datasets based on data values.
>>>Applying different aggregations to different subsets of a dataset.
>>>Loading subsets of the data into different or multiple destinations.

Integration Services provides containers, tasks, and transformations for building business intelligence into SSIS packages.

Containers support the repetition of workflows by enumerating across files or objects and by evaluating expressions. A package can evaluate data and repeat workflows based on results. For example, if the date is in the current month, the package performs one set of tasks; if not, the package performs an alternative set of tasks.

Tasks that use input parameters can also build business intelligence into packages. For example, the value of an input parameter can filter the data that a task retrieves.

Transformations can evaluate expressions and then, based on the results, send rows in a dataset to different destinations. After the data is divided, the package can apply different transformations to each subset of the dataset. For example, an expression can evaluate a date column, add the sales data for the appropriate period, and then store only the summary information.

It is also possible to send a data set to multiple destinations, and then apply different sets of transformation to the same data. For example, one set of transformations can summarize the data, while another set of transformations expands the data by looking up values in reference tables and adding data from other sources.

Automating Administrative Functions and Data Loading:

Administrators frequently want to automate administrative functions such as backing up and restoring databases, copying SQL Server databases and the objects they contain, copying SQL Server objects, and loading data. Integration Services packages can perform these functions.

Integration Services includes tasks that are specifically designed to copy SQL Server database objects such as tables, views, and stored procedures; copy SQL Server objects such as databases, logins, and statistics; and add, change, and delete SQL Server objects and data by using Transact-SQL statements.

Administration of an OLTP or OLAP database environment frequently includes the loading of data. Integration Services includes several tasks that facilitate the bulk loading of data. You can use a task to load data from text files directly into SQL Server tables and views, or you can use a destination component to load data into SQL Server tables and views after applying transformations to the column data.

An Integration Services package can run other packages. A data transformation solution that includes many administrative functions can be separated into multiple packages so that managing and reusing the packages is easier.

If you need to perform the same administrative functions on different servers, you can use packages. A package can use looping to enumerate across the servers and perform the same functions on multiple computers. To support administration of SQL Server, Integration Services provides an enumerator that iterates across SQL Management Objects (SMO) objects. For example, a package can use the SMO enumerator to perform the same administrative functions on every job in the Jobs collection of a SQL Server installation.

SSIS packages can also be scheduled using SQL Server Agent Jobs.

1 comment:

Sql Novice said...

Randhir,

I have a question and I'm new to SSIS and also, have no idea on VB.net but I sure can pick it up pretty quick! My question is that I currently have some files in a folder that I need to be loaded to SQL Server, with no transformations. All these different files need to go into different SQL Tables and I use an ODBC Connection. The data flow task simply is just a source and one destination each for the table and an error output(just in case). Now, all these source files are in a single folder and all the destination tables are the same db same schema. I was wondering if there is a way to create JUST ONE data flow task, use a for each loop to load all the different tables from the different sources. I wonder if it can be done at all and if I can does it have to be a VB script and a script task?