ETL Terms

Source System 

Source System is a database, application, file, or other storage facility from which the data in a data warehouse is derived. Some of them are Flat files, Oracle Tables, Microsoft SQL server tables, COBOL Sources, XML files. 

Mapping 

Mapping is the definition of the relationship and data flow between source and target objects. It is a pictorial representation about the flow of data from source to target. 

Metadata 

Metadata describes data and other structures, such as objects, business rules, and processes. For  example, the schema design of a data warehouse is typically stored in a repository as metadata,  which is used to generate scripts used to build and populate the data warehouse. Metadata contains all the information about the source tables, target tables, the transformations, so that it will be useful and easy to perform transformations during the ETL process. A repository contains metadata. 

Staging Area 

Staging area is place where you hold temporary tables on data warehouse server. Staging tables are connected to work area or fact tables. We basically need staging area to hold the data, and perform data cleansing and merging, before loading the data into warehouse. Informatica 
 

Data Cleansing 

It is the process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process. The data cleansing technology improves data quality by validating, correctly naming and standardizing data. For example, a person's address may not be same in all source systems because of typos and postal code, city name may not match with address. These errors can be corrected by using data cleansing process and standardized data can be loaded in target systems (data warehouse). 

Transformation 

Transformation is the process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources. A transformation is a repository object that generates, modifies, or passes data. Target System  Target System is a database, application, file, or other storage facility to which the "transformed  source data" is loaded in a data warehouse. 

1 comment :

  1. I think that is actually a very nice and smart explanation about Informatica installation and its other useful aspects.

    informatica read rest api

    ReplyDelete