Day to day, organizations automate processes and record data in separate systems, but analysis requires a single, unified view. This is where data integration comes in. It is one of the keys to data management in Data Warehouse projects: it correlates the different areas and processes of the company as a whole, giving analysis the unification it needs and conveying criteria of consistency and completeness.
To integrate the data sources into the Data Warehouse, it is necessary to act at two levels:
- Defining the data transition cycle.
- Setting the refresh rate of each data set.
ETL processes in the Data Warehouse
The first step is to define the extraction process. There are two common practices here, the invasive one and the abstraction one. The second is more advisable, especially when responsibility is at stake. Keep in mind also that the world of transactional systems is very dynamic, and the invasive method may not give the expected results if there have been changes.
The invasive practice consists of building extraction processes that query the data sources directly and proceed to the next step (transformation) in the same operation. Its disadvantage is that the whole process stays too tightly coupled to the structures of the source system: if that structure changes, as often happens with a version upgrade, for example, the process is directly affected.
The abstraction practice, which Lantares recommends, consists of creating independence through files. The source systems generate data files (plain text, CSV, or XML), which are placed in a shared resource for further processing and loading.
This practice provides assurance that, if the source system changes structure, through an update or a change of ERP, for example, there is no direct impact on the integration processes or on the structure of the data warehouse. It is only necessary to ensure that the files keep the same structure.
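
As a rough illustration of the abstraction practice, the sketch below separates a source-side export from a warehouse-side reader. Everything named here is hypothetical: the `erp_sales` table, its columns, the agreed file layout in `FILE_COLUMNS`, and the in-memory buffer that stands in for a file on the shared resource.

```python
import csv
import io
import sqlite3

# Hypothetical file layout agreed between the source system and the DWH team.
FILE_COLUMNS = ["store_id", "sale_date", "amount"]


def export_sales(conn, out):
    """Source-side step: write the agreed columns to a CSV stream."""
    writer = csv.writer(out)
    writer.writerow(FILE_COLUMNS)
    # Only this query knows the layout of the source table.
    rows = conn.execute(
        "SELECT shop, sold_on, total FROM erp_sales ORDER BY rowid"
    )
    writer.writerows(rows)


def read_extract(src):
    """Warehouse-side step: depends only on the file layout."""
    return list(csv.DictReader(src))


# Hypothetical source system, simulated with an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE erp_sales (shop TEXT, sold_on TEXT, total REAL)")
conn.execute("INSERT INTO erp_sales VALUES ('S01', '2024-01-15', 120.5)")

buffer = io.StringIO()  # stands in for a file on the shared resource
export_sales(conn, buffer)
buffer.seek(0)
records = read_extract(buffer)
```

If the source table is renamed or restructured, only the query inside `export_sales` needs to change, while `read_extract` and everything after it keep working as long as the file keeps the same columns.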
Once the extraction phase has been successfully completed, the transformation process begins. At this stage, the source data is modified so that it ends up adapted to the destination DWH.
There are many ways to do this; some of the most common are:
- Applying filters: for example, specifying that only records with values greater than zero are collected.
- Changing the format: for example, transforming the American format in which dates were collected at source (M/D/Y) into the European format (D/M/Y), established as definitive for all systems.
- Conducting aggregations: so that, although the data are collected daily, they can be extracted from the DWH as monthly aggregates.
- Establishing connections: for example, collecting the sales of different stores in the DWH and loading them into a single fact table.
- Arranging transformations: for example, normalizing the client code to 4 digits.
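
The transformations listed above can be sketched in a few lines of Python. This is only an illustrative example under assumed data: the store extracts, field names, and sample values are hypothetical, and an ETL tool would normally express each step as a stage rather than as code.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical extracts from two stores; dates arrive in American M/D/Y format.
store_a = [
    {"store": "A", "client": "42", "date": "01/15/2024", "amount": 100.0},
    {"store": "A", "client": "7", "date": "01/20/2024", "amount": 0.0},
]
store_b = [
    {"store": "B", "client": "1234", "date": "02/03/2024", "amount": 50.0},
    {"store": "B", "client": "42", "date": "01/31/2024", "amount": 25.0},
]


def transform(row):
    """Change the date format and normalize the client code."""
    parsed = datetime.strptime(row["date"], "%m/%d/%Y")
    return {
        "store": row["store"],
        "client": row["client"].zfill(4),      # normalize to 4 digits
        "date": parsed.strftime("%d/%m/%Y"),   # European D/M/Y format
        "month": parsed.strftime("%Y-%m"),     # key used for aggregation
        "amount": row["amount"],
    }


# Connection: sales from both stores end up in one fact table.
# Filter: only records with an amount greater than zero are kept.
fact_sales = [transform(r) for r in store_a + store_b if r["amount"] > 0]

# Aggregation: daily records summed into monthly totals.
monthly_totals = defaultdict(float)
for fact in fact_sales:
    monthly_totals[fact["month"]] += fact["amount"]
```

Here `fact_sales` plays the role of the single fact table, and `monthly_totals` shows how daily data can later be read back as monthly aggregates.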
If the transformation has been carried out correctly, as shown by the test cycles, the loading phase can begin. It consists of dumping the processed data into the final structures, the Datamarts that make up the DWH. It is important to define the loading strategy by choosing whether the load will be:
- Total: all content is deleted and the full history is reloaded.
- Incremental: only new data is loaded.
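
The difference between the two loading strategies can be illustrated with a minimal sketch. The `fact_sales` table, its `sale_id` key, and the sample rows are assumptions made for the example, and SQLite's `INSERT OR IGNORE` is just one simple way to skip rows that are already loaded.

```python
import sqlite3


def load_total(conn, rows):
    """Total load: delete all content, then reload the full history."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM fact_sales")
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)


def load_incremental(conn, rows):
    """Incremental load: insert only rows whose sale_id is not present yet."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO fact_sales VALUES (?, ?, ?)", rows
        )


# Hypothetical destination table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, store TEXT, amount REAL)"
)

history = [(1, "A", 100.0), (2, "B", 50.0)]
load_total(conn, history)

# The next batch repeats sale 2 and brings one new sale.
load_incremental(conn, [(2, "B", 50.0), (3, "A", 75.0)])
count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

A total load is simpler to reason about but reprocesses the whole history each time, while an incremental load touches only the new rows and depends on a reliable key to recognize what has already been loaded.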
Not long ago, the design process was carried out on paper, and the SQL statements were programmed afterwards. With the emergence of ETL (Extract, Transform, Load) tools, both design and construction can be performed in a single operation.
ETL tools are the best alternative to hand-coding SQL. SQL (Structured Query Language) is a language used to query databases. Its most distinctive drawback in this context is that it is written manually, so it is prone to errors. ETL tools are the preferable option because they automate the work: integration processes are built simply by dragging icons (stages), each representing a function. This type of design brings numerous benefits, such as:
- Improving the architecture and reducing maintenance costs.
- Reusing procedures and professional profiles.
- Managing metadata.
- Focusing on the business rather than on programming interfaces.
- Building management within applications.
- Providing security.
- Accessing management systems and scheduling events.
- Reducing development time significantly.