Knowing the traceability of data is increasingly necessary. Companies planning data management in data warehouse projects need it not only to prevent errors and to make troubleshooting and correction easier, but also because of the legal regulations beginning to proliferate in this area, which require control and traceability of data, as is the case with Solvency II in the insurance sector.
In general, managing the traceability of data (also called data lineage) means being able to know the entire life cycle of a piece of data: the exact date and time it was extracted, when it was transformed, and when it was loaded from a source environment (server, file, table, field, etc.) into a destination.
Generating lineage or traceability manually is possible but complex. Basically, the data movement processes must generate audit data and store it in a repository, and this storage area must be linked to each piece of data. As you can imagine, the effort required for a task of this kind is very high, and because the work is manual it is also prone to errors.
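To make the manual approach concrete, here is a minimal sketch of such an audit repository, assuming a single SQLite table and three step names (extract, transform, load). The table name `lineage_event`, the helper functions, and the sample identifiers (`policy-42`, `crm.policies`, `dwh.dim_policy`) are hypothetical and chosen only for illustration; a real project would adapt them to its own ETL processes.

```python
import sqlite3
from datetime import datetime, timezone


def open_audit_repository(path=":memory:"):
    """Open the (hypothetical) audit repository and create its table if needed.

    Each row is one audit event: record_id links the event to a piece of
    data, step names the stage, and occurred_at is a UTC ISO 8601 timestamp.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS lineage_event (
            event_id    INTEGER PRIMARY KEY AUTOINCREMENT,
            record_id   TEXT NOT NULL,
            step        TEXT NOT NULL,
            source      TEXT,
            destination TEXT,
            occurred_at TEXT NOT NULL
        )
        """
    )
    return conn


def record_event(conn, record_id, step, source=None, destination=None):
    """Store one audit event for a piece of data, stamped with the current UTC time."""
    occurred_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO lineage_event "
        "(record_id, step, source, destination, occurred_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (record_id, step, source, destination, occurred_at),
    )
    conn.commit()


def lineage_of(conn, record_id):
    """Return the audit events of one piece of data in the order they were recorded."""
    cursor = conn.execute(
        "SELECT step, source, destination, occurred_at "
        "FROM lineage_event WHERE record_id = ? ORDER BY event_id",
        (record_id,),
    )
    return cursor.fetchall()


# Example: each data movement process calls record_event as it runs.
conn = open_audit_repository()
record_event(conn, "policy-42", "extract", source="crm.policies")
record_event(conn, "policy-42", "transform")
record_event(conn, "policy-42", "load", destination="dwh.dim_policy")
history = lineage_of(conn, "policy-42")
```

Even in this small form, every process that moves data has to remember to call `record_event` with the right key, which is where the effort and the risk of manual errors come from.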
The best options to achieve the desired traceability
Those weighing the best option for traceability should know that several software vendors today provide traceability systems linked to ETL and Business Intelligence tool environments. For a sound technology choice, look for software that covers the complete cycle, namely:
A high-performance database engine for building the data warehouse.
An integration platform that incorporates:
- Data quality features.
- MDM functions.
- Ability to manage metadata.
- Ability to manage data traceability.
- Design tools, with reverse-engineering capabilities, for modeling and building.