ETL (extract, transform and load) determines how and when the source data moves. While extracting and loading simply move the data, transforming it is more complicated. Business insights depend on how readily the target databases yield information and how meaningful that information is. How well these databases perform is the direct result of how well the ETL has handled the data sources. If the ETL system loads the data in an unrecognised format, there will be no business insights, and the data is useless to the user. ETL processes are therefore critical in harvesting data from big data sources and in dealing effectively with high volumes.
The extracted data then have to be presented in a sensible and meaningful format. The extraction should ensure that all required data are extracted with minimal resources and minimal load on the source. The extraction can be either an incremental update of the data or a full extract.
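The difference between a full extract and an update-only extract can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed design: the `orders` table, its `updated_at` column, and the idea of remembering a "watermark" between runs are all assumptions made for the example.

```python
import sqlite3

def extract(conn, last_watermark=None):
    """Extract rows from a hypothetical `orders` table.

    With no watermark, every row is returned (full extract). With a
    watermark, only rows changed after it are returned (incremental
    extract), which keeps the load on the source small.
    """
    if last_watermark is None:
        query = "SELECT id, amount, updated_at FROM orders ORDER BY updated_at"
        params = ()
    else:
        query = ("SELECT id, amount, updated_at FROM orders "
                 "WHERE updated_at > ? ORDER BY updated_at")
        params = (last_watermark,)
    rows = conn.execute(query, params).fetchall()
    # Remember the newest timestamp seen so the next run can start from it.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

def demo():
    # In-memory stand-in for a source system (assumed schema).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-02"), (3, 30.0, "2024-01-03")],
    )
    full, watermark = extract(conn)              # first run: full extract
    conn.execute("INSERT INTO orders VALUES (4, 40.0, '2024-01-04')")
    delta, watermark = extract(conn, watermark)  # later run: changes only
    return full, delta, watermark
```

In this sketch the first run transfers all three rows, while the second transfers only the row added since. That is why an incremental extract is usually preferred once the initial load is done.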
A full extraction can mean transferring a large volume of data, several gigabytes or more. Once the data is extracted, presenting it is another challenge. The transformation needs to ensure that the data are in the same dimensions and the same units so that they can be combined with other sources. The data then need to be joined across all data sources and validated. During loading, as in transformation and extraction, referential integrity is the important factor in ensuring the consistency of the ETL process. Even though designing an ETL process is straightforward, there can be several limiting factors, such as missing extracts, null values in reference databases, connection errors, or a simpler situation like a power outage.
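One way to picture the referential integrity check before loading is shown below. It is only a sketch under stated assumptions: the `sales` and `customers` data, the field names, and the function `split_by_referential_integrity` are hypothetical and chosen purely for illustration.

```python
def split_by_referential_integrity(fact_rows, valid_keys, key_field):
    """Separate rows that can be loaded safely from rows that cannot.

    A row is loadable only if its foreign key is present (not null) and
    refers to a key that exists in the reference data.
    """
    loadable, rejected = [], []
    for row in fact_rows:
        key = row.get(key_field)
        if key is not None and key in valid_keys:
            loadable.append(row)
        else:
            # Null or unknown reference: hold back for review, do not load.
            rejected.append(row)
    return loadable, rejected

# Hypothetical extracted data: one valid row, one null key, one orphan key.
sales = [
    {"sale_id": 1, "customer_id": 100},
    {"sale_id": 2, "customer_id": None},
    {"sale_id": 3, "customer_id": 999},
]
customers = {100, 101}  # keys present in the reference table (assumed)

ok, bad = split_by_referential_integrity(sales, customers, "customer_id")
```

Here only the first sale would be loaded. The other two are set aside instead of being written to the target, so the null-value and missing-reference cases mentioned above do not silently corrupt the loaded data.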
Without an ETL process, there is no reliable way to use data from several databases, so the ETL process has to be designed to be fail-safe.
Problems that may be encountered while performing ETL within Sunshine Group
Because Sunshine Group is a multinational business operating in three different countries across the globe, the problems are considerable. First, data must be transmitted securely between what could be different physical locations. Second, there is the data format: if different locations record data in different units of measure, extraction and presentation become even more complicated. The financial burden of managing different data centres and implementing ETL across the system is more challenging still.
During extraction, the same level of integrity may be hard to maintain across all data sources. Likewise, during transformation, the priority of requirements may differ depending on where the data is accessed. The challenge also lies in getting all data into the same dimensions and units.
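As a small illustration of bringing data into the same units, the sketch below converts weights reported in different units to kilograms. The record layout, the site names, and the choice of weight as the measured quantity are assumptions made for the example; nothing in the case describes what Sunshine Group actually records.

```python
# Conversion factors to a single target unit (kilograms).
TO_KILOGRAMS = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

def normalise_weight(record):
    """Return a copy of the record with its weight expressed in kilograms."""
    unit = record["unit"].lower()
    if unit not in TO_KILOGRAMS:
        # Fail loudly rather than load a value in an unknown unit.
        raise ValueError(f"unknown unit: {record['unit']}")
    return {
        **record,
        "weight": record["weight"] * TO_KILOGRAMS[unit],
        "unit": "kg",
    }

# Hypothetical records from three locations using different units.
records = [
    {"site": "A", "weight": 2.0, "unit": "kg"},
    {"site": "B", "weight": 500.0, "unit": "g"},
    {"site": "C", "weight": 10.0, "unit": "lb"},
]
normalised = [normalise_weight(r) for r in records]
```

After this step every record carries the same unit, so values from different locations can be joined and compared directly. Rejecting unknown units outright is a design choice made here to keep unrecognised formats out of the target.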
While loading data, the loading process might need more resources as different sources are operating