Moving massive data volumes for the VA

1. Challenge: The Department of Veterans Affairs (VA) needed support services for the Business Intelligence Service Line (BISL) to perform Extract-Transform-Load (ETL) programming for the Regional Data Warehouses, Corporate Data Warehouse and Veterans Data Warehouse. The work required moving large volumes and varieties of data, comprising more than 60,000 data fields, between 129 VA sites that store Veterans Health Information Systems and Technology Architecture (VistA) data.

2. Solution: Datum Software identified the ETL requirements and a strategy to aggregate the data into three warehouse tiers (CDW/RDW/VDW). The team facilitated functional and technical requirements-gathering and coordination sessions between clients and BISL technical personnel. It also interviewed technical stakeholders from dependent systems to ensure that all interfaces between those systems, and the technical rules governing them, were captured. Datum Software's experience, coupled with its CMMI Maturity Level 3 capabilities, enabled the team to recommend improvements to processes, products and documentation, and to implement those recommendations fully.

The very large scale of the data warehouse posed unique challenges. Designing, developing, testing and maintaining ETL code for an initiative of this size required technical and functional expertise well beyond that of an average data warehouse implementation. The ideal hardware for executing the ETL processes was available, so the team wrote efficient ETL code that used enhanced SSIS features such as partitioning and bulk loading, which drastically improved performance. Datum Software also documented the ETL mappings and workflow processes, which allowed new team members to ramp up rapidly.
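The partitioning and bulk-loading idea can be illustrated with a minimal sketch. This is not the SSIS implementation used on the project. It is a Python stand-in that uses the standard-library sqlite3 module, and the table name `warehouse`, the row layout and the function names are all assumptions made for illustration. Rows are grouped by site, and each group is written with one batched insert inside its own transaction instead of one statement per row.

```python
import sqlite3
from collections import defaultdict
from typing import Iterable

# Hypothetical row layout: (site_id, field_name, value)
Row = tuple[int, str, str]


def partition_by_site(rows: Iterable[Row]) -> dict[int, list[Row]]:
    """Group rows by site_id so each partition can be loaded on its own."""
    partitions: dict[int, list[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    return dict(partitions)


def bulk_load(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Load rows partition by partition and return the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS warehouse "
        "(site_id INTEGER, field_name TEXT, value TEXT)"
    )
    loaded = 0
    for _site_id, batch in sorted(partition_by_site(rows).items()):
        with conn:  # one transaction per partition: commit on success, roll back on error
            conn.executemany("INSERT INTO warehouse VALUES (?, ?, ?)", batch)
        loaded += len(batch)
    return loaded
```

Because each partition commits independently, a failure in one site's batch does not undo the batches that were already loaded, and in a real system separate partitions could be handed to separate workers.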

3. Results: Datum Software moved data into a large data warehouse under varying Service Level Agreements (SLAs) for real-time, daily, weekly and monthly refreshes, depending on the data source and requirements. Each refresh could be either incremental or full. The loads were also designed to complete within a very tight time frame. Datum Software transformed and cleaned the data as it moved from source to target, turning an incredibly complex process into a streamlined solution.
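The difference between a full and an incremental refresh can be shown in a short sketch. It is written in Python with the standard-library sqlite3 module and is not the project's actual code. The `target` table, the `updated_at` watermark column, the whitespace-trimming `clean` step and all function names are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical row layout: (id, value, updated_at)


def create_target(conn):
    """Create the illustrative target table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS target "
        "(id INTEGER PRIMARY KEY, value TEXT, updated_at INTEGER)"
    )


def clean(row):
    """Toy cleaning step: trim surrounding whitespace from the value."""
    row_id, value, updated_at = row
    return (row_id, value.strip(), updated_at)


def full_refresh(conn, source_rows):
    """Replace the entire target table with the cleaned source rows."""
    with conn:
        conn.execute("DELETE FROM target")
        conn.executemany(
            "INSERT INTO target VALUES (?, ?, ?)",
            [clean(r) for r in source_rows],
        )


def incremental_refresh(conn, source_rows):
    """Upsert only rows newer than the latest updated_at already loaded."""
    watermark = conn.execute(
        "SELECT COALESCE(MAX(updated_at), 0) FROM target"
    ).fetchone()[0]
    changed = [clean(r) for r in source_rows if r[2] > watermark]
    with conn:
        conn.executemany("INSERT OR REPLACE INTO target VALUES (?, ?, ?)", changed)
    return len(changed)
```

A full refresh is simpler and self-correcting but rewrites every row, while an incremental refresh touches only the rows changed since the last load, which is what typically makes tight load windows achievable on large sources.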