Information Security Analytics and Reporting
The customer, a Fortune 50 financial institution, had created a pipeline that aggregated batched data into a secured on-premises cluster to produce daily aggregates and reports. The existing system performed multiple transformations, which created new datasets. The customer faced multiple issues:
- The data pipeline was inefficient, took 6 hours to run, and required manual intervention almost daily.
- Reports were not aligning correctly with day boundaries.
- Any point of failure required reconfiguring and restarting the pipeline, a time-consuming and frustrating task.
- Major setup and development time was needed to add new sources.
- The team was not able to test and validate the pipeline prior to deployment.
As a result, testing was conducted directly on the cluster, which is an inappropriate use of resources.
CDAP value proposition(s)
The customer’s data development team created independent parallel pipelines that moved the data from SQL Servers into their Hadoop-based data lake.
Transformations were performed in-flight with the ability to handle error records.
After completing the initial load, another pipeline fed the data into an aggregation and reporting pipeline.
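The idea of transforming records in-flight while handling error records can be illustrated with a minimal Python sketch. This is not CDAP's API or the customer's implementation; the names transform_in_flight, TransformResult, and normalize_event, as well as the record fields, are hypothetical and chosen only for illustration. The sketch assumes that a record which fails transformation should be set aside with its failure reason rather than stop the whole run.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


@dataclass
class TransformResult:
    """Holds successfully transformed records and rejected (error) records."""
    good: List[Record] = field(default_factory=list)
    errors: List[Record] = field(default_factory=list)


def transform_in_flight(
    records: Iterable[Record],
    transform: Callable[[Record], Record],
) -> TransformResult:
    """Apply `transform` to each record as it streams through.

    A record that raises is routed to `errors` together with the reason,
    so one bad row does not force the pipeline to be restarted.
    """
    result = TransformResult()
    for record in records:
        try:
            result.good.append(transform(record))
        except (KeyError, ValueError, TypeError) as exc:
            result.errors.append(
                {"record": record, "error": f"{type(exc).__name__}: {exc}"}
            )
    return result


def normalize_event(record: Record) -> Record:
    """Hypothetical transformation: clean the user name and parse a counter."""
    return {
        "user": record["user"].strip().lower(),
        "bytes_sent": int(record["bytes_sent"]),
    }


# Hypothetical sample rows, as they might arrive from a source table.
rows = [
    {"user": " Alice ", "bytes_sent": "1024"},  # valid
    {"user": "bob", "bytes_sent": "n/a"},       # unparsable number
    {"bytes_sent": "10"},                       # missing field
]

result = transform_in_flight(rows, normalize_event)
```

In this sketch the valid row lands in result.good and the two malformed rows land in result.errors, where they could be written to a separate dataset for later review while the downstream aggregation continues with the clean records.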