Data Insights Framework

Mature organizations base business decisions on data to remain competitive, so the timely availability of quality data is critical for accurate reporting and analysis. The “Data Insights Framework” measures data quality (DQ) metrics and presents the standard DQ metrics in Power BI dashboards. The framework also provides APIs to gain insights from the data, to detect anomalies by comparing multiple snapshots of data, and to run common statistical algorithms such as clustering, time series forecasting and outlier detection.

Business Problem

Data coming from external or upstream systems is often stale or contains anomalies. Standard ETL processes that load data “as it comes” can introduce errors in reporting and lead to wrong business decisions. The discrepancies are often identified late in the ETL cycle, at reporting time, and addressing them at that stage increases the ETL pipeline execution time.

Solution

Our Data Insights Framework has the following features to ensure the quality of the data and to help gain insights from it:

  • Support for a wide variety of upstream data sources: the ADF v2 Copy activity supports transferring data from many kinds of sources (files, structured and unstructured data sources).
  • Policy Control: Covers all applicable Data Quality Dimensions – Accuracy, Completeness, Consistency, Currency, Integrity, Timeliness and Validity
  • Experimentation & Analysis: Python based ML models to perform Exploratory Data Analysis, Outlier Detection, Anomaly detection by variance, Seasonality and Time Series forecasting
  • Monitoring Portal: Dashboards to view and control the Data Quality KPIs, anomalies, pipeline & process monitoring alerts, performance trends & failure analysis
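
To make the Policy Control dimensions above more concrete, here is a minimal Python sketch of how three of them (completeness, validity and timeliness) might be scored for a batch of records. The function names, the sample rows and the "amount" and "loaded_at" columns are hypothetical and chosen only for illustration; they are not the framework's actual API.

```python
from datetime import datetime, timedelta

MISSING = (None, "")  # values treated as "not populated" (an assumption)

def completeness(rows, column):
    """Fraction of rows where `column` is populated."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(column) not in MISSING)
    return filled / len(rows)

def validity(rows, column, is_valid):
    """Fraction of populated values in `column` that pass the `is_valid` rule."""
    values = [r[column] for r in rows if r.get(column) not in MISSING]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)

def timeliness(rows, column, as_of, max_age):
    """Fraction of rows whose timestamp is at most `max_age` old at `as_of`."""
    if not rows:
        return 0.0
    fresh = sum(
        1 for r in rows
        if r.get(column) is not None and as_of - r[column] <= max_age
    )
    return fresh / len(rows)

# Hypothetical sample batch
rows = [
    {"id": 1, "amount": 120.0, "loaded_at": datetime(2024, 1, 10)},
    {"id": 2, "amount": None,  "loaded_at": datetime(2024, 1, 10)},
    {"id": 3, "amount": -5.0,  "loaded_at": datetime(2024, 1, 1)},
    {"id": 4, "amount": 80.0,  "loaded_at": datetime(2024, 1, 9)},
]

dq_report = {
    "completeness": completeness(rows, "amount"),
    "validity": validity(rows, "amount", lambda v: v >= 0),
    "timeliness": timeliness(
        rows, "loaded_at", datetime(2024, 1, 10), timedelta(days=2)
    ),
}
print(dq_report)
```

Scores like these, computed per column and per load, are the kind of values a dashboard could track over time.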

Why is the solution unique?

  • Gives power to business users to experiment on their data to enable predictive analytics.
  • Ability to extract data from multiple sources by leveraging Azure Data Factory (ADF)
  • Ability to compare multiple snapshots of data at different points of time to identify the quality journey of the data.
  • Developed completely on Microsoft technologies and deployable in the customer’s own Azure subscription, so the data stays in the customer’s environment.
  • Near real time reporting & dashboards
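
The snapshot comparison mentioned above can be pictured with a small sketch: profile the same column in two snapshots, then flag any summary metric that moved by more than a tolerance. The `profile` and `compare_snapshots` helpers, the chosen metrics and the 20% tolerance are all assumptions made for this example rather than the framework's implementation.

```python
def profile(rows, column):
    """Summarize one numeric column of a snapshot."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return {
        "row_count": len(rows),
        "null_rate": 1 - len(values) / len(rows) if rows else 0.0,
        "mean": sum(values) / len(values) if values else 0.0,
    }

def compare_snapshots(old, new, column, tolerance=0.2):
    """Return the metrics whose change between snapshots exceeds `tolerance`."""
    before, after = profile(old, column), profile(new, column)
    drifted = {}
    for metric, old_value in before.items():
        new_value = after[metric]
        diff = abs(new_value - old_value)
        # Relative change; fall back to absolute change for a zero baseline
        change = diff / abs(old_value) if old_value else diff
        if change > tolerance:
            drifted[metric] = (old_value, new_value)
    return drifted

# Hypothetical snapshots of the same table taken at two points in time
old_snapshot = [{"amount": 100}, {"amount": 110}, {"amount": 90}, {"amount": 100}]
new_snapshot = [{"amount": 100}, {"amount": None}, {"amount": 300}, {"amount": 110}]

print(compare_snapshots(old_snapshot, new_snapshot, "amount"))
```

In this example the row count is unchanged, while the null rate and the mean both move past the tolerance and are reported as drift.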

Benefits

  • Ensures high quality of data for accurate reporting and reliable business decisions
  • Facilitates early identification of data quality issues at the upstream level to reduce the ETL cycle time. Can potentially reduce ETL timelines by 10-20%, thus improving availability.
  • Business users can validate their assumptions quickly by running different pre-defined ML models to find valuable insights from the data
  • Data analysts can understand the data better, as the framework profiles the data and presents the insights from it.
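
As one illustration of the kind of pre-defined analysis a business user might run, the sketch below applies the common interquartile range (Tukey fence) rule for outlier detection. This technique is used here only as an example; the source does not state which outlier detection method the framework uses, and the `iqr_outliers` name and sample values are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical daily order counts with one suspicious spike
daily_orders = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(daily_orders))
```

A rule like this needs no training and is easy to explain, which makes it a reasonable first check before moving to model-based detection.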




Architectural Flow Diagram for Data Insights Framework.