I came across the excellent draft of the new EU data quality framework that aims to align stakeholders. While it focuses on medicines regulation and procedures that apply across the European Medicines Regulatory Network, its principles translate readily to broader data governance policies, and it serves as a great template for setting your own strategy.

Data quality is one of the biggest asks from data consumers in large-scale implementations. As the scale of the data grows, so do the complexity and the plethora of use cases that depend on it. In a large organization there are typically multiple degrees of separation, i.e. silos, between producers of the data and its consumers. While the technology to detect data anomalies and feed them back to producers exists, the fundamental problem persists: clean data at the source is undervalued.
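
To make that feedback loop concrete, here is a minimal sketch of source-side quality checks whose failure counts can be routed back to the producing team. The table, columns, and checks are illustrative assumptions of mine, not anything prescribed by the EU framework:

```python
from dataclasses import dataclass
from typing import Callable

import pandas as pd


@dataclass
class QualityCheck:
    name: str
    predicate: Callable[[pd.DataFrame], pd.Series]  # True per row means the row passes


def run_checks(df: pd.DataFrame, checks: list) -> dict:
    """Count failing rows per check; this report is what gets fed back to the producer."""
    return {check.name: int((~check.predicate(df)).sum()) for check in checks}


# Illustrative producer-side table with deliberate defects.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "amount": [99.0, -5.0, 42.0, None],
})

checks = [
    QualityCheck("amount_not_null", lambda d: d["amount"].notna()),
    QualityCheck("amount_positive", lambda d: d["amount"] > 0),
    QualityCheck("order_id_unique", lambda d: ~d["order_id"].duplicated(keep=False)),
]

print(run_checks(orders, checks))
# {'amount_not_null': 1, 'amount_positive': 2, 'order_id_unique': 2}
```

In production this role is usually filled by a dedicated tool, but the shape is the same: declarative expectations evaluated close to the source, with failures routed to the producer rather than discovered downstream.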

A downstream data engineering team stitching together various sources of data is caught between a rock and a hard place: ever-ravenous consumers of data, dashboards, and models on one side, and producers who throw application data over the fence on the other, meeting operational needs while downplaying analytics needs. This leaves the teams building and using these data products with minimal understanding of the data, since resourcing pushes the focus toward shoveling data into the lake and hoarding it. The lack of understanding is further compounded by an ever-growing backlog and the tightrope walk between shipping new data products and paying down tech debt.

This is where organizational structures play a huge part in determining the right business outcomes and usage of the data. The right org structure ensures data products are conceptualized and built at the source, with outcomes and KPIs planned at the outset and room left for evolution.

Some of the better implementations I have been fortunate to be involved in treated data as a product: data consumption evolved through deliberate consideration of use cases, with KPIs defined at the outset in terms of the value of the data, as opposed to 'shovel TBs into the lake' and figure out what to do with it later.
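
As a sketch of what 'KPIs defined at the outset' can look like in practice, here is a hypothetical data-product contract. The fields, SLA values, and consumer names are all assumptions for illustration, not a standard:

```python
from dataclasses import dataclass, field


@dataclass
class DataProductContract:
    """Hypothetical contract agreed between producer and consumers before data lands."""
    name: str
    owning_domain: str           # producer team accountable at the source
    schema: dict                 # published interface: column -> type
    freshness_sla_hours: int     # KPI: maximum staleness consumers will tolerate
    completeness_target: float   # KPI: fraction of rows expected to pass quality checks
    consumers: list = field(default_factory=list)


orders_contract = DataProductContract(
    name="orders",
    owning_domain="commerce",
    schema={"order_id": "int", "amount": "float", "placed_at": "timestamp"},
    freshness_sla_hours=4,
    completeness_target=0.99,
    consumers=["finance-dashboard", "demand-forecast-model"],
)
```

The point is less the exact fields than the sequencing: the contract, and the KPIs inside it, exist before a single TB lands in the lake.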

Call it what you want (data mesh, etc.), but fundamentally the data-driven approach of planning data usage and governance with the right executive sponsorship holds a ton of value, especially in the age of massive data generation across connected systems. The advantage of being KPI-driven at the outset is that you can set the taxonomy at the beginning; the taxonomy feeds the domain-based use cases, which shape how the data is consumed and lay the foundation for a holistic permissioning matrix, clearly implemented least privilege, and policy-based data access. More to come on this subject, but it is great to see a well-defined manifesto from policy-makers at the highest levels.
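
To illustrate how a taxonomy can anchor least privilege, here is a toy permissioning matrix keyed by domain and role. The domains, roles, and operations are hypothetical, a sketch of the pattern rather than a real policy engine:

```python
# Hypothetical taxonomy-driven permissioning matrix: (domain, role) -> granted operations.
PERMISSIONS = {
    ("commerce", "analyst"): {"read"},
    ("commerce", "engineer"): {"read", "write"},
    ("finance", "analyst"): {"read"},
}


def is_allowed(domain: str, role: str, operation: str) -> bool:
    """Least privilege: anything not explicitly granted is denied by default."""
    return operation in PERMISSIONS.get((domain, role), set())


assert is_allowed("commerce", "analyst", "read")
assert not is_allowed("commerce", "analyst", "write")  # default deny
```

Because the matrix keys off the same taxonomy as the use cases, granting access becomes a policy decision rather than a per-table negotiation.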