How metadata reduces data sprawl from Coalesce 2023

Etai Mizrahi, Co-Founder & CEO of Secoda, discusses an important new concept called metadata monitoring.

“It is a really important concept–something we all should be thinking about early on.”

Etai also introduces Secoda, a system designed to monitor metadata, and discusses the challenges and prospects of metadata monitoring.

Metadata monitoring is becoming increasingly important for data efficiency and cost management

Etai highlights the increasing need for metadata monitoring in data functions. The rapid proliferation of tools and data, increased costs, and the shift towards consumption-based pricing all necessitate a more efficient and cost-effective approach to data management. Metadata monitoring, which offers a macro-level overview of data stacks, holds the key to addressing these challenges.

Etai asserts, "There's increased pressure on our costs. It puts increased pressure on the amount of data that we're processing, and understanding how this data is moving through our system is more important." He also notes the need to validate the ROI of data teams, stating, "Data teams generally struggle with the idea of ‘How do we validate ROI for our team?’ And when we shift towards consumption-based pricing, when we're seeing tools proliferate the way they do, the idea of validating your ROI as a data team becomes all the more important."

Understanding the different types of metadata when implementing metadata monitoring

Etai breaks metadata down into four types (technical, operational, descriptive, and social), each with its own value and challenges for data teams. He also discusses monitoring, which involves tools and techniques for data quality control.

He explains, "On a practical level, we think there are four types of metadata that data teams should be thinking about." He adds, "Monitoring is something like dbt tests, Great Expectations, a metadata monitoring tool like Monte Carlo, Metaplane, Bigeye; all of them are able to do this monitoring." Etai also states that metadata monitoring "looks at the macro-level impact of all of your data stack."

Metadata monitoring involves understanding state and process metadata

To implement metadata monitoring, Etai suggests understanding two types of metadata: state and process. State metadata provides an overview of the data stack at a given point, while process metadata focuses on how data interacts with other tools in the stack.

"The first step, we think, is implementing a lineage model that doesn't just look at dbt, Snowflake, and maybe your BI tool, but looking at your entire data stack," he explains. For process metadata, he suggests integrating YAML files or Great Expectations tests into the overall model "so that you can actually take action when you're looking at the lineage graph."
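
To make the idea concrete, here is a minimal sketch of a lineage graph that spans several tools and carries test results on each node, so a failing test can be traced to the assets downstream of it. Everything in it is an assumption made for illustration: the `Node` and `LineageGraph` classes, the asset names, and the failed test name are hypothetical, and this is not how Secoda, dbt, or Great Expectations model lineage.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """One asset in the data stack (hypothetical fields)."""
    name: str
    tool: str  # e.g. "ingestion", "dbt", "bi"
    failed_tests: list[str] = field(default_factory=list)


class LineageGraph:
    """Directed graph of assets; edges point from upstream to downstream."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.downstream: dict[str, set[str]] = defaultdict(set)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def add_edge(self, upstream: str, downstream: str) -> None:
        self.downstream[upstream].add(downstream)

    def impacted_by(self, name: str) -> list[str]:
        """Return every asset downstream of `name`, in breadth-first order."""
        seen: set[str] = set()
        order: list[str] = []
        queue = deque([name])
        while queue:
            current = queue.popleft()
            for child in sorted(self.downstream.get(current, ())):
                if child not in seen:
                    seen.add(child)
                    order.append(child)
                    queue.append(child)
        return order

    def at_risk(self) -> dict[str, list[str]]:
        """Map each asset with failed tests to the assets it may affect."""
        return {
            node.name: self.impacted_by(node.name)
            for node in self.nodes.values()
            if node.failed_tests
        }


# Illustrative stack: ingestion -> dbt models -> BI dashboard (invented names).
graph = LineageGraph()
graph.add_node(Node("raw.orders", tool="ingestion"))
graph.add_node(Node("stg_orders", tool="dbt", failed_tests=["not_null_order_id"]))
graph.add_node(Node("fct_revenue", tool="dbt"))
graph.add_node(Node("revenue_dashboard", tool="bi"))
graph.add_edge("raw.orders", "stg_orders")
graph.add_edge("stg_orders", "fct_revenue")
graph.add_edge("fct_revenue", "revenue_dashboard")

print(graph.at_risk())
```

Keeping test results on the same nodes as the lineage edges is what lets a single traversal answer "what breaks if this test fails?", which is one reading of taking action from the lineage graph.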

Etai also explains that metadata monitoring tools should provide insights into the efficiency of data teams, leading to leaner and more effective operations. He concludes, "As data teams, tools that measure metadata monitoring can give us the same insight into our efficiency so that hopefully, we can be a little bit more lean, efficient…as we grow."

Etai’s key insights

  • Metadata monitoring arrives at a pivotal time for data functions and deserves more attention from data teams
  • The market landscape of data has significantly changed since 2012, with more overlapping categories and common concepts
  • Metadata monitoring can help manage the complexity of data stacks and improve understanding of the landscape
  • Metadata monitoring can help reduce data sprawl and dark data
  • Implementing metadata monitoring at scale is difficult and requires a central source of truth
  • Secoda has a system that allows setting thresholds to see how the data stack is performing over time
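
The threshold idea in the last point can be sketched in a few lines. This is only an illustration under stated assumptions: the metric names, limits, and monthly figures are invented, and the `Threshold` class and `breaches` function are hypothetical rather than part of Secoda's product or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Threshold:
    """Upper limit for one stack-level metric (hypothetical)."""
    metric: str
    max_value: float


def breaches(
    history: list[tuple[str, dict[str, float]]],
    thresholds: list[Threshold],
) -> list[tuple[str, str, float]]:
    """Return (period, metric, value) for every snapshot value above its limit."""
    limits = {t.metric: t.max_value for t in thresholds}
    found = []
    for period, metrics in history:
        for metric, value in metrics.items():
            limit = limits.get(metric)
            if limit is not None and value > limit:
                found.append((period, metric, value))
    return found


# Invented monthly snapshots of stack-level metadata.
history = [
    ("2023-08", {"monthly_compute_cost_usd": 900.0, "unused_tables": 12}),
    ("2023-09", {"monthly_compute_cost_usd": 1250.0, "unused_tables": 15}),
    ("2023-10", {"monthly_compute_cost_usd": 1400.0, "unused_tables": 31}),
]
thresholds = [
    Threshold("monthly_compute_cost_usd", 1000.0),
    Threshold("unused_tables", 25),
]

found = breaches(history, thresholds)
for period, metric, value in found:
    print(f"{period}: {metric} = {value} exceeds its threshold")
```

Tracking a count such as unused tables alongside cost is one way to make sprawl and dark data visible as a trend rather than a one-off audit.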