Operationalizing Column-Name Contracts with dbtplyr

Emily Riederer
Senior Analytics Manager, Capital One

07 December 2021, 06:25 PM


About this talk

Complex software systems make performance guarantees through documentation and unit tests, and they communicate these to users with conscientious interface design. However, published data tables exist in a gray area; they are static enough not to be considered a “service” or “software”, yet too raw to earn attentive user interface design. This ambiguity creates a disconnect between data producers and consumers and poses a risk for analytical correctness and reproducibility.

In this talk, I will explain how controlled vocabularies can be used to form contracts between data producers and data consumers. Explicitly embedding meaning in each component of variable names is a low-tech and low-friction approach that builds a shared understanding of how each field in the dataset is intended to work.

Doing so can lighten the burden on data producers by facilitating automated data validation and metadata management. At the same time, data consumers benefit from reduced cognitive load when remembering names, a deeper understanding of variable encoding, and opportunities to analyze the resulting dataset more efficiently.
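To make the validation idea concrete, here is a minimal Python sketch of how a name-based contract could drive automated checks. The stubs (ID, IND, N) and the rule attached to each are hypothetical examples chosen for illustration; they are not taken from the talk or from any package.

```python
# Hypothetical controlled vocabulary: the stubs and rules below are
# illustrative assumptions, not a vocabulary defined by the talk.
RULES = {
    "ID": lambda v: v is not None,                  # identifiers are never null
    "IND": lambda v: v in (0, 1),                   # indicators are binary
    "N": lambda v: isinstance(v, int) and v >= 0,   # counts are non-negative integers
}


def stub(column):
    """Return the leading stub of a column name such as 'N_LOGINS'."""
    return column.split("_", 1)[0]


def check_contracts(rows):
    """Return (column, problem) pairs for names or values that break the contract."""
    violations = []
    for row in rows:
        for column, value in row.items():
            rule = RULES.get(stub(column).upper())
            if rule is None:
                # The name itself is outside the vocabulary.
                violations.append((column, "unknown stub"))
            elif not rule(value):
                # The name is valid but the value breaks its promise.
                violations.append((column, f"bad value {value!r}"))
    return violations


rows = [
    {"ID_USER": 1, "IND_ACTIVE": 1, "N_LOGINS": 3},
    {"ID_USER": 2, "IND_ACTIVE": 2, "N_LOGINS": -1, "LOGINS": 4},
]
print(check_contracts(rows))
```

Because each rule is keyed on the stub alone, a producer who adds a new IND_ or N_ column gets the matching check without writing any new test code.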

After discussing the theory of controlled-vocabulary column naming and related workflows, I will illustrate these ideas with a demonstration of the {dbtplyr} dbt package. It helps analytics engineers get the most value from controlled vocabularies by making it easier to exploit column-naming structures while coding.
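As a rough illustration of what exploiting a naming structure while coding can look like, the Python sketch below selects columns by stub and generates one SQL expression per match. It is an analogy only: the helper names, the `users` table, and the column list are hypothetical, and none of this should be read as dbtplyr's actual macro interface.

```python
def columns_with_stub(prefix, columns):
    """Keep the columns whose name begins with the given stub (hypothetical helper)."""
    return [c for c in columns if c.upper().startswith(prefix.upper() + "_")]


def render_each(columns, template):
    """Render one SQL expression per column from a template containing {col}."""
    return ",\n  ".join(template.format(col=c) for c in columns)


# Hypothetical column names following a stub-based vocabulary.
columns = ["ID_USER", "DT_SIGNUP", "N_LOGINS", "N_ORDERS", "IND_ACTIVE"]

# Every count column gets the same aggregation without being listed by hand.
counts = columns_with_stub("N", columns)
query = (
    "select\n  ID_USER,\n  "
    + render_each(counts, "sum({col}) as {col}_TOT")
    + "\nfrom users\ngroup by ID_USER"
)
print(query)
```

The design point is that the query text never names N_LOGINS or N_ORDERS directly, so a new count column that follows the vocabulary is picked up automatically.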

Join the chat in the #coalesce-column-name-contracts channel (https://bit.ly/2Yf09HX). If you’re not yet a member of dbt Community Slack, sign up at https://www.getdbt.com/community/join-the-community

Emily Riederer

Emily is an experienced builder of sustainable data products throughout the analytics stack.
