This talk is about how modern data teams working in the “data as product” and “analytics engineering” models can benefit from using prototyping practices common to product and engineering functions — and why it can be hard for them to do so.
The scope of work for modern data teams is expanding in interesting and exciting ways as teams gain independence through executive sponsorship and control over their own infrastructure with tools like Looker and dbt. However, this means that the cost of project misalignment has grown accordingly. Data teams now have to mitigate the risk of investing time into dbt models, documentation, tests, LookML models, even entire SaaS architectures that ultimately do not add business value.
One of the main ways that engineering and product teams mitigate these risks is by building prototypes. But in order to do this, data teams need to let go of their natural affinity for highly accurate data and instead become comfortable validating ideas with approximate or hypothetical data. The payoff is the ability to have strategically valuable conversations early and often, before the bulk of major development and analysis is performed. The value comes from engaging stakeholders in specific questions around representative intermediate data products, which results in clearer objectives and requirements with a higher likelihood of success. Build it once, build it right.
The following are examples of strategies we have used successfully at HealthJoy, each of which will be covered in the talk.
Data “focus group”: Asking questions before looking at any data at all, to surface your stakeholders’ implicit assumptions, intended actions, and desired outcomes. Questions like: “What trends do you expect to see in the data? Why do you think those trends exist? What would you do if you saw them? What would you do if you did not?”
Data “wireframes”: Generating entirely fabricated datasets and plots (Google Sheets is great for this) and talking through how your stakeholders would use the data. This is great for making sure that key technical requirements are not glossed over. The idea is to perform an end-to-end task against fake data. The core question is: “Show me how you would use this data or this plot.”
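As a minimal sketch of the wireframe idea: the script below fabricates a small dataset and writes it as CSV that can be pasted into Google Sheets. The schema (monthly member engagement by segment) and all value ranges are invented for illustration; nothing here reflects real HealthJoy data.

```python
import csv
import io
import random

random.seed(42)  # reproducible fake data for a repeatable walkthrough

# Hypothetical schema for an engagement "wireframe" dataset:
# every column name and value range here is made up.
MONTHS = ["2023-01", "2023-02", "2023-03", "2023-04"]
SEGMENTS = ["new", "returning"]

rows = [
    {
        "month": month,
        "segment": segment,
        "active_members": random.randint(100, 500),
        "sessions_per_member": round(random.uniform(1.0, 5.0), 2),
    }
    for month in MONTHS
    for segment in SEGMENTS
]

# Emit CSV text to paste into a spreadsheet for the stakeholder session.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The fabricated numbers only need to be plausible enough for a stakeholder to complete their end-to-end task against them; the fixed seed keeps the wireframe stable across sessions.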
Data “POCs”: Working through proofs of concept with the absolute minimum of data, usually a tricky corner case, to generate accurate technical requirements, such as for dbt models. This usually follows the “wireframe” stage and prevents you from discovering your requirements while you are building your tool. Core questions include talking through how corner cases were handled.
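To illustrate the POC idea with a hypothetical corner case (not taken from the talk): a member with overlapping coverage windows, the kind of edge case a dbt model must resolve one way or another. The fixture below is the smallest dataset that forces that decision, and the merge rule shown is just one possible resolution to review with stakeholders.

```python
from datetime import date

# Hypothetical fixture: one member with two overlapping coverage rows.
# This is the minimum data needed to force a requirements decision.
fixture = [
    {"member_id": 1, "start": date(2023, 1, 1), "end": date(2023, 6, 30)},
    {"member_id": 1, "start": date(2023, 6, 1), "end": date(2023, 12, 31)},
]

def merge_coverage(rows):
    """Collapse overlapping coverage windows per member.

    This encodes ONE candidate rule (overlaps mean continuous coverage);
    the point of the POC is to confirm the rule before building the model.
    """
    rows = sorted(rows, key=lambda r: (r["member_id"], r["start"]))
    merged = []
    for row in rows:
        if (
            merged
            and merged[-1]["member_id"] == row["member_id"]
            and row["start"] <= merged[-1]["end"]
        ):
            merged[-1]["end"] = max(merged[-1]["end"], row["end"])
        else:
            merged.append(dict(row))
    return merged

print(merge_coverage(fixture))
```

If stakeholders agree a single merged window is correct, that agreement becomes a concrete requirement (and a test fixture) for the eventual dbt model; if not, the disagreement surfaced before any real development happened.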
Data “simulation”: Using parallel data sets, even if they are known to be only approximately accurate, to drive high-level considerations like potential impact, total addressable market, and potential revenue. This is especially valuable in the product ideation stage. The results will only be as accurate as the data, but the discussions will accurately capture the business considerations and strategy. The core question is: “What would have to be true about the data for this project to be successful?”