
From Diverse "Humans of Data" to Data Dream "Teams"

Prukalpa is the co-founder of Atlan, a modern data collaboration workspace. She previously founded SocialCops, a world-leading data-for-good company (New York Times Global Visionary, World Economic Forum Tech Pioneer). She was also awarded Economic Times’ Emerging Entrepreneur for the Year, Forbes 30u30, Fortune 40u40, and Top 10 CNBC Young Business Women 2016.

Data-driven teams will be behind the most amazing human achievements in the next decade, from curing cancer to developing self-driving cars to putting people on Mars.

But there’s a challenge: this team is one of the most interdisciplinary teams ever created. Analysts, engineers, analytics engineers, scientists, business users: all have their own tooling preferences, skill sets, limitations, and DNA, creating tons of collaboration overhead and data chaos.

These teams will only be successful if these diverse individuals find a way to collaborate effectively — when the “humans of data” become a real team. We experimented for half a decade to figure out tools, rituals and cultural practices that made our data team 6X more agile: we went on to build India’s national data platform and work with the UN on the global SDG agenda, among other things.

This talk walks through our journey and focuses on tactical takeaways that data leaders and practitioners can start implementing to create data dream teams.

Follow along in the slides here.

Browse this talk’s Slack archives

The day-of-talk conversation is archived here in dbt Community Slack.

Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.

Full transcript

Barr Yaron: [00:00:00] Hello, and thank you for joining us at Coalesce. My name is Barr, and I work on product here at dbt Labs. I’ll be the host of this session. The title of this session is “From Diverse Humans of Data to Data Dream Teams,” led by Prukalpa Sankar, co-founder of Atlan, a modern data collaboration workspace. First order of business: all chat conversation is taking place in the #coalesce-data-dream-teams channel of dbt Slack.

If you’re not part of the chat, you have time to join right now: visit our Slack community and search for #coalesce-data-dream-teams when you enter the space. So what is a data dream team? I love that name. Many folks, myself included, work on data-driven teams, dream teams or not, and data-driven teams will be behind many of the most amazing human achievements in the next decade. [00:01:00]

Folks on data teams can have different titles: analysts, engineers, analytics engineers, scientists, business users. They all have their own preferences, DNA, and tooling. I’ve experienced this myself as an ex-data scientist, which I know is now a controversial term. Now that I work in product, the tools I need to do my job have some overlap, but they’re not the same.

I’m personally looking forward to hearing what Prukalpa has learned about forming these diverse individuals into a data dream team and finding a way for them to collaborate effectively. Prukalpa has been forming data dream teams for several years. She was even part of the team that built India’s national data platform, used by Prime Minister Modi and every member of parliament in India.

Prukalpa has received a slew of awards for her work, including the Economic Times’ Emerging Entrepreneur award. We’re so [00:02:00] privileged to learn from her journey today. After the session, Prukalpa will be available in the Slack channel to answer your questions. However, you’re encouraged to ask questions, make comments, or react at any point during the session. I’m really excited for this.

So I’m going to stop talking, and let’s get started. Over to you, Prukalpa.

Prukalpa Sankar: Thanks for having me, Barr. Before I start, a huge shout-out to the dbt Slack community, which is where this talk was conceptualized. Julian happened to read a blog post that I had written about the why of what we do at Atlan, a half-decade backstory that drives everything we do, and that sort of led to this talk.

For me, the last couple of months have been a walk down memory lane, going through a ton of Slack messages, emails, and things like that to put together, hopefully, a decade of learnings into a 45-minute talk. So before we start: hi, everyone. My name is Prukalpa. I like to introduce myself as a lifelong data [00:03:00] practitioner.

I’m the founder of a company called Atlan, which was actually born out of our learnings as a data team ourselves. I think my claim to fame is that I’ve had a ton of successes and failures in building data culture across 200 data projects around the world, and I hope to share those with you. Okay. So before I start, a little bit about my history.

We actually started out as a data team ourselves, using data science for social good and working with organizations like the United Nations, the World Bank, and large governments, trying to solve problems like national-level healthcare and public vaccination.

Because of the kind of work we were doing, we were dealing with data of huge variety and scale. At one point, we were processing data for 500 million Indian citizens and billions of pixels of satellite imagery. That sounds very cool, but the reality for me every day was a ton of chaos. What you can see on screen are real-life Slack messages from our own team back then.

[00:03:59] Internally, every day was chaos

Prukalpa Sankar: Everything [00:04:00] was a fire drill. I’ve had cabinet ministers calling me at eight in the morning and saying, “Prukalpa, the number on this dashboard is broken.” I would then go through the same drill: opening up my laptop, realizing a number was wrong, and calling my project manager to figure out what went wrong. One quarter, we doubled our team size. We thought it would solve our problems, but we realized at the end of the quarter that our productivity had dipped significantly. The new analysts were not productive, and they were not adding value.

They were also eating into the productivity of our existing team members. We were spending 50 to 60% of our time just dealing with these kinds of issues. There was a time when we literally set up beds in the office; this photo is of those beds. A few of us basically moved into the office.

That was, I think, a year where I almost never left the office. One day I woke up to this email. It was from my oldest analyst, and it said that he [00:05:00] had quit, exactly a week before a very major project was due. He was the only one who knew everything about our data. He was the only one who had worked on that project, and without him, I had no idea how I was going to deliver it. A lot was riding on it. I went up to a terrace because I didn’t want anyone to know, and I cried. I cried for three hours, and the next morning I woke up and swore that moment would never happen to me again. I would never be so dependent on a single person to deliver data projects.

And that’s how we started what we called the Assembly Line project. Our goal was to make our team more agile, reduce overhead, make people productive, and, most importantly, build resilience in our team. Over the next four years, we actually transformed: we became six times more agile. We went on to do things like build this national data platform, which the Prime Minister himself uses. [00:06:00]

We worked with the United Nations on the SDG agenda and how to use data science for it. We were able to predict settlements down to the level of a single building. What you’re seeing on the screen is actually a screenshot of every single building in Bangalore, where I’m currently based.

We were able to predict what each area was like in terms of poverty levels and things like that, to the point that we could predict where diseases were going to break out in the next 15 days, enabling the government to send out proactive alerts to over 60 million citizens.

So what changed?

[00:06:35] The backstory

Prukalpa Sankar: Today, for the first time ever, at Coalesce I’m taking a walk down memory lane and sharing the backstory: our experiments, trials, failures, and some successes that finally got us there. So, back to that moment when I woke up and said that’s never going to happen to me again. That day, I called our entire data team into the biggest conference room in our office.

[00:06:59] Our humans of data

Prukalpa Sankar: Our data [00:07:00] team, the “humans of data,” as I like to call them, was very diverse: analysts, scientists, visualization experts, data engineers. This was pre-dbt, so there were no analytics engineers back in the day. They all had their own tooling preferences, their own skills, their own limitations, and their own DNA in the way that they worked. Richa, my data engineering lead, never liked to go to meetings. Richa, our project manager, was very much a people person. We all worked in different tools, and it was really hard to collaborate. So anyway, I got them into this room, and we ran an exercise adapted from Google’s design practice called “How Might We.” Obviously, we were in a shit ton of trouble.

Fundamentally, you’re supposed to take anything that’s a problem in the organization and, instead of dwelling on it, reframe it as an opportunity. So with that, our diverse team came up with our dream wishlist, our “how might we’s”: [00:08:00] How might we create a high-performance team?

How might we plan timelines better? How might we reduce dependency on individuals, the hit-by-a-bus syndrome? How might we onboard new analysts faster and better? This became a dream of what we wanted the team to look like, and we used these “how might we’s” to solidify it into a manifesto, a charter that said where we wanted to be.

[00:08:26] Our team charter

Prukalpa Sankar: Where did we want to be in the next 24 months? We came up with things like an ecosystem of trust. A data team is always going to be diverse, so how can we create a strong ecosystem of trust, where all these people, the engineers and the scientists, trust each other and trust the data?

We talked about quality of outputs: how can we ensure that, irrespective of the people or the person running the project, the output would always be high quality?

[00:08:55] Drive towards 6X more agile

Prukalpa Sankar: And that’s when we started executing our way there. I believe there [00:09:00] were two drivers in our journey to becoming six times more agile. The first is what I call the data stack. We used to call it the tech stack at that time; today it’s known as the modern data stack.

The second is what I like to call the culture stack: the tactics we applied that helped our team work effectively together. So today I’d like to unpack the tactical tools and the rituals that we experimented with. First, starting with the data stack. This was the pre-modern data stack era; there weren’t a hundred tools that we could really use at that time.

Life was pretty much a nightmare. So we started decoupling the different problems we had at work, and I’m going to touch on a few of them. The first is the “that number doesn’t look right” message, the nightmare message for every data practitioner nowadays. What you can see on the screen is an email that said, “There are 12 schools, but in the dropdown on your dashboard we can only [00:10:00] see 11.”

We found out that there were actually two files, and no single source of truth. So our goal became: how can we ensure that our data and insights are always accurate? How can we ensure that all our analysts follow standard data cleaning processes? What we ended up doing was creating a framework for data cleaning and checking, which you can see on the left.

This is the actual framework, with things like standardization checks, duplication checks, and missing value checks, and it became our basis for building trust in the final data that we were using. Alongside that, we created a small tool that we called Cleaning Hacks, which was basically a set of libraries.

Back in the day, those libraries allowed us to ensure that every one of our analysts went through that entire process. And every time we found another inconsistency despite that, we made sure it got included as a check that went into [00:11:00] our libraries.
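A minimal sketch of what a reusable check library in this spirit could look like, assuming rows arrive as Python dicts. The function names, the `CHECKS` registry, and the sample school rows are hypothetical illustrations, not the original Cleaning Hacks code. The point it shows is the one made above: each newly discovered inconsistency becomes one more registered check that every dataset runs through.

```python
# Hypothetical sketch of a reusable data-cleaning check library, loosely modeled on the
# check categories named in the talk. Not the original Cleaning Hacks code.
from typing import Callable, Dict, List, Set

Row = Dict[str, object]
Check = Callable[[List[Row]], List[str]]  # a check returns human-readable issue messages


def missing_value_check(required: List[str]) -> Check:
    """Flag rows where a required column is absent, None, or an empty string."""
    def check(rows: List[Row]) -> List[str]:
        return [
            f"row {i}: missing {col}"
            for i, row in enumerate(rows)
            for col in required
            if row.get(col) in (None, "")
        ]
    return check


def duplication_check(key: str) -> Check:
    """Flag rows whose key value already appeared earlier (e.g., two files merged)."""
    def check(rows: List[Row]) -> List[str]:
        seen: Set[object] = set()
        issues: List[str] = []
        for i, row in enumerate(rows):
            value = row.get(key)
            if value in seen:
                issues.append(f"row {i}: duplicate {key}={value!r}")
            seen.add(value)
        return issues
    return check


def standardization_check(column: str, allowed: Set[object]) -> Check:
    """Flag values outside an agreed list of standard spellings or codes."""
    def check(rows: List[Row]) -> List[str]:
        return [
            f"row {i}: non-standard {column}={row.get(column)!r}"
            for i, row in enumerate(rows)
            if row.get(column) not in allowed
        ]
    return check


def run_checks(rows: List[Row], checks: List[Check]) -> List[str]:
    """Run every registered check, in order, and collect all issues."""
    issues: List[str] = []
    for check in checks:
        issues.extend(check(rows))
    return issues


# Every newly discovered inconsistency becomes one more entry in this registry.
CHECKS: List[Check] = [
    missing_value_check(["name"]),
    duplication_check("school_id"),
    standardization_check("state", {"Karnataka"}),
]

rows: List[Row] = [
    {"school_id": "S1", "state": "Karnataka", "name": "School A"},
    {"school_id": "S1", "state": "karnataka", "name": "School A"},
    {"school_id": "S2", "state": "Karnataka", "name": ""},
]

for issue in run_checks(rows, CHECKS):
    print(issue)
```

Keeping checks as small closures in a plain list keeps "add a check" a one-line change. Today, dbt tests or Great Expectations suites fill the same role.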

Today, I would recommend that you use tools like dbt or Great Expectations to do this. But I think the process, and getting everyone involved in the process of building consistency in your ecosystem, is incredibly important. Next problem: this one was also a nightmare.

For example, this is an email we sent with questions like: Is this data that you sent us monthly? We can’t find the unique identifier code in this dataset, so how do we map it back to the master dataset? Some villages in this state have values that look off; is this something that’s expected, or is there something wrong?

Our “how might we” here was: how do we ensure 100% context before we start working on a dataset? Because this was eating up a ton of our time. Then there’s the versioning problem everyone knows: version 1.1, 1.2, 1.13, 1.135. We never knew which of the [00:12:00] different versions of a dataset to use.

That’s when we started an internal project that we called the catalog. It was in one of our first internal documents that I wrote that the catalog is to the data team what GitHub is to engineers. We actually ended up failing three times at setting up the data catalog. What’s on the left is one of the first internal screenshots of what became Atlan today.

Basically, there were a bunch of internal tools. One of them automatically generated data profiles, another helped answer questions about data origin by building lineage across our ecosystem, and things like that. We ended up building it ourselves over two years.
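Automated data profiling is one of the catalog features mentioned above. Below is a minimal sketch of the idea, assuming a table arrives as a list of dicts. The chosen statistics (row count, nulls, distinct values, value types, most common value) and the sample village rows are illustrative assumptions, not what the original internal tool computed.

```python
# Hypothetical sketch of automated column profiling for a data catalog.
# The statistics chosen here are illustrative, not the original tool's.
from collections import Counter
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def profile_column(values: List[Optional[object]]) -> Dict[str, object]:
    """Summarize one column: size, nulls, distinct values, types, most common value."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)  # assumes values are hashable
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "types": sorted({type(v).__name__ for v in non_null}),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


def profile_table(rows: List[Row]) -> Dict[str, Dict[str, object]]:
    """Profile every column that appears in any row; a missing key counts as a null."""
    columns = sorted({col for row in rows for col in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


rows: List[Row] = [
    {"village": "V1", "population": 1200},
    {"village": "V2", "population": None},
    {"village": "V2", "population": 950},
]

for column, stats in profile_table(rows).items():
    print(column, stats)
```

Profiles like these, stored next to each dataset, answer many of the "is this expected?" questions from the emails above before anyone has to ask them.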

[00:12:37] The broken dashboard

Prukalpa Sankar: Today, and obviously I’m a little biased, you can use tools like Atlan and other modern data catalogs to help solve for some of this. Next, the broken dashboard problem. This was an email that I found particularly funny. It said, “Dashboard might be down again.”

Any time something went wrong and our clients noticed it before [00:13:00] we knew something was wrong, that eroded trust. So our thought process here was: how do we create a framework by which, if a dashboard breaks, we know that something went wrong before the client does?

So back in the day, we created an end-to-end alerting and monitoring framework that worked across the entire stack of our data team, including our cleaning workflows. We did some custom work on Airflow that sent us email alerts when something broke in a dataset or a dataset didn’t look like it should, things like that.
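The core idea of "know before the client does" can be sketched without Airflow or email. Here, a freshly loaded dataset is compared against simple expectations, and any violation becomes an alert message. The `Expectation` fields, the function names, and the 12-schools sample data are hypothetical. In practice, a scheduled task would call `check_dataset` after each load and pass an email or chat sender as `send`.

```python
# Hypothetical sketch: compare a freshly loaded dataset against simple expectations and
# emit alert messages. The talk's version was custom Airflow code that sent email; here
# `send` is any callable, so no scheduler or mail server is assumed.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[object]]


@dataclass
class Expectation:
    min_rows: int
    # Maximum allowed fraction of None values per column, e.g. {"enrollment": 0.0}.
    max_null_fraction: Dict[str, float] = field(default_factory=dict)


def check_dataset(name: str, rows: List[Row], expectation: Expectation) -> List[str]:
    """Return one alert message per expectation the dataset violates."""
    alerts: List[str] = []
    if len(rows) < expectation.min_rows:
        alerts.append(f"{name}: only {len(rows)} rows, expected at least {expectation.min_rows}")
    if rows:  # null fractions are undefined for an empty dataset
        for column, limit in expectation.max_null_fraction.items():
            nulls = sum(1 for row in rows if row.get(column) is None)
            fraction = nulls / len(rows)
            if fraction > limit:
                alerts.append(f"{name}: {column} is {fraction:.0%} null (limit {limit:.0%})")
    return alerts


def notify(alerts: List[str], send: Callable[[str], None] = print) -> None:
    """Deliver each alert; swap `print` for an email or chat sender in practice."""
    for alert in alerts:
        send(alert)


# Illustrative data: a dashboard that should list 12 schools but only receives 11.
rows: List[Row] = [{"school_id": f"S{i}", "enrollment": 100} for i in range(1, 12)]
rows[0]["enrollment"] = None
expectation = Expectation(min_rows=12, max_null_fraction={"enrollment": 0.0})
notify(check_dataset("schools", rows, expectation))
```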

This isn’t a problem that’s fully solved today, but I think data observability tools are doing a great job of getting part of the way there. A bunch of tools in this space have recently come out that might be interesting if you were starting this over now. Those are my thoughts on the data stack.

Clearly, the three that I doubled down on are not the be-all and end-all. What you’re seeing on the screen is our data team’s workflow. [00:14:00] We drew out a flowchart of every single step we would go through when we worked with data, and there are a ton of other tools that you might need in the stack to optimize each step.

But my key idea here is that, luckily, the modern data stack is pretty mature today, and someone has probably already built what you need. So go find the right tools for your ecosystem. What I’d like to talk more about is the culture stack. We talk a lot about the modern data stack, but I think we need to start talking more about the modern culture stack.

Normally, when we talk about culture, many people think culture just is what it is, but I firmly believe you can craft your culture to be the kind of culture you want it to be. It doesn’t just need to happen. The way I think about it, you have values that you aspire to.

And then there are rituals that you can establish to help you reinforce those [00:15:00] values. This again was an exercise we did as a team. Using the “how might we” chart that I showed you earlier, we crystallized four key data team values. The first was agility, the second was trust, the third was collaboration, and the fourth was innovation.

Then, against each of those, we started creating rituals. We experimented with a bunch of them; what you’re seeing on the screen are some of the things that worked and were fun, and some that didn’t work as well.

[00:15:32] Agility

Prukalpa Sankar: Let me give you an example. Let’s pick agility. Agility was huge for us. We were in a really bad space as a team: we had doubled in size, but we were not more productive, we had been consistently missing targets for quarters, and we just needed to get back on track. So one major goal was: how do we plan well, and how do we meet our goals effectively?

We were working on problems we had never worked on before, like satellite imagery classification, so in some ways we didn’t even know what we should be [00:16:00] planning for. That’s when I ended up reading Jeff Sutherland’s book Scrum: The Art of Doing Twice the Work in Half the Time. I highly recommend it to everybody here.

We adapted some of these engineering best practices for our data team, and doing this actually improved our velocity four times: we were able to do four times the amount of work in a week that we had done before. Here are a few learnings from how we implemented this.

I think these explain why this was able to happen, and they are helpful for building any cultural ritual, not just this one. First, we started bottom-up, not top-down. We leaders didn’t go into a room and say, “We want Scrum.” What we did was gift a copy of Scrum to everyone on the team. We ran learning sessions and book club sessions, and then, before we introduced it, we experimented with it that quarter.

In those learning sessions, we understood and agreed on the principles. For example, the book talks about concepts like sprints, how much time you’re [00:17:00] losing to context switching, and how to estimate effort. From there, once we had chosen that, the ritual is doing the same thing again and again, every week, every month, no matter what.

In our case, we had Monday planning sessions, daily stand-ups, and a retrospective at the end of the week. We also questioned each other. Once we established our principles, we all agreed that we truly wanted to do the best that we could and be as productive as we could be.

So during the stand-ups, we would ask each other questions: Why didn’t you achieve what you planned yesterday? Are you estimating that right? I think you’re estimating that wrong; it might take longer. Things like that. And finally, measure: we measured our work as well as we could.

We had these weekly velocity measures. You’re seeing a Slack message that my colleague used to post every week, [00:18:00] showing our velocity over the last few weeks and what we achieved that week. It really made a difference in driving us forward. At one point, just seeing that number go up made a huge difference.
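The talk doesn’t say how velocity was computed. Here is a minimal sketch of a weekly velocity post, assuming the common Scrum convention of summing story points completed per week. The function names and all numbers below are hypothetical, for illustration only.

```python
# Hypothetical sketch of a weekly velocity measure, assuming the common Scrum convention of
# summing story points completed per week. All numbers below are made up for illustration.
from collections import defaultdict
from typing import Dict, List, Tuple


def weekly_velocity(completed: List[Tuple[int, int]]) -> Dict[int, int]:
    """Sum completed story points per week: [(week, points), ...] -> {week: total}."""
    totals: Dict[int, int] = defaultdict(int)
    for week, points in completed:
        totals[week] += points
    return dict(sorted(totals.items()))


def weekly_post(velocities: Dict[int, int], baseline_week: int) -> str:
    """Format a status line for the latest week, relative to a baseline week."""
    latest = max(velocities)
    multiple = velocities[latest] / velocities[baseline_week]
    history = ", ".join(f"W{week}: {points}" for week, points in velocities.items())
    return (f"Week {latest}: {velocities[latest]} points "
            f"({multiple:.1f}x week {baseline_week}). History: {history}")


completed = [(1, 3), (1, 2), (2, 5), (2, 3), (3, 8), (3, 7)]
velocities = weekly_velocity(completed)
print(weekly_post(velocities, baseline_week=1))
```

Showing the history alongside the latest number keeps the trend visible, which is what made the weekly post motivating.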

There are a few other agility rituals I’m going to run through. How do we reduce context-switching time? We saw that analysts were spending over three to four hours a day on context switching, because of interruptions from other team members. That’s when we created collaboration hours, or office hours; that was the only time we were going to collaborate as a team.

You could book time with our senior analysts on the calendar; all other time was focus time. Then there was creating a culture of shipping. We had a practice that we called shipped demos: every Friday, we would demo the products that we had shipped that week. That became an event the team really looked forward to.

We were able to set the quality standard with a few of our first demos, and the team just took it from there. That created this process of shipping, and celebrating shipping, inside the organization. [00:19:00] On that note, I think one of our biggest shifts came from moving from a data services mindset to a data products mindset.

I know there are a few people in the community who talk a lot about this shift, but for us, it meant changing how we measured our output: from “Was this a successful implementation? Did we deliver on time?” to “Did we solve the problem?” When we changed our tone from building just for single use to building for usability and reproducibility, that changed everything for us.

Our starting point would be: who is the user? On the screen is a presentation that one of our data PMs made back in the day, a deep dive on understanding the stakeholders and the decisions they were looking to make with what we were doing for them.

[00:19:45] Innovation

Prukalpa Sankar: And that made a huge shift in how we worked. Alongside driving high performance and agility, another value we had was innovation. One of our “how might we’s” was: how do we ensure that [00:20:00] we’re solving the problem in the best way? We created a format that we stole from Pixar, which has this concept called the Braintrust.

We basically modified it for data teams, for brainstorming before we would start a project. One of the biggest challenges is that any one person knows only so much, but there’s the collective knowledge of everybody else. If you’re able to leverage that collective knowledge, you’re probably going to build the most innovative thing you can.

So we created this format that allowed us to bring that knowledge together. I won’t go into details on this format here; I’ll post it on Slack because it is a little bit lengthy. But basically, it was a 55-minute session ensuring that everybody on the team had all the context.

[00:20:42] Building trust and collaboration

Prukalpa Sankar: It also meant that you, as a project leader, had all the ideas you needed to drive as much innovation as possible. Building trust and collaboration was something that we worked on constantly. The one thing that helped us a [00:21:00] lot was truly setting time aside for reflection and documentation. For example, we used to have time for project documentation blocked out on our calendars. What you’re seeing here is a resource of learnings from past data projects; it was basically a Quip document that we kept adding to with every project that we did.

We did a quarterly exercise called Start, Stop, Continue. It was actually how we figured out the new rituals that we wanted to start. It was as simple as a sticky-notes board: we would come together and ask, What do we want to start doing? What do we want to stop doing? What do we want to continue doing?

All of that worked toward building the culture that was important to us. And the final one, the most fun one in my opinion, has been collaboration as the data team scales. The one thing I, as a data leader, have come to believe is that the reality of data work is that it will be chaos, but that doesn’t mean our work needs to be chaotic.

So one of our biggest “how might we’s” was this: knowing that [00:22:00] there’s going to be chaos, and knowing that when a dashboard breaks, the analyst is probably going to question, somewhere in their head, whether the data engineer is doing their job the way they should, how do we bring these minute frustrations out into the open, and how do we truly improve collaboration within our team?

So we started a practice for exactly that. The way it worked, it was sort of modeled on Alcoholics Anonymous, or sessions like that: we would go into a room on a Friday night, obviously with good food and things like that.

We would basically crib about our biggest frustrations of that week. That was a very powerful way for the different people on the team to actually learn about the issues and challenges that other people on the team were facing. Because the reality of data, and why it’s so unique, is that almost every other team [00:23:00] is actually pretty standard.

A head of sales probably used to be a sales rep, and a managing partner was probably a consultant at some point. But in a data team, a data engineer has never done an analyst’s job before, and a data PM has never been a data engineer before. Every role is truly unique, and that makes it really hard to empathize with and understand each other in these data teams.

So it became a way for us to build empathy with each other: for the analyst to listen and understand how the engineer was frustrated because something went wrong in a pipeline, and why that happened.

So that’s mostly the backstory of how our team became six times more agile in two years, chasing our dream of building a better world through data. I want to take a moment to thank every human of data who was part of our journey. At Atlan, everything that we do is inspired by them. We actually bought the Nasdaq billboard for our announcement and put up the photos of every [00:24:00] member of our data team, because they truly are the reason we got to where we are. So finally, some thoughts that I’m going to leave you all with.

[00:24:12] Final thoughts for data leaders

Prukalpa Sankar: The modern data stack has matured a lot since the days when we were a data team ourselves. So the first thing I would highly recommend is: don’t build what you can buy. Invest in tools that will make your team more agile and effective. Second, invest in building your culture stack. Build the culture that you want to see.

Culture is probably the one thing that will make your data team truly successful and turn it from just a data team into a dream data team.

The third is something I think very deeply about. Everybody wants the first two, a better data stack and a better culture, but I think it’s also really important to build an organizational structure that [00:25:00] can help your team become as effective as possible.

For example, if you want to drive a better data stack and a better culture stack, think about building an org structure that allows you to do it. Some recommendations: I’ve seen great success in teams that have set up a data platform team that is responsible for tooling and templates. They don’t execute projects.

Their role is to improve the efficiency of the rest of the team. Some of the roles that are important there: a product manager kind of persona, somebody who works with end users to understand and scope the problems and challenges; data platform engineers; and, depending on the scale of your team, training and education, if that’s relevant for you.

The second, and I’m starting to see more of this, is a data enablement team. Sales teams have sales enablement teams that are responsible for driving some of the culture, ensuring every sales rep has the right training and things like that. Why don’t we have a data enablement team that is responsible for driving our cultural [00:26:00] rituals, enablement, and improvement across our data team? Some of the very interesting new roles I’ve seen are almost chief-of-staff personas, someone who can be responsible for the culture of the data org, and data enablement managers who own the culture rituals. I’ve seen two kinds of personas work very well here.

One is a very strong program management kind of persona, and the other is more of a community management persona. Across the data platform team and the enablement team, their role should be to make your team as agile as possible. And finally, a note on hiring.

The one thing that I wanted to say is: don’t be afraid to think out of the box when hiring. These are all new roles; they don’t exist in other teams. The best people for the jobs might already be on your team, and if you go in and look, there might be an amazing person who does a great job building community or has strong program management skills.

The one thing I’ve learned in my life as a leader is: don’t go [00:27:00] by the JD, go by people. There’s always talent, and you can fit people into the org structure where they can be badass. And so with that, I’m going to stop and open the floor to questions.

Thank you, everyone, for having me here. You can reach me on @prukalpa or a p@atlan.com, and I’m excited to talk more about what it takes to build a dream data team.
