MetaDAMA - Data Management in the Nordics

3#13 - The Butterfly Effect in Data: Embracing the Data Value Chain (Eng)

March 25, 2024 Olof Granberg - DataMentor Consulting Season 3 Episode 13

«If you want to run an efficient company by using data, you need to understand what your processes look like, you need to understand your data, you need to understand how this is all tied together.»

Join us as we unravel the complexities of data management with Olof Granberg, an expert in the realm of data with rich experience spanning nearly two decades. Throughout our conversation, Olof offers insights that shed light on the relationship between data and the business processes and customer behaviors it mirrors. We discuss how to foster efficient use of data within organizations by looking at the balance between centralized and decentralized data management strategies.

We discuss the "butterfly effect" of data alterations and the necessity for a matrix perspective that fosters communication across departments. The key to mastering data handling lies in understanding its lifecycle and the impact of governance on data quality. Listeners will also gain insight into the importance of documentation, metadata, and the nuanced approach required to define data quality that aligns with business needs.

Wrapping up our session, we tackle the challenges and promising rewards of data automation, discussing the delicate interplay between data quality and process understanding.

Here are my key takeaways:
Centralized vs. Decentralized

  • Decentralization alone might not be able to solve challenges in large organizations. Synergies with central departments can have a great effect horizontally.
  • You have to set certain standards centrally, especially while an organization is maturing.
  • Decentralization will almost certainly prioritize business problems over alignment problems, even though alignment can create greater value in the long run.
  • Without central coordination, short-term needs will take the stage.
  • Central units are there to enable the business.

The Data Value Chain

  • The butterfly effect in data - small changes can create huge impacts.
  • We need to look at value chains from different perspectives - transversal vs. vertical, as much as source systems - platform - executing systems.
  • Value chains can become very long.
  • We should not focus solely on the data platform / analytics layer, but look at the whole value chain, back to where the data is created.
  • Manage what’s important! Find your most valuable data sources (the ones that are used widely), and start there.
  • Gain an understanding of the intention behind sourcing data vs. the use of data downstream.
  • «It’s very important to paint the big picture.»
  • You have to keep two thoughts in mind: how to deliver a use case while building up that reusable layer.
  • Don’t try to find tooling that can solve a problem, but rather look for where tooling can help and support your processes.
  • Combine people that understand and know the data with the right tooling.
  • Data folks need to see the bigger picture to understand business needs better.
  • Don’t try to build communication streams through strict processes - that’s where we get too specialized.
  • Data is not a production line. We need to keep an understanding over the entire value chain.
  • The proof is in the pudding. The pudding being automation of processes.
  • «Worst case something looks right and won’t break. But in the end your customers are going to complain.»
  • «If you automate it, you don’t have anyone that raises their hand and says: «This looks a bit funny. Are we sure this is correct?»»
  • You have to combine good-enough data quality with understanding of the process that you’re building.
  • Build in ways to correct an automated process on the fly.
  • You need to know, when to sidetrack in an automated process.
  • Schema changes are inevitable, but detecting those can be challenging without a human in the loop.
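The last few takeaways, building in ways to correct an automated process on the fly and sidetracking records that «look a bit funny», can be sketched in a few lines of Python. This is a minimal illustration under assumptions of my own (the record shape, field names, and plausibility threshold are all invented, not from the episode):

```python
from dataclasses import dataclass

@dataclass
class Order:
    order_id: str
    amount: float

# Hypothetical known schema for incoming records.
EXPECTED_FIELDS = {"order_id", "amount"}

def process_orders(records):
    """Route each record: auto-process if it passes the checks,
    sidetrack it for human review if it looks a bit funny."""
    processed, sidetracked = [], []
    for rec in records:
        # Schema drift: unexpected or missing fields go to a human,
        # since automation alone won't raise its hand about them.
        if set(rec) != EXPECTED_FIELDS:
            sidetracked.append((rec, "schema mismatch"))
            continue
        # Plausibility: a negative or absurd amount may "look right
        # and won't break" downstream, so we park it instead.
        if not (0 < rec["amount"] < 100_000):
            sidetracked.append((rec, "implausible amount"))
            continue
        processed.append(Order(rec["order_id"], float(rec["amount"])))
    return processed, sidetracked
```

The point of the sketch is the routing: nothing is silently dropped or silently processed, and anything suspicious is parked where a human can still ask «are we sure this is correct?».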
Speaker 1:

This is MetaDAMA, a holistic view on data management in the Nordics. Welcome, my name is Winfried and thanks for joining me for this episode of MetaDAMA. Our vision is to promote data management as a profession in the Nordics and show the competencies that we have, and that is the reason I invite Nordic experts in data and information management to talk. Welcome to today's episode of MetaDAMA, and this is our first recording in the new year 2024. I know it will be published a bit later, but it is fantastic that we can continue to provide quality content with great people in the Nordics to you, and today I'm talking to Olof Granberg. Hi, thanks for having me. Fantastic to have you on.

Speaker 1:

We're going to talk about something that is really close to my heart as well, and that is how can we think of data in an end-to-end process? How can we see the entire picture of the work we are doing from a data value chain perspective? Olof has a long career in data and analytics, I think up to 20 years that he has been working with that topic, and he has worked in data warehousing. He has worked as an architect, he has worked as a leader, he has worked with operations. He has worked in the entire life cycle, and that's why I think it's really great to have Olof with us today. So please introduce yourself, thank you.

Speaker 2:

In that sense, I am single-minded, even though I try to take in the whole picture. I started in analytics back in 2006 as a developer and I have never really let go of the area. I've worked in that area since as a developer, as an architect, in different types of leadership positions, and I have really tried not all of the roles, but a lot of the roles in the area, and the roles have really evolved.

Speaker 2:

When I started back then it was very traditional, very Kimball data warehousing, ETL development with graphical tooling and so on, and now it's a completely different world. But the data remains, not the same, but it's still data and still needs to be treated as such. So, with that said, I've worked with the whole data life cycle in one way or another for the last, yeah, I don't care to count, but it's 18 years soon. I think that's really where I get excited and what I like to work with, and for me, it's so very close to the business. I think that's the reason why data is so interesting, because it really describes the business, the customers, the behaviors, the whole life cycle of the processes in the business.

Speaker 1:

And you already started to talk about the topic that we have set for today.

Speaker 2:

Yeah, I quite easily get sidetracked into data discussions whether or not I'm actually on the job or not.

Speaker 1:

Well, let's call it main tracked in this podcast setting, which is fantastic. But before that, I would like to get to know you a bit better outside of work. So what are your hobbies? What are you doing when you're not working with data?

Speaker 2:

So I have three kids and a dog, which of course takes up quite a bit of time. The youngest one is eight and the oldest one is 16, so they provide a lot of fun opportunities to learn and to give them car rides from practices and so on. When I'm on my own I like to do a lot of sports, so I try to be fairly diverse: running, swimming, going to the gym, doing some bike riding and so on. That's about the full week of activities, I think. But having also worked in the grocery retail business, food is really close to my heart, so I really enjoy both eating and cooking. I haven't really gone down the whole nerdy side of cooking; I have a few friends that get very, very scientific. That is one area where I kind of rebel against myself, so I actually do a lot by feel rather than by data. I really enjoy that.

Speaker 1:

So the I mean 20 years of career working with data. What initially sparked your interest in data?

Speaker 2:

So when I really started out, it was a bit of an accident that I ended up in the analytics area. It wasn't as well known when I started, but when I actually started getting into it, I think it was the contact with the business that really made it. It was being a quite technical developer while at the same time really focusing on business questions, really focusing on solving that kind of business problem. That's where I really got stuck in the area, that's what really drew me in and has completely kept me there. Because I think, and it has been that way for a very, very long time, it is one of the areas where you can be deeply technical, because you need to handle such vast amounts of data in quite a swift manner, but you really also need to focus on the business. That kind of span of interests has really kept me there, and I think it's really, really fun.

Speaker 1:

What a fantastic answer. I really enjoyed that and I think we can dive a bit deeper into that as well. But before that and I think this is kind of interesting because you have been working or you said groceries and retail, but on a group level, which I think is really interesting, because then you really see challenges from a different perspective. You have to look at your data sources in a different way. You have to work across sectors and bring things together in a way that is really interesting but also challenging. How do you see working at a group level for you? How do you feel about finding the right level of synergies and involvement?

Speaker 2:

Yeah, here I forgot to mention a little bit about where we are right now, because we are recording this in the new year. I've been working the last five years with data and analytics at ICA Group on the group level, supporting of course ICA Sweden, all of the grocery stores, but also supporting Apotek Hjärtat, which is Sweden's largest pharmacy chain, supporting ICA Bank and ICA Insurance and, of course, our real estate company, ICA Fastigheter. Since it is the new year, I've actually moved on from ICA. We'll still talk today about a lot of the good experiences we've had, but briefly worth mentioning is that I'm leaving ICA and starting up my own advisory and consultancy. Going back to the question, how do you find the right level of synergies? I think it's very important that you actively attack that question rather than just saying, okay, we need to either put everyone in the same mold and use the exact same processes and models, or just abandon the concept completely and say, okay, we are running very different businesses with very different legislations and rules, so let's just not bother trying to find synergies. I think the really efficient answer, which is also very effective, is somewhere in between. So you need to find what are the right levels of synergies.

Speaker 2:

A lot of the things we talk about when we talk about data don't actually depend on what industry you're working in. The data as such is completely different, the data models are completely different, but platforms, for instance, are usually very, very similar. You're still talking batch, streaming and so on.

Speaker 2:

You're still talking about the need for information modeling, information architecture, data modeling. How does that really fit together with how you work in the business, with the business processes and so on? For sure, in some of the industries where you have more regulations, there are a number of strict things you need to follow and have set up, whereas in less regulated industries you don't have to follow them as strictly. But, with that said, if you want to run an efficient company by using data, you need to understand what your processes look like, you need to understand your data, you need to understand how that is all tied together. So I think a lot of those things are really, really similar. Then, of course, yes, there will be like 10-20% on the sides that are completely unique to that company, but on the whole, it is a very, very similar journey for most companies.

Speaker 1:

This is kind of interesting because we, at least for the last few years we've been talking about data mesh trying to be more decentralized, closer to the business, to the end user, to the domain that actually creates the data and uses the data. That also gives a different approach and different levels of involvement. When you look at it from a centralized perspective, so from a group level, in a period of time where decentralization is really in and everyone talks about it, how much should you get involved? How much pull and push should you have?

Speaker 2:

And I think the way to start off here is basically saying that all group companies are different, but I'm not against the decentralization part of it; I think there is an opportunity for combining that with centralized synergies. So, for instance, do I think that all data management should be done centrally? No, I don't. However, I think that all data management should be supported centrally, especially for quite a long period of time while the company is maturing, when you are setting how you work. If we take data mesh, for instance: what is a data product? How do you access the data products? What are the governance aspects of it? All of those things you have to take care of together. And normally you almost have to start centrally by setting the standards, because expecting everyone to set standards by goodwill is not really going to work, because people have different priorities and most of those priorities will be very, very business centered: we need to solve this problem right now. And if you then say, well, we also need to figure out the product standardization, the whole governance and so on, they're going to say, yes, that's fine, but we have to do this first. So you have to facilitate and drive that centrally. But what I don't think should be centralized is the domain-specific work, actually defining the data product. Not what is a data product; that you set centrally in collaboration with everyone. But what is my data product? If I work in the marketing department, for instance, my data products are this and this and this. That should not be set centrally. What is a data product, how do I access it, how do we govern it, who is responsible? All of that we set centrally, but then actually filling it with content and setting it in the business process, that is done in the domain.

Speaker 2:

I think that's the way to run an effective group: to really find that level where you help the companies that are out there, or the different business units, to be faster, do things easier and be able to really gain value from what everyone else is doing. That, for me, is the role of the group: to really try to lift everyone a little bit, not take away their jobs. I think that's quite a big difference. It's very, very easy to say that, oh, this would be good if everyone did the same, so we need to do it centrally.

Speaker 2:

I think that is a sure way of creating bottlenecks. It's a sure way of creating detachment from the business. It's great at creating synergies, but in the end, if you're not doing what the business needs, then the synergies don't really matter. So rather, I think you should focus on trying to lift everyone so everyone sits on the same platform. And that platform can be many things. It can be a data platform, it can be a data catalog platform, but it can also be how we work with governance, how we work with metadata management, what our assets or asset types in our data catalog are, and so on. So really trying to just get everyone on the same page.

Speaker 1:

What a fantastic answer on synergies and involvement. I really enjoyed that and it really resonated with me. I've talked earlier, in the data mesh context, about the subsidiarity principle: do whatever you can do individually on the lowest possible level and only escalate when there's a real need to align across. That's the one thing. And the other thing that I thought of here is something we talked about, what, two years back: central teams as enabling teams. So, rather than having a central policy team, you have an enabling team that is basically working as personal trainers for the business. You help the business go through the exercises, learn how to do the exercises correctly, but it's not for your benefit, it's for their benefit.

Speaker 2:

Yeah, and I think that makes perfect sense. I think enabling is really the right focus on the group level: you should really enable others to be successful.

Speaker 1:

So what is really interesting from the group perspective is that you can see patterns emerging across different sectors in the group. You talked about banking, pharmacy, insurance, retail, groceries. There are a lot of different approaches, different needs that you have individually and need to adjust to accordingly, but on a group level you can see what patterns are emerging and what has to be governed the same way across those different parts of the business. And one thing that I find particularly interesting here is that from that perspective you can see, and I call it the butterfly effect in data, small changes creating a huge impact. That could be positive impact, it could be opportunities, but there's also definitely a risk to it. And with that I think we are in the middle of our main topic: a holistic approach to the data value chain and an end-to-end process perspective. So how can we think of those challenges and opportunities with data from that holistic perspective?

Speaker 2:

And this is why I get so excited working in this area, because it is really complex and we need to look at it as a kind of matrix. Right now we talked a lot about the group level and so on; then we're talking really on the x-axis: how do we spread that horizontally, how do we get everyone efficient horizontally? But where it becomes really interesting in data is that we also need to look at the vertical, because we need to look at it from a source system into whatever integration platforms and data platforms we use, but also then into our executing systems, where we are maybe sending trucks to a certain location, or sending offers to our customers, or creating a credit risk score so that when customers apply for a loan they get a certain response back. And then it becomes quite a long value chain for the data. The pendulum has swung a bit back and forth in the industry, and I think one of the patterns I've seen, that I hope we're moving away from now, is that we are focusing too much on just the data layer, or rather on the data platform slash analytics layer, and then we're basically saying that, oh, bad data quality, that comes from someone, but you're only really interfacing with the guy sending data. You're not looking at the whole value chain.

Speaker 2:

Where is the data actually created? Why do we have a certain data quality problem? Why do we have text in a numbers field? Why are the dates wrong? Why are we using whatever format? Though most likely formats are usually fixed in the integration somewhere. More often you run into things like: oh, why is the item ID on this item different from the one we have in this system? Or why are our customer IDs different in the system that sends the offers compared to the customer database? A lot of those things.
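The quality problems named here, text in a numbers field and IDs that differ between systems, lend themselves to simple automated checks. A minimal Python sketch, with all field names and IDs hypothetical, not taken from the episode:

```python
def numeric_quality_issues(rows, field):
    """Return indexes of rows where a supposedly numeric field
    holds text: the classic 'text in a numbers field' problem."""
    bad = []
    for i, row in enumerate(rows):
        try:
            float(row[field])
        except (TypeError, ValueError):
            bad.append(i)
    return bad

def id_mismatches(source_ids, target_ids):
    """IDs present in one system but not the other, e.g. customer
    IDs in the offer-sending system vs. the customer database."""
    source_ids, target_ids = set(source_ids), set(target_ids)
    return source_ids - target_ids, target_ids - source_ids
```

Checks like these only flag symptoms; as the conversation stresses, finding out why the data was created that way still takes the whole value chain.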

Speaker 2:

You have to see the whole value chain, and circling back, I think this is what makes the data area very, very interesting: you both have to look at the horizontal, how do we scale things, how do we work together in the best manner, and look at it from a business domain perspective, from where the data gets created to where the data gets used and to where, in the end, the data gets deleted at the end of its life cycle. And here we can run into a lot of butterfly effects. Especially if you look at the governing side: when we introduce a new way of working with metadata, a new way of working with things like that, it can create quite a large effect across the whole enterprise, for better or worse. It depends on how we've designed it. In some cases, we can see the benefit on paper, but in reality it actually only creates extra work and no one really uses it. Or it can be that it is a lot of extra work, but it's really worth it in the end, because a lot of the people using the data now actually learn how to understand it, how to interpret it; they know where it comes from, they know where it's used, they know what it means, and so on.

Speaker 2:

These effects can be good or they can be bad. But then we also have to identify the really, really important data sources, the data sources that are used quite widely. If you look at any retail company, they will have things like the item registry where they have all of their items. That will, for sure, be very important to have good quality on. Also, if the sales systems where you're actually doing your sales don't add up, you're going to have issues everywhere. But at the same time, you may have things like the information about the stores that may not be as important. Is it important that all of the square meters in the store are perfectly correct, or that the coordinates are perfectly correct? It depends, right. It may be important, but it may also just never be used for anything, or, if used, not used for anything important. So you have to really be aware of what the important parts are.

Speaker 2:

And getting back to that butterfly effect, I think the way to draw value from it, rather than just get impacted by it, is communication. I think that is so very, very important in our line of work, when we work with data and the business: the communication.

Speaker 2:

We need to have communication through the whole data value chain. We need to talk to the business, we need to understand the business, but they also need to understand us and they need to understand the data. And how do we build that understanding? By communication, by being together with them. But we also need to talk to the guys and girls that are building the systems where we generate the data, so that we understand what their system looks like and what their priorities in the system are, and actually have a discussion with them so they see the impact of what they are creating. Some of the things that may not at all be important for an operational system can be extremely important if you're building models based on it. So I think my big takeaway in that whole butterfly effect is trying to see what is important, and mainly talking to people. I think building relationships is one of the, I'm not going to say undervalued, but I think extremely important things.

Speaker 1:

Well, yeah, I agree with you. Everyone is talking about communication, but still we struggle. I mean, we've talked about it on the podcast several times and tried to figure out the reasons why it is so hard to communicate, and I think where we ended up is that we are basically speaking different languages, specialized lingo. It's different perspectives, different focus points, which makes it really hard to communicate in a straight line.

Speaker 2:

Yeah, and I agree, I think that it is hard. I think that one of the issues within our domain is that we have our own lingo, and we use it too much. You really need to think about it: when I talk about ingesting data, or needing a certain format, the person on the other side maybe goes a bit blank and thinks, okay, ingesting data, what does that actually mean? And that's just one of the many, many different scenarios where we are using our own lingo. I think it's very important to draw up the big picture, because very often, since we are quite technical as well, we go straight into the nuts and bolts, saying that when you send the data, I want it to look like this. And then they say okay, but they don't understand what it's actually meant to be used for. The risk you end up with is that you are saying, oh, I want the data to look exactly like this, but you're not describing what you want to do with it, meaning that the people on the sending side, the ones who understand how the data is generated, have no opportunity to respond and say: but it's not really that data that you should be asking for, you should be asking for this data, and this is how it actually works. But if you have a sit-down with them and say, oh, we want to create a dashboard that shows sales by this type of category, then they can say, oh, but then you misread this, because this isn't the category that you're looking for.

Speaker 2:

I think it's very important to paint the big picture: what's the business goal of it? And I really want to stress the importance of going back to what it is that we want to achieve in the business, and not getting too far away from that, because it is easy for us to go into a backlog of data to be ingested and basically just work on that, totally separate from what we want to achieve in the end. That is taking the end-to-end view of things. I think one of the problems we have with collaboration and communication is that we don't talk enough about what it is that we want to achieve, and when.

Speaker 1:

I have a lot of follow-up questions on that one, but just one quick thought, because I read about it earlier today. There was an article about what the big flaw with German engineering is, and it said the big flaw is that Germans think that everyone sticks to the rules. So if you use anything, a car, a PC, anything engineered by a German, it works perfectly and endlessly as long as you keep it within the rules it was designed for. But at the same time, we talk a lot about AI finding the easiest way to solve a problem, in ways we might not think of as the correct way, the way the solution was intended to be used. And the same goes for end users, right? You can't predict people sticking to the rules. You have to find ways of engineering a system so that it can be used in different ways, and that is the hard part of communicating this. So it was a really, really interesting article which fits quite well with what you just described.

Speaker 2:

And I think it is a bit of a challenge, right? I mean, you're never going to get away from the challenge of saying that we want to create a data layer, in whatever form, maybe it's virtualized or not, or we put it into a data platform, where you have all of the relevant data and can use it to build a ton of use cases, some of which we have no idea about today, and balancing that with saying, okay, this is the user and that is the use case that we want to build right now, and just focusing on that use case. A lot of the data domain for me comes down to this: you have to think both, all of the time. You have to think, okay, we need to build the use case while thinking about building that whole reusable layer, where we take some consideration into what can be done in the future.

Speaker 2:

And for me, a lot of the data world is about thinking about both. We don't exclusively think about optimizing for the single use case that I'm building right now; I also need to think a bit long term: okay, can this data be used for something else? In a lot of cases there is a yes, but. And then I think the important part is not just abandoning it and saying, okay, this is very specific data, but rather thinking, okay, if someone wants to use it, what do they need to know about the data? And maybe then working with metadata, working with good data cataloging, with documentation, but also with simple things like naming conventions and so on. I think this is why the whole area gets so interesting for me: you have to think about the whole picture all of the time. It's not easy, but it becomes very effective if you do it.

Speaker 1:

I really like that approach, because it moves away from that dual perspective of creating data for a certain purpose on the one side, and using data that was created for a different purpose on the other. How do you combine that? You really have to think about the entire picture. But how do you see that whole picture, which is not an easy task at all, especially when you are in an engineering position? There are certain tools for end-to-end visibility, but is tooling the answer?

Speaker 2:

I think there are a lot of things that can help, and it's about finding the right level of help that you can get. Yes, there are tools. There are very good data catalogs out there, there is lineage tooling and so on. But in general, the question you can always go back to is: does the tool really help, or rather, does the tool solve the problem? Generally the tool does not solve the problem. It can help, though. So find those kinds of things that can help, but in general the tool doesn't really solve the problem. The real solution is normally people.

Speaker 2:

So you have to have the right organization, you have to have the right responsibility, and I think that is really important: setting who is responsible for this data and who is responsible for understanding this data and how it is connected to the business. But then you can for sure drive a lot of efficiency and effectiveness by having a tool where we can keep that metadata, where it's easily reusable and easily accessible for as many people as possible, so that you don't have to call this one person each and every time you want to do something. The combination is where it becomes powerful: when you can combine the people that really understand and know the data with the tooling, and really spread their knowledge in a good manner. So if you choose to implement tooling for this, I think the important part is to involve the right people. It's the same thing as building a platform, whether it's a data catalog or a data platform. Just implementing the platform as such is fun for tech guys, but it may not be what actually drives value in the business. But if you combine that with doing the organizational and procedural change in the business, with the people that work with the data, then it can be super effective and really drive value.

Speaker 2:

So does all of the tooling to get that visibility have a place in the landscape? For sure, definitely. Do you need to combine that with a lot of change management for everyone? Definitely as well. I think that you have to, again, think of both. I have yet to see the silver bullet that solves all of our problems.

Speaker 1:

Yeah, we've been looking for that for years, haven't we? But let me ask you straight out, because we talked about people: do you think we have a competency issue in data, that we are, over the years, getting too specialized and not realizing the impact of our work?

Speaker 2:

Yes, although what I see in maturing companies is that you move back from that a bit. Quite often I see that when you really start the journey, it's easy to become super specialized in each role, and then as you go along and really start to work together, you don't lose the specialization but you kind of broaden it. You can still be very deep in some areas. If you're a data engineer, for instance, you're still very, very good, and the main focus is the data engineering.

Speaker 2:

But the more you talk with business people, the more you talk with the analysts, with the scientists and so on, the more you understand on the fringes of what you are doing. And the more time you spend with that, the more you understand the big picture: how does your work affect everyone else's, and what do they actually need, so that in the end you can be a bit more proactive and say, okay, I assume this is how you guys want the data. But I think it's dangerous, although quite common, especially when you start out, that you're either very focused on building a single use case a bit in isolation, unfortunately, or you say, okay, now we need to build all of the capabilities up front, we need to fill it with data. Then it's very easy to become very technical. Or you look at the platforms: oh, we need to put all of this data in, and then you're only focusing on getting data in, not on how that data is going to be used later on. And in periods, I mean, it's inevitable that you become a bit specialized, because setting a platform in place is not something you do with your left hand, or with like 20% of your time. So I think it's inevitable, but understanding that it will occur can help you broaden it again.

Speaker 2:

And I think it's important to be aware of it and really push people out: basically putting the engineers together with business people, together with analysts and so on, and really trying to create those cross-functional teams that we always talk about. For me, it's really, really important to build that communication and collaboration in those teams, and not build communication by having strict processes: this is where the business writes the requirements, this is when we do a data model, then someone picks that up and builds it, and then someone tests it, and so on. I think that's where we get too specialized, and we lose the big picture and we lose the communication.

Speaker 2:

It's very natural that when you put a team in place with, say, a business person, an analyst, an engineer and a scientist, you can't expect everyone to work 100% of the time there. Over time their effort will go up and down depending on where you are in the whole project, but I think they need to be there through the whole life cycle of the initiative, because then they understand each other and can start talking to each other. That's where we really get the effect: when they talk to each other about what they want to achieve, how that translates into an analytical problem, how that translates into the data that's needed. And when the data issues actually pop up, because they will, you can have that conversation all together rather than in separate pieces.

Speaker 1:

So now, we talked about technology, we talked about people. Let's talk a bit about process, and something closely tied to it. The holy grail right now is automation, right? Everything should be automated, either parts of processes or entire processes. But what are the challenges that you face when you try to automate parts of a process, or an entire process?

Speaker 2:

This is where you get to prove your mettle, and whether you've done your homework on the data: when you try to automate something with it. If the data is wrong, the automation will not give the right result in the end. So this is where it gets super interesting, and I think the challenges are, of course, data quality in all of its dimensions. If it's incorrect data, if it's incomplete data, if the references are wrong and so on, then you're probably going to get the automation completely wrong. Worst case, it may look right, it doesn't break, but in the end your customers are going to complain because they are being sent a completely wrong offer, or their package gets shipped to the wrong country, or whatever. The challenges here are basically the same as if you have people all the way through. But you also have to realize that if you automate, you're not going to have anyone that raises their hand and says, well, this looks a bit funny, or are we sure that this is correct? You're going to have something that just says, okay, here's the data, based on the data I'm going to take this decision, and just runs with it, regardless of whether it looks right or not.

Speaker 2:

So I think the challenge is that you have to combine good enough data quality with good enough understanding of the process that you're building. You also have to be aware of the limitations of the data and understand that we may not get everything in a perfect manner, or we may not get complete data, and so on. What do we do then? So that the automation can manage different scenarios. That's for the fully automated process, but you also need to know when to sidetrack items in the automated process and basically say: for this order, I need someone to look at it before it gets sent out. When you build it, the whole challenge is understanding the data, understanding the process, understanding data quality. But then when you run it in operations later on, you're going to have so much more fun, because all of a sudden you get a huge spike. So you may have things like varying loads of data over time that you need to manage, and that may affect both the IT infrastructure and the physical infrastructure: can you manage to send all of those orders with the people you have?
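The "sidetrack" idea described above can be sketched in a few lines. This is an illustrative example, not from the episode: all field names, thresholds, and the routing rules are made-up assumptions. The point is that the happy path runs straight through, while incomplete or implausible data is routed to a human review queue rather than processed blindly.

```python
# Hypothetical order-routing sketch: automate the happy path,
# sidetrack anything with incomplete or implausible data to a person.

REQUIRED_FIELDS = {"order_id", "customer_id", "country", "amount"}

def route_order(order: dict) -> str:
    """Return 'auto' for straight-through processing, 'review' otherwise."""
    missing = REQUIRED_FIELDS - order.keys()
    if missing:
        return "review"   # incomplete data: a person should look at it
    if order["amount"] <= 0:
        return "review"   # implausible value: sidetrack it
    return "auto"         # good enough quality: fully automated

orders = [
    {"order_id": 1, "customer_id": "c1", "country": "NO", "amount": 99.0},
    {"order_id": 2, "customer_id": "c2", "amount": 10.0},  # country missing
]
routes = [route_order(o) for o in orders]  # first order auto, second reviewed
```

In a real pipeline the review branch would feed a work queue for a steward rather than just returning a label, but the design choice is the same: decide explicitly what the automation does when the data is not good enough.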

Speaker 2:

Then you have things like schema changes, or changes within the schema, unfortunately not too uncommon. If the schema changes in some manner, some step will break, and with a person in the loop they will catch that at some point: okay, this is broken, we need to fix it. But if you completely automate, you don't have someone that comes in in the morning and looks at it, so it may take a longer period before it comes up. Harder to find is when something changes in the data without the schema changing, and that can become really, really hard to spot, especially when you don't have someone looking at a dashboard and saying, well, this looks funny, I have free text here where I'm expecting an item category. If you've completely automated it, it may just take too long before you find it. So all of those things are interesting challenges to solve.
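The two failure modes just mentioned can be made concrete with a small sketch. This is an assumed example, not from the episode: the column names, the expected category set, and the check logic are all hypothetical. Mode 1, schema drift, at least breaks loudly; mode 2, where the schema is intact but free text shows up where a category code belongs, is the one that slips through silently.

```python
# Hypothetical batch check for the two failure modes:
# (1) schema drift, (2) value drift inside an unchanged schema.

EXPECTED_COLUMNS = {"item_id", "item_category", "price"}
KNOWN_CATEGORIES = {"clothing", "electronics", "toys"}

def check_batch(rows: list[dict]) -> list[str]:
    issues = []
    for row in rows:
        # Mode 1: schema drift -- missing or extra columns break loudly.
        if set(row) != EXPECTED_COLUMNS:
            issues.append(f"schema drift in item {row.get('item_id')}")
            continue
        # Mode 2: schema intact, but free text where a category is expected.
        if row["item_category"] not in KNOWN_CATEGORIES:
            issues.append(f"unexpected category {row['item_category']!r}")
    return issues

batch = [
    {"item_id": 1, "item_category": "toys", "price": 5.0},
    {"item_id": 2, "item_category": "Blue cotton shirt", "price": 9.0},
]
issues = check_batch(batch)  # flags the free-text category in item 2
```

A check like this is a crude stand-in for the person who would otherwise glance at a dashboard and say "this looks funny"; in a fully automated flow, it has to run on every batch.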

Speaker 1:

So for everyone who's wondering why we need a steward: that is why, a human in the loop to look at these issues quite early on. And I have one comment on the data quality part, because we talked a lot about this. I never really liked the terms good or bad quality, because then we get really philosophical about what is good or bad. We have certain dimensions, and we have a certain purpose for use. I've started calling it rather known or unknown data quality. And even with known data quality you can have bad quality data, but you know how bad it is, and that gives you an advantage.

Speaker 2:

Yeah, I very much agree. It's very interesting when you have people say, oh, we have too bad data quality in this. Well, what do you actually mean? Are we missing data? Is the data incorrect? Is it old? And so on. There are just so many dimensions where you need to go a bit deeper. Say that you're selling t-shirts, and all of the t-shirts have the same size in this field, when it should be the actual size of the t-shirt, but someone just enters an M because that's closest on the keyboard, and then they can get through their process a bit faster. So yes, I very much agree: you need to understand the data quality, and you need to understand what effect it actually has, because in some cases you need to have 100% correctness in the data.
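The t-shirt example is exactly the kind of thing a small profiling check can turn from unknown into known quality. This sketch is an illustration only; the field, the data, and the 90% dominance threshold are assumptions. Every individual "M" is a valid size, so validity checks pass, but one value dominating the field far beyond plausibility is a signal that people are typing whatever is fastest.

```python
# Hypothetical profiling check: flag a field where a single value
# dominates beyond a plausible share of the rows.

from collections import Counter

def suspicious_dominance(values: list[str], threshold: float = 0.9) -> bool:
    """True if one value covers more than `threshold` of all rows."""
    if not values:
        return False
    _, count = Counter(values).most_common(1)[0]
    return count / len(values) > threshold

# 96 of 100 rows say "M" -- each value is valid, but the field is suspect.
sizes = ["M"] * 95 + ["S", "L", "XL", "M", "S"]
flagged = suspicious_dominance(sizes)
```

The output of a check like this does not say the data is "bad"; it says the quality of this field is now known, which, as noted above, is the advantage: you can decide whether the distortion matters for your particular use.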

Speaker 1:

In some cases it's just not achievable, or even needed. Exactly, you need to understand the data, and where you actually need that correctness. And that's why it's okay to have data of unknown quality, because maybe it's not important, maybe it's not relevant for your processes. I think this really calls for someone who has an idea of the need on the one side, but also of how the data is created, and who can follow the entire route. Now, we talked right at the beginning about centralized teams versus decentralized teams. When it comes to automation, when it comes to seeing the whole picture, when it comes to end-to-end processes, do you think centralized data management is a roadblock or an accelerator?

Speaker 2:

I think it shifts a bit over time. It's very hard to start out fully decentralized: you're going to have so many different priorities, and you're going to have a lot of people that need to work together. So I think having a centralized team helps, but ideally that centralized team is actually made up of people from the different domains. You're basically forming a team from data and information owners, data stewards, whatever roles they have, who actually sit out in the business domains, but they work together, and in that sense they are centralized. If you create an organizational centralization instead, it's really easy for that to become its own ivory tower, where they say this is how you should work, and everyone looks at that PowerPoint and says yes, and then they just don't, they do something else. Or you create a bottleneck where everything has to go through that team. But the centralization aspect of saying, okay, we take a number of named people, put them together, and in this virtual organization you are responsible for driving this, and we make sure they have enough time, and so on, I think that is very powerful. So the whole centralized aspect is powerful in driving the journey, especially in the beginning.

Speaker 2:

I think it's extremely hard to start a data management journey fully decentralized.

Speaker 2:

So I think those two together, done right, are definitely an accelerator. But it's like everything, right: if you do it right it's an accelerator, if you do it wrong it's going to be a roadblock. Inherently, I don't think there's any perfectly right or wrong answer, because I've seen fully centralized teams be very, very effective by being out in the different domains and really going out and sitting with them, and I've seen them be very, very inefficient by saying: everyone needs to come and talk to us before they do anything. And the same with decentralization. I've seen decentralization become very, very messy, where you have one or two business domains that are very effective and then a bunch that can't really be bothered because they have much more pressing needs to work with. So you can go right with both, but you can also go wrong with both. I think in most companies the sweet spot is having that centralized virtual team that organizationally sits out in the different domains.

Speaker 1:

Couldn't agree more with you on that one.

Speaker 1:

We are already at the end of it. Thank you for a fantastic conversation. I really enjoyed that, and do you have any takeaways or call to action?

Speaker 2:

I think the takeaway for me in all of this is basically looking at the big picture, both from a company-wide perspective and, more importantly, from a data value chain perspective: seeing the whole life cycle. And then relationship building. As a team, and as a team member, build relationships not just within your team, but with the people sending the data, the people generating the data, the people using the data, so that you can all easily talk to each other, and try to not make that too formal. Yes, in order to drive it over time you need to formalize some things, like a data contract, but a data contract without the relationship with the people around it is really just a contract. It's not something that will bring value. So relationship building and the whole picture, those are my two takeaways. Thank you so much, thank you.

Data Management in the Nordics
Finding the Right Level of Synergies
Data Value Chain and Communication Challenges
Data Management Challenges and Solutions
Challenges and Benefits of Data Automation
Effective Data Management Strategies