MetaDAMA - Data Management in the Nordics
This is DAMA Norway's podcast to create an arena for sharing experiences within Data Management, showcase competence and level of knowledge in this field in the Nordics, get in touch with professionals, spread the word about Data Management and not least promote the profession Data Management.
4#3 - Pedram Birounvand - A Paradigm Shift in Data through AI (Eng)
«The notion of having clean data models will be less and less important going forward.»
Unlock the secrets of the evolving data landscape with our special guest, Pedram Birounvand, a veteran in data who has worked with notable companies like Spotify and in private equity. Pedram is CEO and Founder at UnionAll.
Together, we dissect the impact of AI and GenAI on data structuring, governance, and architecture, shedding light on the importance of foundational data skills amidst these advancements.
Peek into the future of data management as we explore Large Language Models (LLMs), vector databases, and the revolutionary RAG architecture that is set to redefine how we interact with data. Pedram shares his vision for high-quality data management and the evolving role of data modeling in an AI-driven world. We also discuss the importance of consolidating company knowledge and integrating internal data with third-party datasets to foster growth and innovation, ultimately bringing data to life in unprecedented ways.
Here are my key takeaways:
- Whenever a new technology arrives, you need to adapt and figure out how to apply it - often, at first, by using the new tools for the wrong problem.
- There is substantial investment in AI, yet the use cases for applying AI are still not clear enough in many companies.
- There is a gap in how technical and business people understand problems. Part of this problem is how we present and visualize the problem.
- You need to create space for innovation - if your team is bogged down with operational tasks, you are cannibalizing your innovative potential.
- Incubators in organizations are valuable, if you can keep them close to the problem to solve without limiting their freedom to explore.
- The goal of incubators is not to live forever, but to become ingrained in the business.
- CEOs need a combination of internal and external counsel.
- Find someone in the operational setting to take ownership from the start.
- The more data you have to handle, the better and clearer your Data Governance strategy should be.
- Small companies find it easier to set clear standards for data handling, thanks to direct communication.
- You want to make sure that you solve one problem really well, before moving on.
- Before attempting change, find out what the culture and the strong incentives in your organization are.
LLMs as the solution for Data Management?
- ChatGPT is already very good at classifying information.
- It can create required documentation automatically, when fed the right parameters.
- It can supersede key value search in finding information.
- This can help to scale Data Governance and Data Management work.
- Data Management will become more automated, but also much more important going forward.
- RAG architecture: first build up your own knowledge database, by vectorizing the data into a vector database.
- The results from querying this database are used by the LLM for interpretation.
- Find a way to consolidate all your input information into a single pipeline to build your knowledge database.
- Building strong controls on naming conventions will be less important going forward.
- Vectorized semantic search will be much faster.
- Entity matching will become very important.
- Fact tables and dimensional tables become less important.
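The vectorized semantic search mentioned above can be sketched in a few lines. This is only an illustrative toy: the bag-of-words `embed` function stands in for a real embedding model, and the document strings are invented examples; a production setup would use a learned embedding model and an actual vector database.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical snippets standing in for a company's knowledge base.
documents = [
    "customer churn report for the nordic market",
    "invoice payment terms and billing policy",
    "employee onboarding checklist",
]
index = [(doc, embed(doc)) for doc in documents]

def semantic_search(query, k=1):
    # Rank documents by vector similarity instead of an exact key-value lookup.
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(semantic_search("billing and invoices"))
# → ['invoice payment terms and billing policy']
```

The point is the interface: the query never has to match a key or a column name exactly, which is why strict naming conventions matter less in this style of retrieval.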
Data to value
- Be able to benchmark your internal performance against the market.
- Understand trends and how they affect you.
- Using and aggregating third-party data is even harder than working with internal data.
- You need to find ways to combine internal and third party data to get better insights.
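Combining internal and third-party data usually hinges on entity matching, since the same company rarely appears under identical names in both sources. Below is a toy sketch using Python's standard-library `difflib` for fuzzy name matching; the records, the suffix list, and the 0.85 threshold are invented for illustration, and real entity resolution would add identifiers, aliases, and stronger normalization.

```python
from difflib import SequenceMatcher

# Hypothetical records: one internal, two from a third-party data provider.
internal = [{"name": "Acme Industries AS", "revenue_musd": 120}]
third_party = [
    {"name": "ACME Industries", "industry": "Manufacturing"},
    {"name": "Borealis Foods", "industry": "Food"},
]

def normalize(name):
    # Strip case and common legal suffixes; real pipelines also handle
    # punctuation, aliases, and registry identifiers.
    name = name.lower()
    for suffix in (" as", " ab", " inc", " ltd"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name.strip()

def best_match(record, candidates, threshold=0.85):
    # Enrich the internal record with the closest third-party record,
    # but only if the name similarity clears the threshold.
    scored = [
        (SequenceMatcher(None, normalize(record["name"]), normalize(c["name"])).ratio(), c)
        for c in candidates
    ]
    score, match = max(scored, key=lambda pair: pair[0])
    return {**record, **match} if score >= threshold else record

enriched = best_match(internal[0], third_party)
print(enriched["industry"])
# → Manufacturing
```

The threshold is the key design choice: too low and you merge distinct entities, too high and you miss legitimate matches.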
Speaker 1This is MetaDAMA, a holistic view on data management in the Nordics. Welcome, my name is Winfried, and thanks for joining me for this episode of MetaDAMA. Our vision is to promote data management as a profession in the Nordics and show the competencies that we have, and that is the reason I invite Nordic experts in data and information management for a talk. This is a bit of a special episode, because for the first time since starting MetaDAMA four years ago, we have someone returning to the podcast, and that's Pedram Birounvand. Welcome.
Speaker 2Thank you so much. Thank you, Winfried. It's a pleasure being here.
Speaker 1It's fantastic to have you back. For the listeners that haven't listened to the previous episode we recorded together: that was an episode on the skills we would need in data. It's interesting, because since we did that recording about two years ago, there has been a lot of development in the data space, especially through GenAI and the AI hype. Yet I still think that the episode we recorded back then is just as relevant, maybe even more relevant now, because the things we talked about, the skills that you need, stay fairly constant. There are certain flavors that you add on to it, but the basics and the foundation are constant, and that's why I think it's a really good episode. So if you haven't listened to it yet, I would recommend you do that.
Speaker 1But today we're going to talk about the changes we have seen over the last two years, and I think we're heading towards a paradigm shift in data through the power of AI. There are several aspects we want to cover today: what this means for technology investment, what it means for companies navigating these trends to leverage value from data, but also what it means for data and data management in organizations. So how does AI influence the way we do data? I'm really looking forward to this, an episode packed with insights. Before we start, Pedram, please give us a short recap of who you are and what you're doing now.
Speaker 2Thank you so much, pedram Birnvand. I've had the privilege of working within the data domain for the past close to 20 years now, so I've seen a lot of changes throughout my career in trends within data. I started off my journey with building access databases so your younger audience will probably not even know what that is Going to building Microsoft SQL Server databases and Oracle databases, to then transitioning more and more. When I had the privilege of working at Spotify when we still were a startup so I was among the first 300 employees and manage our data assets there. We built MapReduce jobs and used Hadoop, and that was a tremendous leap when it came to big data. And then we also, during the last year or two years where I was working there, transitioned over to Cloud Data Warehouse, which was a Google Cloud platform and BigQuery. And after that, of course, we realized the amazing capabilities of using Cloud Data Warehouses.
Speaker 2In my next role, where I was heading data at a private equity fund called EQT, I had the privilege of building things from scratch again. One of the good things when you're starting from scratch is that you can learn from all your mistakes, and hopefully you're not going to make the same mistakes again. So we built everything on top of Snowflake there and used dbt. We were probably one of the first adopters of dbt in Europe; the company behind it was still called Fishtown back in those days, I think it was 2016. And we were also one of the first adopters of Snowflake, so the Snowflake office here in Sweden was basically one person. It was an amazing journey to go on together with dbt and Snowflake, and we managed to do amazing things when it came to analyzing our data: working with our portfolio companies, assessing new companies, and doing sourcing and screening of potential investments. And for the listeners that don't know what a private equity company does: it's a company that invests in other companies, basically. Within private equity you have a majority stake, while in venture capital you have a minority stake in the company. So it was very important for us to do value creation for the portfolio companies. I also had the privilege of being an advisor for a lot of our portfolio companies when it came to how to set up good organizational structures, what type of roles you needed when it came to data, and what type of architecture you should try to focus on. And again, I think it's because I acknowledge so many mistakes that I've made in the past, and hopefully I can share some of those with peers so that they can avoid making the same mistakes I have.
Speaker 2And what has happened recently, exactly as you just said, Winfried: it was such a pivotal moment when the AI hype came. For the first half an hour I thought it was probably just a hype, before I actually started to use ChatGPT and realized that this is going to be such a game changer when it comes to working with data. Because so much within the data realm is focused on making data structured in different ways, since that's the way you can use database technologies so that you can filter, aggregate, explore and find information. But now, with the help of LLMs, it's possible to find a needle in a haystack without having to structure information in the same way as you have done before. And that was such a huge transition from the way that I've been building up data warehouses in the past. I think this is a very big moment in how you would design an architecture for the future data warehouse compared to how you did it in the past. So I started questioning how data governance will look going forward when it comes to compliance, how we will work with data classification, how we will work with making information more user-friendly, and how we will adapt information to specific verticals within the organization.
Speaker 2So what happened was that I decided that I wanted to focus my full time on just working with the new architecture and AI. So I transitioned and built up a startup, which is called UnionAll. We're now supporting companies with leveraging AI, but mainly how to leverage AI when it comes to third-party data sets. So I'm the CEO of a small company. It's an amazing journey, having been an advisor for a lot of startups before and now being in the trenches myself and getting my hands dirty. It's such an amazing journey where you have to build basically everything yourself, and there are no boundaries when it comes to innovation. When you're in a startup, you can try out a lot of different things. So I think it's an amazing experience for me and I'm really enjoying it.
Speaker 1Fantastic, and you already touched on what we're going to talk about today. But before that, and I think this is interesting because you have that private equity background and that startup background: at the time we are in now, where AI still is a hype, do you feel that the vision is a bit clouded? Compared to the focus on value creation that we have seen earlier, where time to money should be shortened as much as possible, especially for startups, is the focus now more on the tech and on possibilities rather than actual value output?
Navigating AI Hype and Implementation
Speaker 2I think this is a natural transition. Always, when a new technology arrives, you have people with a capitalistic approach trying to adapt to the new trends, and unfortunately, when that happens, you're trying to use the wrong tools for the problem, basically. We could see a very similar thing when data science became a hype: very often you hired data scientists for roles which were very classical analyst-type jobs, where you put a data scientist to build a Power BI report, and that was a very bad use of that skill set. At the same time, because it became a hype, even data analysts started calling themselves data scientists, because of course it's a market economy and you want to optimize your position when it comes to services and career. And then you start cannibalizing a little bit on the different roles and names, and then it becomes very hard for leaders at the top tier, like the C-suite, because they are not engineers. Very often they are not the people that actively know how to interview people and what to hire for.
Speaker 2So when they see a trend such as data science, they go and try to hire a data scientist without really having the skill set to figure out: first, is it a data scientist that I need? Secondly, is this person that I'm interviewing actually a data scientist, or a data analyst, or a data engineer? So I would say, yes, there is always a problem when it comes to hype. But if you let the hype fade out a little bit, and you see the operation actually catching up with the hype itself, I think it will start consolidating into a much stronger value proposition than what it is today, because I definitely see bad applications of AI today.
Speaker 1We've seen some of the companies and startups that were created through the AI hype over the last years starting to struggle in a certain way, especially when it comes to AI-focused consultancy. I've seen this in Norway, we've seen this in Sweden, and this is about understanding how the hype actually materializes in projects, and how the hype materializes in value creation in larger companies. If you are consulting a larger company where they talk about the hype but they're actually not doing any projects related to the hype, then you see a problem. And I think right now we are at the stage where really everyone is talking about AI, but from your perspective and from what you are seeing, how much is actually materializing into concrete projects?
Speaker 2There are so many failed projects when it comes to AI. I still have the fortune of being an advisor for a lot of C-suites, and I see that they understand that this is a pivotal moment in adopting a new technology, and that if they don't, they will miss out. And probably not only miss out: they will probably have a very tough time competing with other companies that have adopted this. So it becomes a very challenging situation for a lot of companies, because they invest a lot right now, all the way from the board to the C-suite, but they are not sure exactly what use cases they want to apply the AI to, so they end up pushing a little bit for something. And because all good employees want to deliver on what their managers want, they also start trying to figure out different things. The biggest successes that I've seen in applying AI are probably when you have a very digital leadership embedded in the organization, very often when you have a chief digital officer or a chief data officer who actually thought about these things before the AI hype came along, because they already have a very strong sense of which areas will bring a lot of value, and of what the actual problems with data were. A lot of business people often don't really see the problems, because they work with Excel, and very often they do a mashup of data inside Excel, and very often they do a cleanup in Excel, so the problem is sometimes a little bit hidden: what they see is a PowerPoint which shows a very clear narrative and clear numbers. The detail people and the data people, on the other hand, very often understand the problems themselves, typically have an idea of how to go about solving them, and can judge whether this tool is the right tool to actually solve the problems that we have.
Speaker 2It's also important that this gets dedicated attention, because if your team is bogged down with trying to solve the day-to-day problems at the same time as trying to do innovation, it becomes extremely hard, because then you're cannibalizing your innovation capabilities with operations and fixing bugs and fixing month-end close and all of that stuff. You need to have a team that can be dedicated to trying different things out. Typically we call them incubators in a company, where you basically take a couple of people in the organization and you let them go like a startup and explore new things, and then they have to have a very clear leader at the top that actually has a good notion of what problem they want to solve. In those cases I've seen really good applications of AI. But in other scenarios, where I don't see the proper investment, or a good problem statement that they want to solve, it becomes a little bit shallow.
Speaker 1Very good, yeah, and I actually have two points here from what you just said that really resonated with me. One thing is incubating in organizations, and how you can create, let's call it a startup setting, in an organization. How many years is it now since Eric Ries published his book The Lean Startup? 15 years maybe, maybe more. And I've seen this work, and I've seen examples where it didn't work. Where it didn't work, you can easily lose the connection to operations. That means that you are incubating, you are really innovating on ideas in that setting, but it's really hard to operationalize them afterwards, because the connection is missing.
Speaker 1That was the one thing, and the other thing I wanted to emphasize: you talked a bit about that understanding of AI and the possibilities within it in a broad setting and in a C-suite setting, and we've been talking about data literacy for years. Now the term AI literacy is coming up. So is there a structured way of gaining that understanding of AI at that level, and how much should it be? How much do I need to know, as a CEO, about AI to actually make the right choices?
Data Literacy Challenges and AI Solutions
Speaker 2I think that, as a CEO, you have the responsibility of knowing your limitations and knowing when you need help, basically. And here there is always the question: should I get external counsel from a services firm, or should I have internal counsel? I would say that you need a combination of both, but initially you need to have someone internally that you trust that can drive certain questions for you, very similar to why you have a head of HR and haven't outsourced that to somebody else, or a head of product, and so forth. You cannot expect the CEO to be super duper knowledgeable, but they need to understand the business proposition and the value of investing in AI, and also realize their own inability to go down into the details and into the grain, so they need to hire for that. And then the question comes, when it comes to data literacy: will that become easier with the help of AI, or will that make life even more complicated? What I've seen throughout my 20-year career when it comes to data capabilities is that we have more and more fantastic tooling for building pipelines and data transformations, and today, with the new cloud data warehouses, the complexity of creating a table or a complicated transformation has shrunk significantly. We are also seeing a very clear trend when it comes to data mesh, where you are actually decentralizing the data capability across the entire organization. So the problem with data literacy is not the capability of generating ETL or ELT or transformations. The problem here is very similar to the App Store: when it started, there was only one weather app, but now you can find 200 weather apps. Which of those should you choose?
And that is what will happen in the future as well, and I know that personally.
Speaker 2One of the ways that I tried to address and accommodate the challenge of a lot of data being produced all the time was focusing a lot on data governance and having a very clear data governance strategy. But the big challenge I faced there was that the biggest and most successful data governance projects typically come from big corporations that have very clear regulatory requirements to document their data sets, whether it's a medtech company, an insurance company, a bank, or a medical company. These are all heavily regulated organizations, so they are bound to do good documentation, or else it's not only a reputational risk, it's actually an operational risk for them. But trying to do the same thing in an organization that doesn't really have these types of incentives becomes extremely hard, and I've gone down the wrong path myself trying to push this, because everyone said that we have such poor data quality, we need to fix the data quality problem. So I tried to push the concept of data cataloging and data governance onto an organization without the necessary incentives. And what happened? It was an amazing start to the project. Everything was well documented, you could find things, you could collect things. But it took not more than six months, or, if we want to be optimistic, maybe one year, before the documentation was no longer aligned with the pace and velocity at which we were creating data transformations.
Speaker 2So, to go back: one of the biggest challenges with data literacy today is that we have a little bit too much data, and we don't really have the capability of keeping on par with knowing what we are producing. What I think will happen in the future with AI is that this part of documenting things, classifying things and finding things will become much, much simpler with AI and LLMs, because you won't have to do manual documentation in the future. And I'm not only talking about the technology as of today; think about it more on a five-year horizon. We can already see that ChatGPT and similar tools are very good when it comes to legal documentation. Just imagine how good it will be in five years. And already today we can see it performing really well when it comes to classifying data.
Speaker 2So if you give it: this is the column, these are the different attributes of the column, explain what this is, it generates quite good documentation. Secondly, if you want to find something, instead of doing it the classical way with a key-value search where you try to pinpoint a certain record, now you can use natural language. So while we have historically tried to resolve data literacy with data cataloging and data governance, what I see happening in the future is that we're going to address the data literacy problem with the help of AI and LLMs: we're going to use vector databases, we're going to use natural language to query data, and then we're going to get a much better and more scalable approach to working with data governance than we have today. That scalability is very much needed, especially for data governance.
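The column-documentation workflow described here, feeding a column's name, type, and sample values to an LLM, can be sketched as a prompt builder. The table and column names are made up for illustration, and the actual model call is omitted, since any chat-completion API could consume the resulting string.

```python
def column_documentation_prompt(table, column, dtype, samples):
    # Assemble a prompt asking an LLM to document and classify a column.
    # Sending it to a model is left out; this only builds the request text.
    sample_text = ", ".join(repr(s) for s in samples)
    return (
        f"You are a data steward. Table: {table}. Column: {column} ({dtype}).\n"
        f"Sample values: {sample_text}.\n"
        "Write a one-sentence business description and a sensitivity "
        "classification (public / internal / confidential)."
    )

# Hypothetical column from a customer table.
prompt = column_documentation_prompt(
    table="customers",
    column="natl_id",
    dtype="string",
    samples=["010190-12345", "241285-98765"],
)
print(prompt)
```

Run over a whole schema, this is the kind of loop that could auto-populate a data catalog instead of relying on manual documentation.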
Speaker 1There is definitely a difference in what sector you are working in and how regulated it is. You had an example where it's really part of the ingrained mindset, or way of working, that is hard to change due to the degree of regulation that you have. And this is kind of interesting: I'm working in oil and gas, which is highly regulated, and where certain things are much easier to implement. But you see a different challenge there: the focus is on compliance, not on value creation, and that switch is sometimes hard to make. When you look at sizes of company rather than the sector, would you give different recommendations right now to a startup versus a mid-cap or large-cap?
Speaker 2Absolutely, 100%. When it's a small company, one of the good things when it comes to data capability is that you still have quite a limited number of people that work with data. It becomes quite simple to set very clear standards and policies just by having verbal communication with each other, and quite simple to make sure that you keep things in order. Another very important aspect is that small companies have a much bigger ability to shift, because you don't have to migrate a lot of legacy platforms; you don't have that many legacy platforms, so it becomes very easy to transform and adapt to new technologies. Mid-cap companies are somewhere in between: instead of one small team of maybe three or four people working within data, there might be one or two teams of eight to ten people working with data, sometimes even a little bit more. What I would recommend to them is that whenever you want to try a new solution or a new technology, instead of trying to embed that within the same existing team that is bogged down with building a lot of different things for operations, you need to make sure they have breathing room and can focus on the innovation. And when I say innovation, because I really love what you said before, Winfried: you have seen a lot of incubators fail as well, and one of the main causes of incubator failure that I see is that they have a little bit too broad a problem scope. In venture capital, we typically want to make sure, especially at an early stage, that companies solve one problem really, really well. Not that they solve 100 problems with mediocrity, but one problem super well. And some of the incubators that I've seen fail got a little bit too big a problem statement; they were trying to solve too many things at once.
So if you're going to build an incubator, make sure it has a narrow problem statement with a very finite amount of time to resolve it. You don't want to get into a position where it becomes its own team and lives forever, because what you in essence want to do is to start embedding it within your operations, and that is very important. So that would be my recommendation: have a clear, narrow problem statement that they need to solve best in the world, and they need to be incubated so they can actually get the necessary time to focus.
Speaker 2When it comes to large cap, I think it is extremely important to understand the political game of the company a little bit. I might not be super politically correct myself right now, but if it is, say, an insurance company or a big bank, you have to figure out what creates the strongest incentive to move the entire organization in a certain direction. Sometimes you can leverage regulations. A technique that I sometimes use is: oh, we have a new regulation, so everyone needs to shift, regardless of whether they want to or not. Banks are super good at mobilizing when it comes to regulations, because they have to. But if I did the same thing at a tech company such as Spotify, they would probably throw me out the door straight away.
Speaker 2So it is important for you to understand the culture you're in and what the strong incentives are, and to use that to your advantage. Still very close to the mid-cap advice: make sure that you create an incubator for innovation and don't bog it down, because in large companies you very often have slimmed organizations. Even in a large organization, people are extremely stressed when it comes to resources and availability, so you need to make sure that they have the necessary bandwidth to focus on the innovation. So take the same approach as the mid-cap, but take into consideration the political dynamics in the organization and what the key drivers are to make a very big company actually shift towards a new technology.
Speaker 1Thank you so much. And just on a bit of a side note, since you talked about the incubating role in organizations and focusing down on one problem at a time, just to add on to this: what I've seen from my experience is that you need not just to focus on that one problem at a time, but you also need someone in the operational setting to take ownership of that problem. If they see the problem, if they come up with: oh, if we fix this, this will happen, it's much easier for them to operationalize the new idea or new technology afterwards.
Speaker 2That's exactly it. And the narrower you set the scope, the problem statement, the easier it will also be to operationalize. If you take, for example, changing one process end to end, incubating that, and then changing that one particular process, it's much easier to embed it into the operation than if you try to change 10 processes at once, or 20 or 30. Especially if you go with the wrong approach of trying to solve maybe a portion of the process and not the entire value chain, because if you are still dependent on old systems and old ways of working, then you're just trying to put makeup on a pig, and it becomes very challenging when you want to merge these two operations together.
Speaker 1Fantastic, very true. I'm eager to move a bit into the world of data management and how AI can change the classic way of doing data management. You already talked about a couple of things: how we do documentation, how we do architecture, how we think about data modeling and who we are modeling data for, which I think are really interesting topics to dive into. From my experience with LLMs throughout the last two years, I've seen a bit of a misunderstanding in the way people use ChatGPT, for example. It's not used well: people confuse language with knowledge, and use ChatGPT as a knowledge base, like you would use Wikipedia, which is really not its intent or purpose, and I think that can definitely lead to misconceptions, confusion and mistakes. So the question really is: how do you see LLMs being applied in a proper way in an organization to deal with these architecture, modeling and governance problems?
Future of Data Management With AI
Speaker 2I really love that you brought this topic up, because I can definitely see the same type of challenges that you just mentioned. First of all, exactly as you said, LLMs are language models, and not more than that. When you're setting up an architecture for leveraging large language models, you have to understand the technology, what it is and what it's used for. So I would say the first thing you need to learn is what a vector database is, how vectors work and how you can build matching between words. This has become a very popular architectural approach, called RAG architecture, where you first build up your own knowledge database by vectorizing the data into a vector database, then you query that database, and the results from that query are interpreted by an LLM that takes the query result and generates a narrative around it. That is a much, much stronger approach to using LLMs than thinking that an LLM is an all-knowing knowledge database, that you can just take ChatGPT, apply it to your organization and it can answer all of the questions you have in your organization. But to be able to build this knowledge database, you have to have quite good data management operations. You have to have stored this information in a structured way to begin with. And when I say stored in a structured way, I don't mean that you have to store all of the data in a table structure. It's just that if you do have PDF documents about your product descriptions, or customer support data, you have all of those data sets in one or several systems where you can consolidate that information into a single pipeline, and then you can build your knowledge database based on this information.
So if you want to build a QA chat, for example, which is a very strong application of an LLM, you need to have a very good QA history where humans have actually replied to a bunch of questions. Or you need to have very good documentation somewhere about your product that you can vectorize and use as your knowledge database. But if you have bad data quality, then the RAG architecture will also start answering really strange things back in your QA, and you're back to working manually with all of these types of things, as we've done in the past.
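The RAG flow Pedram describes can be sketched in a few lines. This is a toy illustration, not a production pattern: the documents are invented, the "embedding" is a simple bag-of-words count vector, and the final LLM narration step is stubbed out, whereas a real system would use learned embeddings, a vector database and an actual LLM.

```python
# Toy RAG sketch: (1) vectorize documents into a knowledge base,
# (2) retrieve the closest document for a question, (3) let an "LLM"
# turn the retrieved context into a narrative (stubbed here).
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Bag-of-words term-frequency vector (stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# 1. Build the knowledge database from your own (hypothetical) documents.
docs = [
    "Our premium plan includes priority support and a 99.9% uptime SLA.",
    "Refunds are issued within 14 days of purchase for annual plans.",
]
index = [(doc, embed(doc)) for doc in docs]

def retrieve(question: str) -> str:
    """2. Query the 'vector database' for the best-matching document."""
    q = embed(question)
    return max(index, key=lambda item: cosine(q, item[1]))[0]

def answer(question: str) -> str:
    """3. A real LLM would rephrase the context; here we just prefix it."""
    return f"Based on our records: {retrieve(question)}"

print(answer("When are refunds issued for annual plans?"))
```

The point of the sketch is the architecture: the model only narrates what retrieval returns, so if the documents (the QA history or product documentation) are wrong or missing, the answers will be too.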
Speaker 2But their role will be even more important because, if we're going to build automations on top of LLMs, the content itself needs to be extremely high quality, or else you're going to put bad quality data in and get catastrophe out. I say catastrophe out because if you're building automations and having direct customer interaction based on LLMs, it could have catastrophic impacts on your customer engagement, and you might break some regulatory requirements and so forth. So that would be my recommendation. And on the technology side, since I talk about vector databases: what I foresee in the not very distant future is that vectorizing data will become part of all common cloud data warehouses. It's just a different kind of indexing than the indexes we have today. You will simply be able to say that this column should have a vectorized index rather than a classical clustered or non-clustered index. So it will become a commodity quite soon.
Speaker 1Oh, I think I can agree with you on this one. We have seen this, and you described it a bit already: we're moving away from that single-purpose way of doing Gen AI, and rather combining it with other AI applications, combining Gen AI with expert systems, combining Gen AI with computer vision, to get results we can utilize in a broader sense, which is a really interesting development. But it has an impact, as you already said, on how we classically do data management and what we think is important as a basis to work on. One thing that we talked about in the prep talk that I found quite interesting was your thoughts on data modeling and data models. So how do you see data modeling developing?
Speaker 2Yeah, so in my view, data warehouse architecture especially, and application architecture in general, when it comes to data modeling, is all about trying to create human-readable data structures, whether normalized or denormalized. You create dimension and fact tables where you try to describe every dimension very clearly: this is my customer dimension, this is my product dimension and so forth. What I believe, and we can see it already today due to tools such as dbt and the simplicity of generating new models and tables, is that it becomes more and more complex to keep up and understand how data is being structured, because we're also applying data mesh architecture, where we say that not everything will be stored in a centralized place and we're not going to follow the same naming conventions and things like that. This looked like a big risk to me before: how will we be able to manage all of these new models and tables being generated at such high velocity? With the help of Gen AI, this will probably change entirely, because I think the concept of having to build strong controls on how to name things and how to describe things will not be as important, because you do have LLMs that can actually do a lot of these things for you. You will have semantic search, which is vectorized search, where you can find information much, much faster, where you don't need a human actually reading a data model, but an LLM or a vector database giving you results instead. So the notion of having clean data models, I think, will be less and less important going forward, because we do have the capability of using LLMs for automatic documentation, and we also have vector databases, which can give you a clear result set even if you have thousands and thousands of copies of data.
But one area that I think will be extremely important, and I know that there are a lot of interesting startups working on this already today, is entity matching, or in other words, master data management.
Speaker 2In some other context it is: how can I combine data from two sources that are each basically a list of products? How can I merge these two together? How can I generate a golden key? And there are a lot of really interesting innovations happening there as well, using LLMs for matching data and making sure that these things happen. I'm quite sure that in the not-so-distant future, a lot of these things will also become automated.
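The entity-matching idea, linking two product lists that lack a shared key and assigning a "golden key" to each matched entity, can be sketched as follows. The product names are invented, and the token-sort fuzzy matching via `difflib` is a deliberately simple stand-in for the embedding- or LLM-based matching Pedram mentions.

```python
# Minimal entity-matching sketch: fuzzy-match product names across two
# systems, then mint a golden key per matched pair.
import re
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase, strip punctuation, sort tokens ('token-sort' form)."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", name.lower())))

def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1] on the token-sorted forms."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# Hypothetical product lists from two systems with no shared identifier.
crm_products = ["Acme Widget Pro 2000", "Basic Gadget", "SuperTool X"]
erp_products = ["ACME widget pro-2000", "gadget, basic", "SuperTool X v2"]

golden = {}
for i, name in enumerate(crm_products):
    best = max(erp_products, key=lambda other: similarity(name, other))
    if similarity(name, best) > 0.6:   # match threshold (tunable)
        golden[f"GK-{i:03d}"] = {"crm": name, "erp": best}

for key, pair in golden.items():
    print(key, pair)
```

Real master data management adds survivorship rules, human review queues and much more robust matching, but the shape is the same: score candidate pairs, apply a threshold, and store the resulting golden key so downstream joins become trivial.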
Speaker 2So imagine you can automate the search, you can automate the find, you can automate the documentation and you can automate the joins. If you can do all of these things, will the data model actually be as important? I don't think so. And we have seen that already, to be entirely honest: doing a classical Kimball data model is not something that everybody does. People build very flat tables with a lot of columns instead, and we have already seen that the concept of building fact tables and dimension tables is becoming less and less important. I think this will become even more so in the next five years; we will see that becoming even less important.
Speaker 1Very interesting, and I think there's going to be some reaction to this, so I will save my reaction for later. I'm going to have a lot of trolls, I think.
Speaker 1Now I'm thinking, well, I think it's really interesting, and you're definitely right that there is a development happening, away from the classic way we have been doing data modeling. The possibilities we now have through technology will accelerate that; I can agree on that. As a final topic for our talk, I wanted to talk a bit more about that value proposition and the value of data and data management in terms of AI. Throughout the time we have come through now with the Gen AI hype, the focus has been, as we already talked about, maybe a bit clouded, more focused on what possibilities we have and not really on the value creation itself. I think that we will come back to that value creation quite quickly in a lot of companies, and then it's interesting and important to understand what is actually the value we want to create through data. How can this be accelerated through AI? How can we finally turn data teams from a cost center into a profit center?
Speaker 2This is also a topic very close to my heart. I would say that one of the biggest challenges, and I think I've said it in different ways throughout this talk, is that a lot of companies have a vast amount of data but they can't refine it into information. Very often they solve the problem by creating a lot of matching, transformations and cleaning in their Power BI reports or Tableau reports or whatever type of front-end tool they're using to query data. Imagine instead that you can have natural language communication with the data and find the necessary data sets by just asking: what is my gross margin? Why am I selling more but my profits are going down? Having that type of conversation will create a closeness between the actual business user and the data itself, and that truly generates data literacy.
Speaker 2Another very important aspect: if we get a very well-structured data set internally, what is the next step in the evolution? I would say being able to benchmark our internal performance against the market, being able to extract third-party data to understand what new customer verticals we should try to attract. But these things are quite hard, because even with internal data sets today it's quite hard to navigate: what data should I use? How should I aggregate it? With third-party data it becomes even harder, because this is data that you haven't generated yourself, but need to be able to leverage and merge together with your own data sets. Again, this is what we at UnionAll have dedicated our time to: trying to make that process simpler and consolidating all of the things that I've just said. Imagine if we could automatically document all of the data sets, and you could start asking questions of the data: what are my main competitors? I want to find companies to invest in within this region, with revenue within these boundaries. And then, lastly, being able to merge this information with your own internal data sets. I think that when you can start doing that, that's when you're going to get a competitive edge from using data. I could see that when I was working at EQT: the big project that we called Motherbrain was our main competitive edge. I want to make this functionality accessible to everyone, but we're still seeing very limited data out there in the market.
Speaker 2There are so many companies producing data, but they're not actually selling it, and the reason is that the go-to-market is too hard, the regulation is too hard, and the data transformation that you need to do is too hard. But imagine if you could start selling some of your data that is not actually your company's competitive advantage but might still be a valuable data asset. Then you can shift your internal data team from being a cost center to becoming a profit center, because they can actually start earning money by selling their data assets. And what I think the future architecture will be all about is that you will not only query your internal data; you will query data that is a combination of internal and external, and that will make your data and analytics a much, much bigger competitive advantage compared to what it is today.
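The combined internal-plus-external query Pedram describes can be sketched as a simple enrichment join. The company names, the `org_id` key and the firmographics feed are all hypothetical; in practice a shared key rarely exists up front, which is where the entity matching discussed earlier comes in.

```python
# Sketch of querying internal and third-party data together: enrich
# internal customers with an external firmographics feed, then run one
# analysis that spans both sources.
internal_customers = [
    {"org_id": "C-1", "name": "Nordic Foods AS", "annual_spend": 120_000},
    {"org_id": "C-2", "name": "Fjord Logistics", "annual_spend": 80_000},
]

# Hypothetical third-party dataset keyed by the same org_id.
third_party = {
    "C-1": {"industry": "Food & Beverage", "employees": 450},
    "C-2": {"industry": "Transportation", "employees": 120},
}

# Join: merge external attributes onto each internal record.
enriched = [
    {**cust, **third_party.get(cust["org_id"], {})}
    for cust in internal_customers
]

# One query now spans both sources, e.g. internal spend by external industry.
spend_by_industry: dict[str, int] = {}
for row in enriched:
    spend_by_industry[row["industry"]] = (
        spend_by_industry.get(row["industry"], 0) + row["annual_spend"]
    )
print(spend_by_industry)
```

The analytical question ("which industries does my revenue come from?") is unanswerable from the internal data alone; only the merged view makes it possible, which is the outside-in perspective the conversation keeps returning to.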
Speaker 1Wow, fantastic. This is also a really good time to close up the podcast, and I've learned so much. It's been a bit like looking into the future, and I really enjoyed that. Before we finish, do you have any key takeaways or a call to action for the listeners?
Speaker 2Yes. So I would say, going back to the conversation: if you're going to invest in innovation, make sure that you do that 100% and that you have a very clear problem statement that is not too broad, that is to the point and that covers an end-to-end process within your organization, so it becomes much easier to embed it into your organization. Make sure that if you're going to incubate something, you have a very, very narrow problem statement, as I just mentioned, and avoid cannibalizing your time by having the same team innovating while they're solving operational challenges. I would also say: embrace the new AI era. What that means is not that a lot of roles will 100% change, but you need to have a much better understanding of what AI can give you and what it cannot. So you need to have a good knowledge bank of your information, both the structured and the unstructured.
Optimizing Data Integration for Growth
Speaker 2Relying on SharePoint to solve all your problems, and I haven't really said this before in the conversation, I think that's a little bit too naive, because you will never have all the information there. In a big corporation, even in the mid-market, very often you have important information that sits a little bit outside of your existing Microsoft stack or your existing GCP stack. So you need a very clear architecture and strategy for how you want to consolidate all the company's knowledge into a centralized place and leverage your LLMs on top of that. And then, lastly, I truly believe that the architecture of the future will be all about combining your internal data with third-party data sets, because that's how you will be able to build a data solution where you can see how you can perform better, how you can reach new customers, how you can develop your product, because that's when you actually get an outside-in perspective when it comes to leveraging data.
Speaker 2That's when I really think that the data will come to life.
Speaker 1What a fantastic call to action. Thank you so much.
Speaker 2Thank you so much. It was such a pleasure being here, Winfried. Thank you.