2#5 - Data Availability (Eng)
October 10, 2022
Winfried Etzel VP Activities DAMA Norway
«Think of data availability as online vs. offline.»
Much of the discussion around data products, data catalogs, and self-service boils down to data discoverability, observability, and availability.
I talked to Ivan Karlovic, Director of Data Analytics and Master Data at Norwegian, about these topics and gained some fantastic insights. Ivan has always loved analytics and using data to improve the business. He started his data journey with a course in data mining and with «Pure curiosity on how we can use data!»
Here are my key takeaways:
The Airline sector
- The airline industry is reliant on partners: «A Flight is just a subset of an end-2-end journey»
- Data privacy and ethics are important topics for Norwegian and are addressed with a systematic approach that aims for automation.
- Norwegian is building a cloud-based analytical platform to ensure greater visibility of the data analytics setup.
- The first improvement should be on data discoverability, closely connected to data observability.
- «A Data Catalog will raise awareness of what we have of data assets.»
- There is a clear goal to ensure automated observation of all data assets in real time.
- A central team needs to be able to deliver cross-domain use cases, also across domains with different data maturity.
- «With this cross-domain approach we are putting away some of the legacy discussions.» We can engage with each domain.
- It is OK to run specific crawlers on local data, but what they find needs to be synchronized into the central data catalog.
- The organization needs to have a way to stay aware of everything that is produced.
- Except for sensitive data, everyone in a domain should be able to see all domain data. Outside the domain, people should be aware of what kind of data each domain maintains.
- «If you work with analytics or machine learning you always want to talk to the domain people, because you can easily misinterpret if you don’t have that domain experience.»
- Domain data products that are domain-specific, without a use case outside the domain, do not have to adhere to central standards. But if they can have a use case outside the domain, they need to be fed into the central data catalog.
- Communicating and understanding intentions between data producers and data users is really important. You have to work on that understanding continuously.
- Business potential is lost when data is not discoverable, no matter its quality.
- Most effort is wasted on reworking data products that were simply not discoverable.
- «When it comes to self-service we need to set up technology in a way that the end-user does not have to think about the data, only the problem to solve»
- Even if only 70% of use cases can be solved by self-service, we need to strive for 100%, so that the expert data analytics team is freed up as much as possible to work on the tough cases.
- «Data Catalog: You can buy a monster that gives you 95% of things you don’t need, or cutting edge super-niche start-ups. But you have some interesting players somewhere in the middle.»
- «Can we do data engineering on a meta level, without seeing the underlying data? E.g. for PII?»
- How can you retain knowledge in a distributed architecture?
- Ensure domain knowledge is fostered in the domains. Build a documentation repository.
- Infrastructure as code. Technical knowledge supplied with context.
- Domain knowledge is the trickiest and the most difficult to replace.
- Great documentation can be both motivating and time saving. Motivating to reach a higher standard and time saving for problem finding, onboarding, etc.
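
To make the catalog takeaways above a bit more concrete, here is a minimal Python sketch of the rules as I understood them: local crawler results are synchronized into a central catalog, data products with a use case outside their domain must be registered centrally, and sensitive data is not readable even inside the domain. All names (DataAsset, CentralCatalog, can_read, the example assets) are hypothetical and only for illustration. This is not Norwegian's implementation. The sketch also assumes that the central catalog holds metadata only, so listing an asset there does not grant access to the data itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataAsset:
    """Metadata about one data asset found by a local crawler (hypothetical model)."""
    name: str
    domain: str
    sensitive: bool = False
    used_outside_domain: bool = False


@dataclass
class CentralCatalog:
    """Central catalog holding metadata only, keyed by (domain, asset name)."""
    entries: dict = field(default_factory=dict)

    def sync(self, crawled):
        """Register crawled assets that have a use case outside their domain.

        Purely domain-internal assets stay local and are skipped here.
        """
        for asset in crawled:
            if asset.used_outside_domain:
                self.entries[(asset.domain, asset.name)] = asset

    def list_domain(self, domain):
        """What anyone in the organization can learn about a domain's data."""
        return sorted(name for (d, name) in self.entries if d == domain)


def can_read(asset, viewer_domain):
    """Inside the owning domain, everything except sensitive data is readable."""
    return asset.domain == viewer_domain and not asset.sensitive


# Example: results of a local crawler in a hypothetical "sales" domain.
crawled = [
    DataAsset("bookings", "sales", used_outside_domain=True),
    DataAsset("scratch_table", "sales"),
    DataAsset("passenger_contacts", "sales", sensitive=True, used_outside_domain=True),
]

catalog = CentralCatalog()
catalog.sync(crawled)
print(catalog.list_domain("sales"))
print([a.name for a in crawled if can_read(a, "sales")])
```

The point of the sketch is the split between awareness and access: the central catalog tells everyone what a domain maintains, while the read rule stays with the domain.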