Wednesday, February 8, 2017

Do you have Data on your Data?

So you want to be a data driven organization?

Do you have Data on your Data?

Many organizations take pride in what their teams do with their data, and in this day and age more and more people want either to have data to make decisions or to build more data to make decisions.  This puts enormous stress on your data producers and analytics teams to go faster and create more.

Since most organizations do not have unlimited budgets for data and analytics, prioritization often comes up.  In many cases work gets focused on the "squeaky wheel", and from each business line there is always a series of priorities that are the "most important" for the company.

How is your analytics team handling these competing requests?  Are they using data on the data that is produced?  Using data to measure the effectiveness of the data assets that are created often gets lost as teams need to move on to the next high priority request.  I classify data assets as tables, reports, data models, etc.  Here are some suggestions to track and measure your data assets and the time spent to build those data assets.

  • How many data assets does each business line have?
    • What is the coverage of data assets for each business line?
    • Does each business line have enough data assets aligned with their business priorities?
  • Which data assets are used over time?
    • How many data assets are not used?
    • Are data assets used immediately when they are rolled out and then die off?
      • What caused this drop off of usage?
  • What data assets are used & re-used?
    • What core data assets can solve multiple questions?
    • Are these critical to the business line?
    • Which data assets serve special or small use cases?
  • Who is using the data assets?
    • Is the data asset used by a broad audience or focused group?
    • How many users are Daily Active Users (DAU), Weekly Active Users (WAU), Monthly Active Users (MAU)?
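
As a rough sketch of how a few of these questions could be answered, the Python below computes DAU/WAU/MAU, unused assets, and asset counts per business line.  It assumes you can export an access log of (asset, user, day) records and an inventory that maps each asset to a business line.  The names ACCESS_LOG, INVENTORY, and the sample rows are hypothetical and only there for illustration; your own systems will expose this information differently.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical access log: (asset name, user, day the asset was accessed).
ACCESS_LOG = [
    ("sales_daily", "ana", date(2017, 2, 8)),
    ("sales_daily", "raj", date(2017, 2, 8)),
    ("sales_daily", "ana", date(2017, 2, 3)),
    ("traffic_report", "li", date(2017, 1, 20)),
]

# Hypothetical inventory: asset name -> owning business line.
INVENTORY = {
    "sales_daily": "Sales",
    "traffic_report": "Marketing",
    "legacy_extract": "Sales",
}


def active_users(log, as_of, window_days):
    """Distinct users who touched any asset in the window ending at as_of."""
    start = as_of - timedelta(days=window_days - 1)
    return {user for _, user, day in log if start <= day <= as_of}


def unused_assets(inventory, log):
    """Assets in the inventory that never appear in the access log."""
    used = {asset for asset, _, _ in log}
    return sorted(set(inventory) - used)


def assets_per_business_line(inventory):
    """Count of data assets owned by each business line."""
    counts = defaultdict(int)
    for line in inventory.values():
        counts[line] += 1
    return dict(counts)


as_of = date(2017, 2, 8)
dau = len(active_users(ACCESS_LOG, as_of, 1))
wau = len(active_users(ACCESS_LOG, as_of, 7))
mau = len(active_users(ACCESS_LOG, as_of, 30))
print("DAU/WAU/MAU:", dau, wau, mau)
print("Unused assets:", unused_assets(INVENTORY, ACCESS_LOG))
print("Assets per business line:", assets_per_business_line(INVENTORY))
```

The same log could be grouped by asset instead of by user to see which assets are used right after rollout and then die off.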

Utilizing data on your data helps teams make smart decisions around planning and prioritization.  This builds awareness of not only the data assets themselves but also the teams that request the work and the adoption rate of those assets.  Having data & metrics for the questions above will help you and your teams to drive conversations with data and truly become a data driven organization.

Monday, March 21, 2016

Building & Utilizing Data for the Future

As part of the fourth and final post in the series "Enabling Data Driven Organizations", the focus of this post is about how to make an "engine" of both data and people to make your organization data driven.

Now that you have the inventory of your data environment, understand its value, and have influenced change in how people work, it's now time to have an engine that can run and support the future needs of the organization.  What I mean by an engine is that once you collect the inventory and build a product around it, people throughout the organization will start to have innovative ideas.  While making enhancements to your product will help the organization, part of being a good product is enabling others to build upon what you already have.  This changes the product into a platform.

  • A Product is a unique tool with a specific design/purpose.  There are use cases and scope, users, and ultimately an end to its development lifecycle.
  • A Platform is a series of products and APIs that not only enable the use cases and scope, but also allow others to utilize that information/content to expand into other use cases and purposes, far beyond the scope of what the original product team may come up with.

To enable this, you need systematic communication both for receiving content and for sending content out.  You may ask why this is needed for both incoming and outgoing content.
  • With incoming content, others within the organization may want to provide additional information to the platform like SLA, data quality, comments, and more.  Allowing this to be systematic enables other products and other teams to ultimately work off of a single platform.
  • With outgoing content, others within the organization may want to consume inventory counts, relationships, and more to provide a custom / unique experience to their user base.
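
To make the incoming/outgoing idea a little more concrete, here is a minimal sketch in Python.  It is an in-memory stand-in, not a real service: the InventoryPlatform class and its register, annotate, counts_by_kind, and describe methods are hypothetical names I made up for illustration.  A real platform would most likely expose the same two directions through an API.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    kind: str  # e.g. "report" or "table"
    annotations: dict = field(default_factory=dict)


class InventoryPlatform:
    """Hypothetical in-memory stand-in for a data inventory platform."""

    def __init__(self):
        self._assets = {}

    def register(self, name, kind):
        """Add an object to the inventory."""
        self._assets[name] = Asset(name, kind)

    def annotate(self, name, key, value):
        """Incoming content: other teams attach SLA, data quality, comments, etc."""
        self._assets[name].annotations[key] = value

    def counts_by_kind(self):
        """Outgoing content: other products consume inventory counts."""
        counts = {}
        for asset in self._assets.values():
            counts[asset.kind] = counts.get(asset.kind, 0) + 1
        return counts

    def describe(self, name):
        """Outgoing content: other products read what has been attached."""
        asset = self._assets[name]
        return {"name": asset.name, "kind": asset.kind, **asset.annotations}


platform = InventoryPlatform()
platform.register("sales_daily", "table")
platform.register("sales_dashboard", "report")
platform.annotate("sales_daily", "sla", "loaded by 06:00")
print(platform.counts_by_kind())
print(platform.describe("sales_daily"))
```

The point of the sketch is that both directions go through the same object, so every team works off of a single source of inventory.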

In the context of the data inventory, building that engine (or platform) allows others within the organization to expand on data driven tools/products.  The analysts may build notifications on new data sets, the data engineers may build SLA reports on their data pipelines, the data scientists may build dynamic models/relationships, the system owners may build dynamic notifications on relationships between systems, and the data consumers may build metadata management automation.

This is where products become platforms (the boys to the men), and what a single product team started morphs into what an entire organization utilizes and drives their day to day work on going forward.  This is where you not only influence but also enable the rest of the organization to be "data driven".

Monday, March 14, 2016

Influencing Change Using Data

As part of the third post in the series "Enabling Data Driven Organizations", the focus of this post is about how to influence change within your "data driven" organization.  In the previous posts in this series we focused on exposing the current inventory of data objects and then understanding the value that provides.  Now that we know the value of the inventory, we need to influence change in how the rest of the organization utilizes this information.

First, I want to focus on who within the organization can utilize the value of knowing the data inventory:

Data Engineer

On a daily basis, Data Engineers are responsible for the flow of data and "building data".  As part of this responsibility, one of the challenges they encounter is understanding where data is sourced from.  A Data Discovery Search and Data Lineage are 2 values that will change how they work.

Data Analyst

On a daily basis, Data Analysts are responsible for utilizing data to present via reporting or standard analysis.  They need to quickly identify if this analysis or report already exists.  A Data Discovery Search and simple inventory reports will change how they work.

Data Scientist

On a daily basis, Data Scientists are responsible for finding new ways to use data and provide complex analysis.  Like Data Engineers, a Data Discovery Search and Data Lineage are 2 values that will change how they work.

System Owners

On a daily basis, System Owners are responsible for maintaining a reliable system like a database or reporting system.  With that responsibility, they need to understand what is used (and not used) as well as impact analysis (downstream) to other systems.  Data Lineage (downstream) and Inventory Reporting are 2 values that will change how they work.

Data Consumer

On a daily basis, many individuals in your company are data consumers.  They can be executives who review dashboards or business analysts tracking their progress.  With companies having multiple systems, a Data Discovery Search across platforms will change how they work.

Now that we have identified who can benefit from an exposed data inventory, the next question is how to influence a change in how they work.  This can be approached in multiple ways and can be a touchy subject.  I would suggest approaching it as follows:

  • Advertise Capabilities - One of the biggest barriers to change is being unaware.  Informed people within the organization will often utilize tools that make them more productive.  If they are more productive than their peers, this rivalry will drive efficiencies.  This can be done through written communication or presentations.
  • Celebrate Success Stories - When a product is built, it is often built with use cases in mind.  Whether these are planned use cases or not, the early adopters that find success should be celebrated and their stories communicated.  Similar to advertising capabilities, this showcases to the rest of the organization the value that people are already getting from the product.
  • Encourage Change - As you see opportunities in how work has "always been done", offer your wisdom, information, and data to help solve these problems.  This could be decommissioning a table and understanding the impact.  This could be cleaning up a system by understanding what is used vs. not.  When you offer information that helps solve problems faster, word will spread and people will follow in the future.
  • Other Communication Channels - Find the other communication channels that catch the attention of the people within the organization.  This could be leadership communication, mass email blasting, flyers, or whatever else may be effective.

This step in making a "data driven organization" will be a challenging one, but one that can gain traction quickly if done right.  Identifying who can gain value and educating them through the process will influence the desired change.

Sunday, March 6, 2016

Understanding the Value of Data Inventory

As part of the second post in the series "Enabling Data Driven Organizations", the focus of this post is about understanding the value of the data inventory.  While the previous post hinted that exposing the data inventory drives motivation to complete metadata, the value of this information is so much more.

Once you have the four major object types (reports, data sources, data movement jobs, and business terms), there is so much value that this unlocks.  

  • Clean up Environment - being able to understand the entire inventory of reports, tables, and more across multiple systems gives you a global picture and lets you clean up unused, duplicated, or unneeded content.  This eases strain on systems and is simply good housekeeping.
  • Decrease Discovery Process - being able to search and discover tables for content that is new to you will decrease the time it takes to understand where data is and increase speed to deliver.
  • Reduce Duplicate Work - being able to understand what already exists will reduce re-creating existing (or similar) reports, tables, and more.
  • Most Popular & Unused - being able to understand the most used (and least used) reports and tables will help direct new hires as well as drive consistency across teams.
  • Audits - being able to reduce time spent on audits / reports on the inventory, who has access to the content, who has run the report / table, etc.
  • Lineage - being able to increase knowledge of / trust in where the data for a particular report or table came from upstream.
  • Impact Analysis - just like lineage, being able to understand downstream impact from a particular table will help keep focus when issues arise with the table or when you want to decommission that table.
  • Team Activity Tracking - being able to track how much content is being created, modified, and deleted by teams is a great way to track the impact of the team.

All of these are great examples of the value of exposing the data inventory (and usage).  Having this information readily available to make decisions, discover content, increase knowledge, or support audits provides an amazing amount of value.  Whether it decreases time to find information or allows you to focus on the impact a particular table has, this allows you (and your teams) to be much more efficient and be truly "data driven".
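
Lineage and impact analysis in particular are two directions of the same graph walk, which a short Python sketch can show.  It assumes the inventory records which object feeds which; the FEEDS mapping and the object names in it are hypothetical examples, and a real environment would load these edges from its data movement jobs.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the objects listed as its value.
FEEDS = {
    "orders_raw": ["orders_clean"],
    "orders_clean": ["sales_daily", "returns_daily"],
    "sales_daily": ["sales_dashboard"],
    "returns_daily": [],
    "sales_dashboard": [],
}


def downstream(feeds, start):
    """Impact analysis: everything reachable downstream of start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in feeds.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


def upstream(feeds, start):
    """Lineage: everything that start ultimately depends on."""
    # Reverse the edges, then reuse the same breadth-first walk.
    reverse = {}
    for parent, children in feeds.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return downstream(reverse, start)


print("Impact of orders_clean:", downstream(FEEDS, "orders_clean"))
print("Lineage of sales_dashboard:", upstream(FEEDS, "sales_dashboard"))
```

Before decommissioning a table, the downstream list is the set of reports and tables whose owners you would want to talk to first.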

Sunday, February 28, 2016

Data Exposure Drives Motivation

As part of the first post in the series "Enabling Data Driven Organizations", the focus of this post is about exposure of the data inventory.  To be a data driven organization, you not only need to know what data you need to run your business, but you also need to know what data is currently available, how to get it, and where it is being presented.  This is where the "data inventory" comes into play.

When trying to understand your data environment, you should focus on the following objects:

  • Reports - anything that presents data for consumption.  This can be reports, dashboards, data visualizations, excel files, and more.
  • Data Sources - anything that stores or creates data.  This can be tables, views, cubes, extracts, and more.
  • Data Movement (Jobs) - anything that moves, transforms, manipulates data.  This can be ETL jobs, Kafka producers, stored procedures, and more.
  • Business Terms - all definitions of key metrics or attributes to drive consistent understanding.

[Images: Report, Data Source, Data Movement]

As mentioned in the series "Enabling Data Driven Organizations", having an inventory of all of these object types across all platforms within the organization enables a lot of decisions.  Before we can get to those decisions, this exposure is a critical step.  A public exposure of this content can be seen as airing the "dirty laundry" of reports, data sources, data movement, and business terms.  This in itself is very powerful, as there are likely way too many reports, way too many tables, and more.  The mere public exposure of this inventory will motivate your company to manage its inventory.  This is an effective way to motivate teams to clean up and self-manage.  I find this much more effective than a top down governance mandate.

The exposure of this inventory can help drive the following:
  • Counts of Inventory - this allows you to know how many reports, data sources, and more are within the company.
  • Metadata Coverage - this allows you to know how much metadata (descriptions, owners, etc.) is available within your inventory.
  • Activity Progress - this allows you to show how much content is getting created (or deleted) over time which is a measure of your data team's activity.
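
As a small illustration of the first two measures, the Python sketch below computes counts of inventory and metadata coverage.  It assumes the inventory can be exported as a list of records; the field names (type, owner, description) and the sample rows are hypothetical, and coverage here is simply the share of metadata fields that are filled in.

```python
from collections import Counter

# Hypothetical inventory records; None marks metadata that has not been filled in.
INVENTORY = [
    {"name": "sales_daily", "type": "data source", "owner": "ana", "description": "Daily sales by store"},
    {"name": "sales_dashboard", "type": "report", "owner": None, "description": None},
    {"name": "load_sales", "type": "data movement", "owner": "raj", "description": None},
    {"name": "net_sales", "type": "business term", "owner": "ana", "description": "Sales minus returns"},
]

# Assumption: these two fields are the metadata we care about.
METADATA_FIELDS = ("owner", "description")


def counts_of_inventory(records):
    """Number of objects per object type."""
    return Counter(record["type"] for record in records)


def metadata_coverage(records, fields=METADATA_FIELDS):
    """Share of metadata fields that are filled in, from 0.0 to 1.0."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(1 for record in records for name in fields if record.get(name))
    return filled / total


print(dict(counts_of_inventory(INVENTORY)))
print(f"Metadata coverage: {metadata_coverage(INVENTORY):.1%}")
```

Publishing numbers like these per team is one simple way to make the exposure visible.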

Bottom line, like the levels of data needs, exposing the inventory can motivate teams, build a base understanding of content, and measure activity/impact of teams.  This is the first step in "Enabling Data Driven Organizations".

Sunday, February 21, 2016

Enabling Data Driven Organizations

Where the previous series focused specifically on Data Needs and Making it Useful, this new series takes a bigger look at "Enabling Data Driven Organizations."  Organizations often have several teams and technologies that deliver on data needs, and as time goes on this builds a massive inventory of data deliverables.

With this massive inventory of deliverables, organizations have a treasure trove of information that is often right under their noses.  When focusing on data, organizations often focus on Sales, Traffic, Daily Visits, and other key performance metrics.  The treasure trove of data that is often just as valuable is what has already been built within the organization: what reports, tables, data flows, and terms exist?

Why is this information so difficult to find?  

In many organizations, this inventory is separated across the different silos of teams as well as different technologies.  How many companies have a single database or a single reporting tool?  While many technologies claim to be "Enterprise BI" tools, the truth of the matter is that there is often not a single technology across the entire organization.  Because this is impacted by both process and technology, it can be a tricky problem to solve.

In the following posts, I plan to focus on:
  • Exposure of Inventory - as we all know, getting access to the data is the first step of data needs. 
  • Understanding Value - after getting access to the inventory, understanding the value of this data is critical.
  • Influencing Change - after understanding the value of this data, focusing on changing how the organization approaches problems and utilizes data "on data" to make decisions.
  • Building for the Future - after the organization understands and utilizes this information, how can you make this engine solid for the future?

I am excited to share my experiences and where I feel this can "Enable Data Driven Organizations."

Sunday, January 31, 2016

Making it to Predictive Data Needs!!!

This is the final post in the series Data Needs & Making it Useful.  As mentioned in the previous posts, this series was all about how to understand the different levels of data needs and my thoughts about how each level relates and is dependent on the previous level.  These levels were:
  1. Data Pulls 
  2. Products
  3. Alerts
  4. Predictive

To focus more on Predictive, I wanted to first call out why I feel this is the topmost level of the data needs levels.  As mentioned in earlier posts, Data Pulls are the base layer because literally getting to the data should be a base need; it is like drinking water.  Products are the next level: you monitor usage and build products to make life easier, so instead of drinking water from the river 5 miles down, you build a well to have it available in town.  Alerts are the next level: you want to be notified when something goes wrong (or right), so this would be having a bell go off when a flood is about to hit and the well will get too full.  Lastly, Predictive is where we are now: this would be identifying when a flood could hit and making adjustments before it does hit by releasing water past the well.  Of course, having a predictive flood tool for your well without having a well wouldn't do much good.

To dive deeper into what I feel Predictive data needs are, I want to highlight that there is a ton of work by data scientists today to build models, forecasts, and other tools all off of data.  The way I generalize this is that Predictive data needs are built to identify alerts before they happen, allowing your client to take action ahead of time.  This could be forecasts of sales for the next 3 months.  This could provide suggestions on tests being performed and recommend action based on the test results.  Finally, this could predict when more traffic will hit your website if you make the recommended adjustments.

Needless to say, predictive data needs have certainly transformed how we as a society behave on a daily basis.  Whether it comes from weather prediction models or from how we interact on the internet, predictive data analytics is the pinnacle of data needs.  But in order to get to that pinnacle, you need to have a solid delivery / understanding of the previous layers.