Monday, March 21, 2016

Building & Utilizing Data for the Future

This is the fourth and final post in the series "Enabling Data Driven Organizations".  The focus of this post is how to build an "engine" of both data and people that makes your organization data driven.

Now that you have the inventory of your data environment, you understand its value, and you have influenced change in how people work, it's now time to build an engine that can run and support the future needs of the organization.  What I mean by an engine is that, by collecting the inventory and building a product around it, people throughout the organization will now have innovative ideas.  While making enhancements to your product will help the organization, part of being a good product is enabling others to build upon what you already have.  This changes the product into a platform.

  • A Product is a unique tool with a specific design/purpose.  There are use cases and scope, users, and ultimately an end to its development lifecycle.
  • A Platform is a series of products and APIs that not only enable the use cases and scope, but also allow others to utilize that information/content to expand into other use cases and purposes, far beyond the scope of what the original product team may come up with.
To enable this, you need systematic communication both for receiving content and for sending out content.  You may ask why this is needed in both directions.
  • With incoming content, others within the organization may want to provide additional information to the platform like SLA, data quality, comments, and more.  Allowing this to be systematic enables other products and other teams to ultimately work off of a single platform.
  • With outgoing content, others within the organization may want to consume inventory counts, relationships, and more to provide a custom / unique experience to their user base.
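As a concrete sketch of these two channels, here is a minimal in-memory platform in Python.  All names (MetadataPlatform, annotate, inventory_counts) are illustrative assumptions, not from any real tool: incoming content arrives as annotations (SLAs, quality scores, comments), and outgoing content is served back as inventory counts and annotation lookups.

```python
from collections import defaultdict

class MetadataPlatform:
    """Minimal in-memory sketch of a metadata platform with an
    incoming (annotations) and an outgoing (inventory) interface."""

    def __init__(self):
        self.objects = {}                      # object_id -> {"type": ..., "name": ...}
        self.annotations = defaultdict(list)   # object_id -> [{"kind": ..., "value": ...}]

    def register(self, object_id, object_type, name):
        """Register an inventory object (report, table, job, term)."""
        self.objects[object_id] = {"type": object_type, "name": name}

    # Incoming: other teams attach SLAs, quality scores, comments, etc.
    def annotate(self, object_id, kind, value):
        if object_id not in self.objects:
            raise KeyError(f"unknown object: {object_id}")
        self.annotations[object_id].append({"kind": kind, "value": value})

    # Outgoing: other products consume inventory counts and annotations.
    def inventory_counts(self):
        counts = defaultdict(int)
        for obj in self.objects.values():
            counts[obj["type"]] += 1
        return dict(counts)

    def get_annotations(self, object_id, kind=None):
        notes = self.annotations.get(object_id, [])
        return [n for n in notes if kind is None or n["kind"] == kind]
```

The point of the sketch is the shape, not the storage: because both directions go through one interface, every team's product is reading and writing the same platform.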

In the context of the data inventory, building that engine (or platform) allows others within the organization to expand on data driven tools/products.  The analysts may build notifications on new data sets, the data engineers may build SLA reports on their data pipelines, the data scientists may build dynamic models/relationships, the system owners may build dynamic notifications on relationships between systems, and the data consumers may build metadata management automation.
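The first of those examples, an analyst building notifications on new data sets, could be as simple as diffing the platform's inventory against a saved snapshot.  This is a hypothetical sketch; the field names are assumptions:

```python
def new_dataset_notifications(previous_ids, current_inventory):
    """Compare a saved snapshot of dataset ids against the current
    inventory and return notification messages for anything new."""
    current_ids = {d["id"] for d in current_inventory}
    new_ids = current_ids - set(previous_ids)
    by_id = {d["id"]: d for d in current_inventory}
    return [
        f"New {by_id[i]['type']} available: {by_id[i]['name']}"
        for i in sorted(new_ids)
    ]
```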

This is where products become platforms, and what a single product team started morphs into something an entire organization utilizes and drives its day-to-day work with going forward.  This is where you not only influence but also enable the rest of the organization to be "data driven".

Monday, March 14, 2016

Influencing Change Using Data

As part of the third post in the series "Enabling Data Driven Organizations", the focus of this post is about how to influence change within your "data driven" organization.  In the previous posts in this series we focused on exposing the current inventory of data objects and then understanding the value that provides.  Now that we know the value of the inventory, we need to influence change in how the rest of the organization utilizes this information.

First, I want to focus on who within the organization can utilize the value of knowing the data inventory:

Data Engineer

On a daily basis, Data Engineers are responsible for the flow of data and "building data".  As part of this responsibility, one of the challenges they encounter is understanding where data is sourced from.  A Data Discovery Search and Data Lineage are 2 values that will change how they work.

Data Analyst

On a daily basis, Data Analysts are responsible for utilizing data to present via reporting or standard analysis.  They need to quickly identify if this analysis or report already exists.  A Data Discovery Search and simple inventory reports will change how they work.

Data Scientist

On a daily basis, Data Scientists are responsible for finding new ways to use data and provide complex analysis.  Like Data Engineers, a Data Discovery Search and Data Lineage are 2 values that will change how they work.

System Owners

On a daily basis, System Owners are responsible for maintaining a reliable system like a database or reporting system.  With that responsibility, they need to understand what is used (and not used) as well as impact analysis (downstream) to other systems.  Data Lineage (downstream) and Inventory Reporting are 2 values that will change how they work.
Data Consumer

On a daily basis, many individuals in your company are data consumers.  They can be executives who review dashboards or business analysts tracking their progress.  With companies having multiple systems, a Data Discovery Search across platforms will change how they work.

Now that we have identified who can benefit from an exposed data inventory, the next question is how to influence a change in how they work.  This can be approached in multiple ways and can be a touchy subject.  Here is how I would suggest approaching it:

  • Advertise Capabilities - One of the biggest barriers to change is being unaware.  Informed people within the organization will often utilize tools that make them more productive.  If they are more productive than their peers, this rivalry will drive efficiencies.  This can be done through written communication or presentations.
  • Celebrate Success Stories - When a product is built, it is often built with use cases in mind.  Whether these are planned use cases or not, the early adopters that find success should be celebrated and communicated.  Similar to advertising capabilities, this showcases to the rest of the organization the value that people are already getting from the product.
  • Encourage Change - As you see opportunities in how work has "always been done", offer your wisdom, information, and data to help solve these problems.  This could be decommissioning a table and understanding the impact.  This could be cleaning up a system by understanding what is used vs. not.  By offering information that helps solve problems faster, word will spread and people will follow in the future.
  • Other Communication Channels - Find the other forms of communication channels that catch the attention of the people within the organization.  This could be leadership communication, mass email blasting, flyers, or whatever else may be effective.

This step in making a "data driven organization" will be a challenging one, but one that can gain traction quickly if done right.  Identifying who can gain value and educating them through the process will influence the desired change.

Sunday, March 6, 2016

Understanding the Value of Data Inventory

As part of the second post in the series "Enabling Data Driven Organizations", the focus of this post is understanding the value of the data inventory.  While the previous post hinted that exposing the data inventory drives motivation to complete metadata, the value of this information goes much further.

Once you have the four major object types (reports, data sources, data movement jobs, and business terms), there is so much value that this unlocks.  

  • Clean up Environment - understanding the entire inventory of reports, tables, and more across multiple systems gives you a global picture and allows you to clean up unused, duplicated, or unneeded content.  This eases strain on systems and is just good overall hygiene.
  • Decrease Discovery Process - being able to search and discover tables for content that is new to you will decrease the time to understand where data is and increase speed to deliver.
  • Reduce Duplicate Work - being able to understand what already exists will reduce re-creating existing (or similar) reports, tables, and more.
  • Most Popular & Unused - being able to understand the most used (and least used) reports and tables can help direct new hires as well as drive consistency across teams.
  • Audits - being able to reduce time on audits / reports on the inventory, who has access to the content, who has run the report / table, etc.
  • Lineage - being able to increase knowledge / trust on where the data came from upstream of a particular report or table.
  • Impact Analysis - just like lineage, being able to understand downstream impact from a particular table will help keep focus when issues arise with the table or when you want to decommission that table.
  • Team Activity Tracking - being able to track how much content is being created, modified, and deleted by teams is a great way to track the impact of the team.
All of these are great examples of the value of exposing the data inventory (and its usage).  Having this information readily available to make decisions, discover content, increase knowledge, or run audits provides an amazing amount of value.  Whether it decreases time to find information or allows you to focus on the impact a particular table has, it allows you (and your teams) to be much more efficient and truly "data driven".
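To make one of these concrete, the "Most Popular & Unused" value can be computed from nothing more than the inventory and an access log.  A hypothetical sketch (the log format is an assumption):

```python
from collections import Counter

def usage_report(inventory_ids, access_log):
    """Rank inventory objects by access count and flag unused ones.
    access_log is a list of object ids, one entry per access."""
    counts = Counter(access_log)
    ranked = sorted(inventory_ids, key=lambda i: counts[i], reverse=True)
    unused = [i for i in inventory_ids if counts[i] == 0]
    return ranked, unused
```

The ranked list points new hires at what everyone already relies on; the unused list is your clean-up candidate queue.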

Sunday, February 28, 2016

Data Exposure Drives Motivation

As part of the first post in the series "Enabling Data Driven Organizations", the focus of this post is about exposure of the data inventory.  To be a data driven organization, you not only need to know what data you need to run your business, but you also need to know what data is currently available, how to get it, and where it is being presented.  This is where the "data inventory" comes into play.

When trying to understand your data environment, you should focus on the following objects:

  • Reports - anything that presents data for consumption.  This can be reports, dashboards, data visualizations, excel files, and more.
  • Data Sources - anything that stores or creates data.  This can be tables, views, cubes, extracts, and more.
  • Data Movement (Jobs) - anything that moves, transforms, manipulates data.  This can be ETL jobs, Kafka producers, stored procedures, and more.
  • Business Terms - all definitions of key metrics or attributes to drive consistent understanding.


As mentioned in the series "Enabling Data Driven Organizations", having an inventory of all of these object types across all platforms within the organization enables a lot of decisions.  Before we can get to those decisions, this exposure is a critical step.  Publicly exposing this content can be seen as airing the "dirty laundry" of reports, data sources, data movement, and business terms.  This in itself is very powerful, as there are likely way too many reports, way too many tables, and more.  The mere public exposure of this inventory will motivate your company to manage its inventory.  This is an effective way to motivate teams to clean up and self-manage, and I find it much more effective than a top-down governance mandate.

The exposure of this inventory can help drive the following:
  • Counts of Inventory - this allows you to know how many reports, data sources, and more are within the company.
  • Metadata Coverage - this allows you to know how much metadata (descriptions, owners, etc.) is available within your inventory.
  • Activity Progress - this allows you to show how much content is getting created (or deleted) over time which is a measure of your data team's activity.
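The "Metadata Coverage" measure above can be computed directly from the inventory.  This is an illustrative sketch; the field names are assumptions:

```python
def metadata_coverage(inventory, required_fields=("description", "owner")):
    """Percentage of inventory objects that have each required
    metadata field filled in (non-empty)."""
    total = len(inventory)
    if total == 0:
        return {f: 0.0 for f in required_fields}
    coverage = {}
    for field in required_fields:
        filled = sum(1 for obj in inventory if obj.get(field))
        coverage[field] = round(100.0 * filled / total, 1)
    return coverage
```

Published weekly per team, numbers like these are exactly the "dirty laundry" that motivates self-management.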

Bottom line: like the levels of data needs, exposing the inventory can motivate teams, build a base understanding of content, and measure the activity/impact of teams.  This is the first step in "Enabling Data Driven Organizations".

Sunday, February 21, 2016

Enabling Data Driven Organizations

With the previous series focused specifically on Data Needs and Making it Useful, this new series takes a bigger look at "Enabling Data Driven Organizations."  With organizations often having several teams and technologies that deliver on data needs, over time this builds a massive inventory of data deliverables.

With this massive inventory of deliverables, organizations have a treasure trove of information that is often right under their noses.  Organizations often focus on Sales, Traffic, Daily Visits, and other key performance metrics, but the data that is often just as valuable is what has already been built within the organization: what reports, tables, data flows, and terms exist.

Why is this information so difficult to find?  

In many organizations, this inventory is separated across different team silos as well as different technologies.  How many companies have a single database or a single reporting tool?  While many technologies claim to be "Enterprise BI" tools, the truth of the matter is that there is rarely a single technology across the entire organization.  Since the problem is impacted by both process and technology, it can be a tricky one to solve.

In the following posts, I plan to focus on:
  • Exposure of Inventory - as we all know, getting access to the data is the first step of data needs. 
  • Understanding Value - after getting access to the inventory, understanding the value of this data is critical.
  • Influencing Change - after understanding the value of this data, focusing on changing how the organization approaches problems and utilizes data "on data" to make decisions.
  • Building for the Future - after the organization understands and utilizes this information, how can you make this engine solid for the future?

I am excited to share my experiences and where I feel this can "Enable Data Driven Organizations."

Sunday, January 31, 2016

Making it to Predictive Data Needs!!!

This is the final post in the series Data Needs & Making it Useful.  As mentioned in the previous posts, this series was all about how to understand the different levels of data needs and my thoughts about how each level relates and is dependent on the previous level.  These levels were:
  1. Data Pulls 
  2. Products
  3. Alerts
  4. Predictive

To focus more on Predictive, I first want to call out why I feel this is the top level of the data needs.  As mentioned in earlier posts, Data Pulls are the base layer because literally getting to the data should be a base need; it is like drinking water.  Products are the next level: you monitor usage and build products to make life easier, so instead of drinking water from the river 5 miles down, you build a well to have it available in town.  Alerts are the next level: you want to be notified when something goes wrong (or right), so this would be having a bell go off before a flood is about to hit and the well gets too full.  Lastly, Predictive is where we are now: this would be identifying when a flood could hit and making adjustments before it does by releasing water past the well.  Of course, having a predictive flood tool for your well without having a well wouldn't do much good.

To dive deeper into what I feel Predictive data needs are, I want to highlight that there is a ton of work by data scientists today to build models, forecasts, and other tools, all off of data.  The way I generalize this is that Predictive data needs are built to identify alerts before they happen, allowing your client to take action in advance.  This could be forecasts of sales for the next 3 months.  This could be suggestions on tests being performed, with recommended actions based on the test results.  Finally, this could be predicting how much more traffic will hit your website if you make the recommended adjustments.
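As a toy illustration of the forecasting idea (deliberately naive, not a real predictive model), projecting the next 3 months of sales from the average historical change might look like:

```python
def naive_forecast(history, periods=3):
    """Project the next few periods by extending the average change
    between historical values: a stand-in for a real forecast model."""
    if len(history) < 2:
        raise ValueError("need at least two historical points")
    avg_change = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + avg_change * (i + 1) for i in range(periods)]
```

A real data science team would reach for proper time-series models, but even this trivial projection only works if the sales history underneath it (the lower levels of data needs) is solid.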

Needless to say, predictive data needs have certainly transformed how we as a society behave on a daily basis.  Whether it comes from weather prediction models or how we interact on the internet, predictive data analytics is the pinnacle of data needs.  But in order to reach that pinnacle, you have to have a good, solid delivery and understanding of the previous layers.


Tuesday, January 26, 2016

Monday Makeover Challenge (Does America Save Enough?)

Hi All

As part of VizWiz's Monday Makeover Challenge, I have put together my version of the viz (a day late - don't judge).

In this week's challenge, the question was: does America save enough?

Because of the limited data and the dimensions available (age group and savings amount), there was one age group that stuck out to me: "Overall".  I looked at this age group as the average of all age groups.  By comparing the overall vs. each age group, I could see which is doing better versus worse.

That was the "Savings Compared to Overall" section.  While this looked great and I was able to put the overall reference line on it and color code each bar as above or below "overall", I couldn't get a sense of what the overall number actually is.  That is where I added the section on the left, to line up and tell me what Americans overall really have in savings.

Finally, I annotated the findings and called out a few key points and questions to spur the consumer of the viz.

You can find the viz at Tableau Public by

I hope you enjoy it!

Friday, January 22, 2016

Actionable Alerts!

In my previous post from the series Data Needs and Making it Useful, I wrote about how to make data driven products and their dependency on the first level of data needs, which is data pulls.  In this post, I am going to focus on the third level of data needs: alerts.

I define alerts as delivering data as exceptions.  Unlike data pulls and products, alerts are very focused delivery points that are based on specific thresholds.  You may not get an alert or exception every day or week.  Instead, you will get an alert or exception when sales drop below 5% of plan or website usage goes outside 2 standard deviations of the expected trend.  Alerts require immediate attention and drive immediate action.  Alerts should say "LOOK AT ME!"

Alerts are a great tool to focus your client’s attention.  Instead of having your client go and request or look for the data to monitor, alerts optimize time by saying you only need to pay attention to when I notify you.  This saves time and effort from your client digging through data and reports to find information.

In order to be successful with alerts, the following criteria apply.  (The original assumptions still hold: the previous two levels of data needs, data pulls and products, must already be delivered.)

  • Alerts need valid thresholds and rules to determine an alert.  If these thresholds are too tight, your clients will receive too many alerts and ultimately become numb to them.  If your iPhone has more than 30 notifications, do you look at them?  If your inbox has over 500 emails, do you really read all of them?  At the same time, if the thresholds are too loose, then your clients won't receive the notifications to take action when truly needed.
  • The delivery method for the alerts is critical.  This requires an understanding of what your client is accustomed to and what kind of accessibility they have.  If you deliver an alert through email, but your client receives over 500 emails a day, will it get lost in the mix?  My suggested options to consider for alerting are mobile notifications, text messages, and phone calls.  These all scream, "PAY ATTENTION TO ME NOW."  Email and posts on internal social sites or products are also options, but may not get the same attention.
  • Can I do something with the alert?  Be very specific when building alerts.  Is this metric something that I need to alert clients on, and can they do something about it?  For example, if sales last month missed the threshold, is there something I can take action on to correct it?  If the month already happened, then there probably isn't much I can do to change it.
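The first criterion, valid thresholds, is often implemented as a band around the expected trend.  Here is a hedged sketch of the 2-standard-deviation rule mentioned earlier; the message format is purely illustrative:

```python
from statistics import mean, stdev

def check_alert(history, latest, n_std=2.0):
    """Return an alert message when the latest value falls outside
    n_std standard deviations of the historical values, else None."""
    mu, sigma = mean(history), stdev(history)
    lower, upper = mu - n_std * sigma, mu + n_std * sigma
    if latest < lower or latest > upper:
        return (f"ALERT: value {latest} outside expected range "
                f"[{lower:.1f}, {upper:.1f}]")
    return None
```

Widening n_std is exactly the tight-vs-loose trade-off described above: smaller values flood clients with noise, larger values risk silence when action is truly needed.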

Bottom line, alerts are a powerful tool to notify when something needs attention.  With the sea of data and information out there today, your clients need to focus on what is most important for the business and use data and alerts to take action when necessary.  Alerts optimize time and effort effectively based on what the data is providing.  This allows your clients to focus on their primary responsibility instead of sifting through data.

Sunday, January 17, 2016

Data Driven Products

In my previous post (Data Needs & Making it Useful), I talked about 4 levels of data needs and how each level serves a particular purpose.  Here, I want to dive deeper into the second level, Products.

(Important Note) When talking about each of these data needs "levels", I want to make it clear that in order to be successful you need to start from the first level (data pulls) and continuously deliver each level after it (Products, Alerts, & Predictive).  Any gaps within these levels can cause your clients to have an unintended experience.  For example, if you start by building out a bunch of products but you don't give the flexibility to have data pulls, you will be stuck building hundreds of products to match every request and every new metric.  Rather, if you start with data pulls, you can respond quicker by adding the new metric and ensuring it is good quality and built for its intended purpose before you build a product around it.

Now, let's dive into Products.

First, I define products as any tool, report, or presentation of data to solve a specific use case.  This could include building a weekly sales report, this could be an A/B test tool, this could be an executive dashboard.  Whatever the product may be, these products are being built for a specific reason and ultimately making life easier than if data pulls were just delivered.

In my experience, I have seen too many times that products are built as the first deliverable.  Long hours are spent gathering requirements, building out data structures, and building out these detailed products.  This is often done because of how organizations are structured (i.e. product/project teams), or because the data teams are focused on delivering the "home run" of "I will solve your problem with this amazing product."  While the products are delivered, if data pulls are not available, the clients still end up unsatisfied over time.  Requirements evolve and data needs change, which often causes product teams to go back and work off of a long list of enhancements.  This then causes either the original product to be over-engineered to a point where it can become unusable, or additional products to be built, which forces the client to now go to two or more products to answer their current question.

Below is a typical timeline of how the product cycle happens without a data pull environment.

  1. Product teams are built
  2. Requirements are built for a new product from the client requests
  3. Data is gathered and a product is built
  4. Product is delivered (the client is happy!)
  5. The client asks for enhancements (new question, new data needs, or discovery question)
  6. Product teams work off a long enhancement list
  7. New products are built or the existing product is over engineered causing the client to spend more time to solve their question

Now how would this product level look if a proper data pull environment was available?

  1. Teams analyze current usage of data from the data pull and suggest a product to optimize the process to answer the question
  2. Product teams are built
  3. Requirements from both the client AND current data usage make up a new product
  4. Data is gathered (utilizing existing data pulls) and a product is built
  5. Product is delivered (the client is happy!)
  6. The client asks for enhancements (new question, new data needs, or discovery question)
  7. New data is built and made available through the data pull level first
  8. Depending on if usage is consistent from new data or question, then a product is built or enhanced
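Step 1 above, analyzing current data pull usage to suggest a product, could be sketched as follows.  The thresholds and log format are purely illustrative assumptions:

```python
from collections import Counter

def suggest_products(pull_log, min_users=3, min_pulls=10):
    """Scan a data-pull log and suggest datasets worth productizing:
    those pulled often and by several distinct users."""
    pulls = Counter(entry["dataset"] for entry in pull_log)
    users = {}
    for entry in pull_log:
        users.setdefault(entry["dataset"], set()).add(entry["user"])
    return [d for d in pulls
            if pulls[d] >= min_pulls and len(users[d]) >= min_users]
```

The idea is simply that usage evidence, not a requirements wishlist, decides where a product team spends its time.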

Bottom line: with the previous data need level (data pulls) in place, your team can make data driven decisions on where to focus.  Spending time to build out data and make it available for a data pull is a better use of time than building out product teams and making a product that may not get used.  Instead, build the new metric or data set and allow your clients to pull the data manually.  Have them prove out the use first before building an elaborate product to "optimize time".  Then your data team can focus on the right problem, build the right product, and deliver the right value.

Sunday, January 10, 2016

Enabling Foundational Data Needs

In my previous post (Data Needs & Making it Useful), I talked about 4 levels of data needs and how each level serves a particular purpose.  Here, I want to dive deeper into the first level, Data Pulls.

(Assumption) With any area focused on data to make decisions, you first need data to capture and track.  I am making the assumption that this is already completed.  In today's world, there are petabytes upon petabytes of data.  From transactional data to messaging / conversation history, there is a large availability of data.  The key is ensuring you are capturing this data and making it available to your analysts.

When making this data available to your analysts, there is a foundational need of getting to the data.

This need is what I classify as "Data Pulls".  As mentioned before, this is a dump of the raw data.  This could be an Excel file with every single field and row of data.  For example, I could ask for a data pull of sales.  Below is an example of the type of data requested:

  • Sales Date - The date/time of the transaction
  • Customer ID - The id of the customer
  • Customer Name - The name of the customer
  • Transaction Number - The id of the activity
  • Payment Type - The type of payment of the transaction
  • Transaction Type - The type of transaction made
  • Item Order - The position of the item in the transaction
  • Item Name - The name of the item
  • Item Number - The id of the item
  • Item Description - The description of the item
  • Item Category - The grouping of similar items
  • Location - The location of the transaction
  • Location Category - The grouping of the location
  • Item Quantity - The number of items in the transaction
  • Base Price - The standard price of an item
  • Discount - Discounts taken off of the item

This data could be pulled by a variety of ways but you need to know who your audience is and what their capabilities are.  Can they write SQL to pull the data themselves?  What types of tools are they used to working with (Excel, Reporting Tools, etc.)?  Do you want to continually supply this data to them manually?  If you can automate it, what happens if they want a new field to be added?
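If your audience can write SQL, the data pull itself is just a full select against the table.  A self-contained sketch using SQLite (the table name and columns are hypothetical, loosely following the field list above):

```python
import sqlite3

# Hypothetical sales table; names are illustrative, not from any real system.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (
        sales_date TEXT, customer_id INTEGER, item_name TEXT,
        item_quantity INTEGER, base_price REAL, discount REAL
    )
""")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
    [("2016-01-02", 1, "widget", 2, 9.99, 0.0),
     ("2016-01-03", 2, "gadget", 1, 19.99, 2.00)],
)

# The "data pull": every field, every row, no aggregation.
rows = conn.execute("SELECT * FROM sales ORDER BY sales_date").fetchall()
for row in rows:
    print(row)
```

If your analysts cannot run a query like this themselves, that is your cue to build the self service tooling described next.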

If you cannot simply provide a table for someone to write a query against, my suggested approach is to provide a "Self Service" tool.  Self service has long been a hot button topic for many companies.  To approach self service, I look at it in two fashions.  You need to implement both to be successful as each type of self service tool plays a distinct role.

  1. Wide Coverage & Fixed Granularity
    • This type of self service tool allows you to span across multiple subject areas at a fixed level of data.  An example of this may be looking at inventory, sales, transactions, customers and more all at a single day level.  This allows you to get a wide view and compare metrics across subject areas. 
  2. Deep Dive Subject Area
    • This type of self service tool allows you to "go deep" into a particular subject area.  If you are interested in sales, you can drill into very specific details such as transaction types, items, and more.
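The first type, wide coverage at a fixed granularity, boils down to rolling detailed records up to a common grain (here, one row per day).  A minimal sketch with assumed field names:

```python
from collections import defaultdict

def to_daily_granularity(transactions):
    """Roll transaction-level records up to one row per day so they can
    sit alongside other subject areas in a wide, fixed-grain view."""
    daily = defaultdict(float)
    for t in transactions:
        daily[t["date"]] += t["amount"]
    return dict(daily)
```

Repeat the same roll-up for inventory, traffic, and other subject areas, and joining them on the date gives the wide, comparable view; the deep dive tool is simply the un-rolled detail underneath.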

With both of these types of "Data Pull" tools implemented, your customers can fill the base need of "getting to the data" while keeping the flexibility to adjust their data requests.  With self service tools like these built, the teams responsible for the data can free up their time and move to the next level of data needs: "Products".