Sunday, January 31, 2016

Making it to Predictive Data Needs!!!

This is the final post in the series Data Needs & Making it Useful.  As mentioned in the previous posts, this series was all about how to understand the different levels of data needs and my thoughts on how each level relates to and depends on the previous one.  These levels were:
  1. Data Pulls 
  2. Products
  3. Alerts
  4. Predictive

To focus more on Predictive, I wanted to first call out why I feel this is the topmost of the data needs levels.  As mentioned in earlier posts, Data Pulls are the base layer because literally getting to the data should be a base need; it is like drinking water.  Products are the next level: you monitor usage and build products to make life easier, so instead of drinking water from the river 5 miles away, you build a well to have it available in town.  Alerts are the next level: you want to be notified when something goes wrong (or right), so this would be having a bell go off before a flood is about to hit and the well gets too full.  Lastly, Predictive is where we are at: this would be identifying when a flood could hit and making adjustments before it does by releasing water past the well.  Of course, having a predictive flood tool for your well without having a well wouldn't do much good.

To dive deeper into what I feel Predictive data needs are, I want to highlight that there is a ton of work by data scientists today to build models, forecasts, and other tools all off of data.  The way I generalize this is that Predictive data needs are built to identify alerts before they happen, allowing your client to take action before something happens.  This could be a forecast of sales for the next 3 months.  This could be suggestions on tests being performed, with recommended actions based on the test results.  Finally, this could be a prediction of how much more traffic will hit your website if you make the recommended adjustments.
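As a toy illustration of the sales-forecast use case, here is a minimal sketch.  The monthly figures and the plain linear-trend model are made up for illustration; a real forecast would account for seasonality and far more history:

```python
# Hypothetical monthly sales history (units sold); illustrative only.
sales = [120, 132, 128, 141, 150, 158]

def linear_trend_forecast(history, periods):
    """Fit a least-squares line to the history and extend it forward."""
    n = len(history)
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    slope = sum((x - x_mean) * (y - y_mean)
                for x, y in enumerate(history))
    slope /= sum((x - x_mean) ** 2 for x in range(n))
    intercept = y_mean - slope * x_mean
    return [intercept + slope * (n + i) for i in range(periods)]

# Predictive level: a view of the next 3 months, before they happen.
next_three = linear_trend_forecast(sales, 3)
```

Even a sketch this simple shifts the conversation from "what happened" to "what is likely to happen", which is the whole point of the Predictive level.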

Needless to say, predictive data needs have certainly transformed how we as a society behave on a daily basis.  Whether it is weather prediction models or how we interact on the internet, predictive data analytics is the pinnacle of data needs.  But in order to reach that pinnacle, you have to have a good, solid delivery and understanding of the previous layers.


Tuesday, January 26, 2016

Monday Makeover Challenge (Does America Save Enough?)

Hi All

As part of VizWiz's Monday Makeover Challenge, I have put together my version of the viz (a day late - don't judge).

This week's challenge asked: does America save enough?

Because of the limited data and the dimensions available (age group and savings amount), there was one age group that stuck out to me: "Overall".  I looked at this age group as the average of all age groups.  By comparing the overall vs each age group, I could see which groups are doing better versus worse.

That became the "Savings Compared to Overall" section.  While this looked great and I was able to put the overall reference line on it and color code each age group as above or below "overall", I couldn't get a sense of what the overall score actually is.  That is why I added the section on the left to line up and show what the overall American really has in savings.

Finally, I annotated the findings and called out a few key points and questions to spur thought for the consumer of the viz.

You can find the viz on Tableau Public.

I hope you enjoy it!

Friday, January 22, 2016

Actionable Alerts!

In my previous post from the series Data Needs and Making it Useful, I wrote about how to make data-driven products and their dependency on the first level of data needs, data pulls.  In this post, I am going to focus on the third level of data needs: alerts.

I define alerts as delivering data as exceptions.  Unlike data pulls and products, alerts are very focused delivery points that are based on specific thresholds.  You may not get an alert or exception every day or week.  Instead, you will get an alert or exception when sales drop more than 5% below plan or website usage goes more than 2 standard deviations outside the expected trend.  Alerts require immediate attention and drive immediate action.  Alerts should say "LOOK AT ME!"
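The "5% below plan" example could be sketched roughly like this.  The plan figure, threshold, and alert message are all hypothetical:

```python
PLAN = 100_000          # hypothetical planned monthly sales in dollars
ALERT_THRESHOLD = 0.05  # alert when sales miss plan by more than 5%

def sales_alert(actual, plan=PLAN, threshold=ALERT_THRESHOLD):
    """Return an alert message when actual sales miss plan by more than
    the threshold; otherwise None (no exception, so stay silent)."""
    shortfall = (plan - actual) / plan
    if shortfall > threshold:
        return f"LOOK AT ME! Sales are {shortfall:.1%} below plan."
    return None

sales_alert(93_000)  # fires: 7% below plan
sales_alert(97_000)  # silent: only 3% below plan, within tolerance
```

Note the silent case: an alert that only speaks up on exceptions is exactly what keeps clients from going numb to notifications.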

Alerts are a great tool to focus your client's attention.  Instead of having your client go and request or look for the data to monitor, alerts optimize time by saying, "you only need to pay attention when I notify you."  This saves your client the time and effort of digging through data and reports to find information.

In order to be successful with alerts, the following criteria need to be met.  (The original assumptions from before still apply, meaning the previous two levels of data needs, data pulls and products, have to already be delivered.)

  • Alerts need valid thresholds and rules.  If these thresholds are too tight, your clients will receive too many alerts and ultimately become numb to them.  If your iPhone has more than 30 notifications, do you look at them?  If your inbox has over 500 emails, do you really read all of them?  At the same time, if the thresholds are too loose, then your clients won't receive the notifications to take action when truly needed.
  • The delivery method for the alerts is critical.  This requires an understanding of what your client is accustomed to and what kind of accessibility they have.  If you deliver an alert through email but your client receives over 500 emails a day, will it get lost in the mix?  My suggested options to consider for alerting are mobile notifications, text messages, and phone calls.  These all scream, "PAY ATTENTION TO ME NOW."  Email and posts on internal social sites or products are also options, but they may not get the same attention.
  • Can I do something with the alert?  Be very specific about this when building alerts.  Is this metric something I need to alert clients on, and can they do something about it?  For example, if sales last month missed the threshold, is there something I can take action on to correct it?  If the month has already happened, then there probably isn't much I can do to change it.

Bottom line, alerts are a powerful tool to notify when something needs attention.  With the sea of data and information out there today, your clients need to focus on what is most important for the business and use data and alerts to take action when necessary.  Alerts optimize time and effort effectively based on what the data is providing.  This allows your clients to focus on their primary responsibility instead of sifting through data.

Sunday, January 17, 2016

Data Driven Products

In my previous post (Data Needs & Making it Useful), I talked about 4 levels of data needs and how each level serves a particular purpose.  Here, I want to dive deeper into the second level, Products.

(Important Note) When talking about each of these data needs "levels", I want to make it clear that in order to be successful you need to start from the first level (data pulls) and continuously deliver each level after it (Products, Alerts, & Predictive).  Any gaps within these levels can cause your clients to have an unintended experience.  For example, if you start by building out a bunch of products but you don't give the flexibility to have data pulls, you will be stuck building hundreds of products to match every request and every new metric.  Rather, if you start with data pulls, you can respond quicker by adding the new metric and ensuring it is good quality and built for its intended purpose before you build a product around it.

Now, let's dive into Products.

First, I define products as any tool, report, or presentation of data that solves a specific use case.  This could be a weekly sales report, an A/B test tool, or an executive dashboard.  Whatever the product may be, these products are being built for a specific reason and ultimately make life easier than if data pulls alone were delivered.

In my experience, I have seen too many times that products are built as the first deliverable.  Long hours are spent gathering requirements, building out data structures, and building out these detailed products.  This is often done because of how organizations are structured (i.e. product/project teams), or because the data teams are focused on delivering the "home run" of "I will solve your problem with this amazing product."  While the products are delivered, if the data pulls are not available, the clients still turn out unsatisfied over time.  Requirements evolve and data needs change, which often causes product teams to go back and work off of a long list of enhancements.  This then causes either the original product to be over-engineered to a point where it can become unusable, or additional products to be built, which forces the client to go to two or more products to answer their current question.

Below is a typical timeline of how the product cycle happens without a data pull environment.

  1. Product teams are built
  2. Requirements are built for a new product from the client requests
  3. Data is gathered and a product is built
  4. Product is delivered (the client is happy!)
  5. The client asks for enhancements (new question, new data needs, or discovery question)
  6. Product teams work off a long enhancement list
  7. New products are built or the existing product is over-engineered, causing the client to spend more time to solve their question

Now how would this product level look if a proper data pull environment was available?

  1. Teams analyze current usage of data from the data pull and suggest a product to optimize the process to answer the question
  2. Product teams are built
  3. Requirements from both the client AND current data usage make up a new product
  4. Data is gathered (utilizing existing data pulls) and a product is built
  5. Product is delivered (the client is happy!)
  6. The client asks for enhancements (new question, new data needs, or discovery question)
  7. New data is built and made available through the data pull level first
  8. If usage of the new data or question proves consistent, a product is built or enhanced

Bottom line, with the previous data need level (data pulls) in place, your team can make data-driven decisions on where to focus.  Spending time to build out data and make it available for a data pull is a better use of time than building out product teams and making a product that may not get used.  Instead, build the new metric or data set out and allow your clients to pull the data manually.  Have them prove out the use first before building an elaborate product to "optimize time".  Then your data team can focus on the right problem, build the right product, and deliver the right value.

Sunday, January 10, 2016

Enabling Foundational Data Needs

In my previous post (Data Needs & Making it Useful), I talked about 4 levels of data needs and how each level serves a particular purpose.  Here, I want to dive deeper into the first level, Data Pulls.

(Assumption) With any area focused on using data to make decisions, you first need data to capture and track.  I am making the assumption that this is already completed.  In today's world, there are petabytes upon petabytes of data.  From transactional data to messaging / conversation history, there is a large amount of data available.  The key is ensuring you are capturing this data and making it available to your analysts.

When making this data available to your analysts, there is a foundational need of getting to the data.

This need is what I classify as "Data Pulls".  As mentioned before, this is a dump of the raw data.  This could be an Excel file with every single field and row of data.  For example, I could ask for a data pull of sales.  Below is an example of the type of data requested:

  • Sales Date: The date/time of the transaction
  • Customer ID: The id of the customer
  • Customer Name: The name of the customer
  • Transaction Number: The id of the activity
  • Payment Type: The type of payment of the transaction
  • Transaction Type: The type of transaction made
  • Item Order: The position of the item in the transaction
  • Item Name: The name of the item
  • Item Number: The id of the item
  • Item Description: The description of the item
  • Item Category: The grouping of similar items
  • Location: The location of the transaction
  • Location Category: The grouping of the location
  • Item Quantity: The number of items in the transaction
  • Base Price: The standard price of an item
  • Discount: Discounts taken off of the item
This data could be pulled in a variety of ways, but you need to know who your audience is and what their capabilities are.  Can they write SQL to pull the data themselves?  What types of tools are they used to working with (Excel, Reporting Tools, etc.)?  Do you want to continually supply this data to them manually?  If you can automate it, what happens if they want a new field to be added?
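For clients who can write SQL, a data pull is just "select everything."  Here is a minimal sketch using Python's built-in sqlite3 as a stand-in for a real warehouse; the sales table, its columns, and the rows are all hypothetical, trimmed from the field list above:

```python
import sqlite3

# In-memory stand-in for a warehouse table; the schema mirrors a few
# of the hypothetical fields listed above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (
        sales_date TEXT,
        customer_id INTEGER,
        item_name TEXT,
        item_quantity INTEGER,
        base_price REAL,
        discount REAL
    )
""")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2016-01-02", 1, "Widget", 3, 9.99, 0.0),
        ("2016-01-03", 2, "Gadget", 1, 24.99, 5.0),
    ],
)

# The "data pull": every field, every row, no aggregation.
rows = conn.execute("SELECT * FROM sales ORDER BY sales_date").fetchall()
```

The same `SELECT *` result could just as easily be exported to an Excel or text file for clients who don't write SQL themselves.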

If you cannot simply provide a table for someone to write a query against, my suggested approach is to provide a "Self Service" tool.  Self service has long been a hot-button topic for many companies.  I look at self service in two fashions.  You need to implement both to be successful, as each type of self service tool plays a distinct role.

  1. Wide Coverage & Fixed Granularity
    • This type of self service tool allows you to span across multiple subject areas at a fixed level of data.  An example of this may be looking at inventory, sales, transactions, customers and more all at a single day level.  This allows you to get a wide view and compare metrics across subject areas. 
  2. Deep Dive Subject Area
    • This type of self service tool allows you to "go deep" into a particular subject area.  If you are interested in sales, you can drill into very specific details such as transaction types, items, and more.

With both of these types of "Data Pull" tools implemented, your customers can fill the base need of "getting to the data" while keeping the flexibility to adjust their data requests.  With self service tools like these built, the teams responsible for the data can free up their time to move to the next level of data needs: "Products".

Monday, January 4, 2016

Monday Makeover Challenge

I am a frequent follower of Andy's Monday Makeover challenge and love to see what new and interesting data sets are presented every week.  That being said, I am taking up Andy's challenge.  Here is my take on the Bryce Harper Valuation from fivethirtyeight.

Most Bang for the Buck

When I thought about valuation, I took a different angle on the data.  Which of the top valuation makers gives the "most bang for the buck"?  Essentially, I wanted to analyze, given how much salary they are getting paid, what valuation they have derived.

When looking at this presentation, Bryce Harper's 2015 year may not be the "most bang for the buck."  Instead, Mike Trout's 2013 year was much more valuable.  For those looking at it like Moneyball's Billy Beane, getting the most out of those rookies early may be the best way to get the most "value" going forward.

Sunday, January 3, 2016

Data Needs & Making it Useful

Throughout my time and experience with multiple companies, I have come to find that in order to be data-driven, there are several layers of data needs that must be met to be successful.

Data Needs

As shown above, there are 4 key areas to data needs.

  • Data Pulls
    • This is the base level of any data needs.  If you don't have core data and are not storing it, then you cannot have data pulls.  I categorize data pulls as literally a dump of the raw data.  This is often shared in spreadsheets and text files.  This is where there are files with thousands of rows and a wide list of columns.  These data pulls are critical, and many people fall back to them to answer specific questions.  If a self-service tool can be built to simplify the data pulls, this layer is something that can be delivered to the masses.
  • Products
    • I often categorize products as a specific delivery of a data question.  This is often how Business Intelligence teams function.  This is often where teams meet with business partners, identify and document requirements, and build a specific dashboard/report based on those needs.  An example would be where a team delivers a Sales Analysis report to the director of Sales.  Often there are reports (aka products) that are grouped together to deliver on a specific set of business questions.
  • Alerts
    • This is the next level of deliverables.  Instead of having your business partner pull their own data or delivering on a specific question, eventually there are too many "products" to focus on.  This is where alerts come into play.  Instead of focusing on so many individual products, alerts can notify when something is good or bad.  This helps optimize time for those who use the data.  Instead of looking at all the "products" or all the data, they can focus on the areas that are over- or underperforming.  An example could be to show which departments are selling more than 2 standard deviations below the average sales for the past month.  Now instead of looking at over 100 departments, you may get alerted on 2 departments.
  • Predictive
    • The last level of data needs is Predictive.  This is an expansion of the "alerts" section.  It allows the data consumer to be notified before good or bad events occur.  These predictive products can be based on models built from prior history or other types of models.  This changes the behavior from being reactive (the focus of the prior 3 levels) to being proactive and data driven.
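The 2-standard-deviation example in the Alerts bullet above could look roughly like this.  The department names and sales figures are invented:

```python
import statistics

# Hypothetical monthly sales by department; in practice this would come
# from the data pull layer, not a hard-coded dict.
dept_sales = {
    "Toys": 510, "Grocery": 495, "Apparel": 505, "Garden": 500,
    "Electronics": 490, "Books": 498, "Sports": 503,
    "Hardware": 130,  # the clear underperformer
}

mean = statistics.mean(dept_sales.values())
stdev = statistics.stdev(dept_sales.values())

# Alert only on departments more than 2 standard deviations below the
# mean, so the client reviews a handful of departments, not all of them.
alerts = [d for d, s in dept_sales.items() if s < mean - 2 * stdev]
```

Out of all the departments, only the extreme outlier triggers an alert, which is exactly the time savings the Alerts level is meant to provide.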

These levels and deliverables can be applied to any industry.  Whether it is sales, web analytics, inventory, or another type of data, these data needs are critical in order to be data driven.

Hopefully you find this helpful.  Please feel free to leave comments and thoughts below.