How Much "Data Science" Do You Really Need?
Is it really true that "Nearly two-thirds of big data projects will fail to get beyond the pilot and experimentation phase in the next two years, and will end up being abandoned," as suggested by Steve Ranger last year? My take: to be successful you need a collaborative team with multiple skills, effective leadership, good communication -- and a plan. In other words, don't put the cart before the horse by starting with a technical solution before you understand what problems you'll be trying to solve.
How much analytical expertise is needed?
In my consulting and research I help people plan and manage effective and sustainable data and IT management programs. Given how "big data" has risen in importance, I'm now very interested in how to plan and govern programs that manage and provide access to organized and useful data.
The question often arises of how much analytical expertise is needed to effectively manage data-intensive programs, regardless of whether new or existing data management and analysis technologies will be employed. While "big data" has made such issues quite visible, you can't expect everyone to be a "data scientist" or professional statistician. At the same time, managers who expect to drive or make sense of improved data analytics are going to need more than an understanding of how spreadsheets operate.
Marketing function as an example
Training and educational programs around data analysis tools and techniques are proliferating. Managers in different fields wonder how much analytical expertise is needed. A recent example is Adele Sweetwood's Creative or Analytical? Marketers Must Now Be Both. In it she says,
"Knowledge of data-management principles and analytical strategies, an understanding of the importance of data quality and data governance, and a solid grasp of the value of data in marketing disciplines are now all essential."
"Today's marketer needs to go well beyond reporting and metrics."
"The successful contributor is proficient in a full range of analytics, which may include optimization, text, sentiment, scoring, modeling, visualization, forecasting, and attribution. That doesn't necessarily mean all marketers must have a PhD in statistics, but they must understand and use such methods."
Sweetwood's overview surprised me a bit. Having managed a lot of quantitative market research early in my career, I have always tended to think that marketing is, at its core, a data-dependent function where measuring and predicting customer engagement and/or behavior have always been paramount. As Sweetwood points out, though, the tools now available to the marketer are more sophisticated and complex, and they increasingly operate in real time, i.e., "Today's marketer needs to go well beyond reporting and metrics." Yesterday's number-crunching skills might not be up to the task.
The greatest inefficiencies
Yet, even if we do have the right skills for data prep and data analysis on board, there's still a danger that "data scientists" might fall into the trap of "solving the wrong problem," as discussed so eloquently by Claudia Perlich in her Quora response to the question What are the greatest inefficiencies data scientists face today?:
"The by far biggest issue I see is data science solving irrelevant problems. This is a huge waste of time and energy. The reason is typically that whoever has the problem is lacking data science understanding to even express the issue and data scientists end up solving whatever they understood might be the problem, ultimately creating a solution that is not really helpful (and often far too complicated)."
I don't think this is such an unusual challenge -- the need to address the traditional "gulf" between managers and specialized technical solution people -- and it should not come as a surprise. We know from years of experience in project management that successful projects require communication and understanding among project team members and influential stakeholders so that the right questions are being addressed and everyone is moving in the same direction.
A team approach
Bob Hayes in Investigating Data Scientists, their Skills and Team Makeup again underscores the need for a team approach to making sure data science applications are successful:
"It appears that a team approach is an effective way of approaching your data science projects. Solving problems using data (e.g., a data-driven approach) involves three major tasks: 1) identifying the right questions, 2) getting access to the right data and 3) analyzing the data to provide the answers. Each major task requires expertise in the different skills, often requiring a team approach. Different data professionals bring their unique and complementary skills to bear on each of the three phases of data intensive projects."
My takeaway from the above, given my own focus on planning and managing effective data programs, is that a "tools first" approach will, more likely than not, fail.
Solving the right problems
We need to make sure that our "data science" team is focusing its energy -- and technologies -- on solving the right problems. That means understanding and working with management to accurately assess the "pain points" being experienced by the organization. We also need to understand and plan for the ongoing resource requirements of managing the team that reliably addresses these pain points.
Basic management understanding
This is not just a "business understanding" problem but a "project management" problem as well, as I suggested last year in You Need a Project Manager on Your Big Data Team. As I've discovered in my own consulting and research, though, the issue goes deeper than that: it requires management that has more than a cursory understanding of the basic principles of data analysis.
These are a few of the "basics" that I think management needs to understand if it wants to effectively work with -- and within -- its data science team:
The costs and benefits associated with analyzing dirty vs. clean data.
The different analytical approaches to analyzing structured and unstructured data.
The analytical and data processing implications of using sample-based data versus data that represent an entire population.
The importance of understanding the differences between "actual" and "predicted" values.
The danger of going after "low hanging fruit" and not addressing the really important questions.
The limitations of past-oriented metrics and reporting.
When and where data modeling is most useful.
The importance of standards and metadata governance.
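Two of the basics above, sample-based versus population data and "actual" versus "predicted" values, can be made concrete with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names (sample_estimate, mean_absolute_error), the synthetic "population," and the small actual/predicted lists are all hypothetical illustrations, not drawn from any particular tool or project.

```python
import random
import statistics


def sample_estimate(population, sample_size, seed=0):
    """Estimate a population mean from a simple random sample.

    Returns (sample_mean, standard_error). The standard error is a rough
    indication of how far the sample mean may sit from the true mean.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # draw without replacement
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return mean, std_error


def mean_absolute_error(actual, predicted):
    """Average size of the gap between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical population: 10,000 synthetic customer spend figures.
    rng = random.Random(42)
    population = [rng.gauss(100, 20) for _ in range(10_000)]

    true_mean = statistics.mean(population)
    est_mean, std_error = sample_estimate(population, sample_size=200)
    print(f"Population mean: {true_mean:.2f}")
    print(f"Sample mean (n=200): {est_mean:.2f} +/- {std_error:.2f}")

    # Hypothetical forecast compared with what actually happened.
    actual = [120, 95, 130, 110]
    predicted = [115, 100, 125, 118]
    print(f"Mean absolute error: {mean_absolute_error(actual, predicted):.2f}")
```

The point for a manager is not the code itself but the two questions it raises: how much uncertainty comes with working from a sample, and how large the gap is between what a model predicted and what actually happened.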
As someone who has traditionally tried to keep his feet in both the "business" and "technical" camps, I can personally attest that keeping up on all this isn't easy, especially when you're already running an organization. That's why the team approach is so important to addressing analysis questions.
Copyright (c) 2016 by Dennis D. McDonald