Three Types of Data Science Questions

On the blog Signal vs. Noise by the guys at 37signals, Noah Lorang gave an interesting synopsis of what data science is (or, in this instance, what Noah refers to as “business analytics”):

“At its foundation, business analytics is about converting a rather broad question (like “why do people cancel?”) into a set of specific questions that you can answer with data.”

I would say that this is true to a point, and it got me thinking about what data science and the development of analytics are really about in my experience.  Ultimately, I see data science and the development of analytics as being about answering questions with data.  How important these fields become depends on how hard the question is that you’re trying to answer, and that in turn determines what you need to know in order to answer it.

To get started in making data science effective, there are a lot of easy techniques that can be applied and can provide quality results quickly.  However, I’ve seen companies I’ve worked with run into significant difficulties when the questions get harder, because quick solutions don’t cut it and these companies don’t apply a more disciplined approach.

From my perspective, there are three areas of questions, governing how successful one can be in using data to answer them.  The first is…

The “Easy”:  The post above clearly focuses on the “easy” data science problems.  If the question you’re asking can be easily answered by the data, then nearly any analytics, toolset, visualization technique or textbook solution that you come up with, even just simple arithmetic, will get you a good answer.  There is a multitude of textbooks on statistics and machine learning that can get you an answer, and for the “easy” problems, they will all succeed to your satisfaction.  The tough part is when your problem crosses into the next area…

The “Hard”:  These are the areas where you actually have enough data to answer your question effectively (even if you don’t know it…), but you must take care about how you generate your analytics.  There is a science to creating the right analytics on which we base decisions, and understanding this comprehensively can let us answer these “hard” problems quite well. 

However, blindly throwing textbook solutions at “hard” problems will not lead you anywhere.  Time and time again, I’ve seen a quick solution give a great initial result, after which the same group floundered for years trying to improve upon their own work.  This is the world where the disciplines of data science and analytics engineering can be brought to bear.  So, that leaves the third area, which is…

The “Nearly Impossible”:  There are just some questions that we can’t answer well because we don’t have enough data.  Now, I do say “nearly” impossible, because with any statistical situation, there are always some very, very rare instances where the data will be “just right” to let you find what you’re looking for by using a simple technique.  The point is that you can’t count on it.

If the "nearly impossible" questions could be answered this easily...

I keep thinking about finding a needle in a haystack – the classic example of the “nearly impossible” problem.  In general, you’ll never find it if you don’t go about it systematically and with discipline.  However, on a sunny day, where you’re positioned just right, and the sun is shining at just the right angle, you might just get the rare metallic glimmer of that needle among all those strands of hay.  You certainly might see it then, but the heavens have to align just so.

In the “nearly impossible” situations, you can’t develop a reliable analytics solution – you’re going to need more data.  The disciplines of data science, however, can tell you which area (easy, hard, or nearly impossible) you’re in.
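To make that last point a bit more concrete, here is a rough sketch of one way you might check which area a question falls into.  It is only an illustration under assumptions of my own choosing: the question is “did the average change?”, the true shift is small (0.2 standard deviations), the noise is Gaussian with known unit variance, and the function name detection_rate and all of its parameters are hypothetical.  It simulates many experiments and counts how often a simple two-sample z-test notices the shift at a given amount of data.

```python
import random
import statistics


def detection_rate(effect, n, trials=200, seed=0):
    """Fraction of simulated experiments in which a two-sample z-test
    (assumed known unit variance, 5% two-sided level) flags a true
    mean shift of size `effect` with `n` observations per group."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = 0
    for _ in range(trials):
        baseline = [rng.gauss(0.0, 1.0) for _ in range(n)]
        shifted = [rng.gauss(effect, 1.0) for _ in range(n)]
        diff = statistics.fmean(shifted) - statistics.fmean(baseline)
        z = diff / (2.0 / n) ** 0.5  # standard error of the difference
        if abs(z) > 1.96:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Same question, same simple technique, different amounts of data.
    for n in (10, 100, 1000):
        print(n, detection_rate(0.2, n))
```

With very little data, the simple test only catches the shift on the rare occasions when the sample happens to fall “just right,” which is the needle glinting in the sun.  With plenty of data, nearly any reasonable technique catches it.  The middle ground, where the answer is in the data but a careless analysis will often miss it, is where the discipline matters.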

I serve as Director of Data Science & Analytics Engineering at Areté Associates. I've also served in leadership positions with Elanix, Inc. (now Agilent Technologies) and Mentor Graphics. I live in Thousand Oaks, CA, where I've served the public as Chair of our city's Planning Commission and our county's Tobacco Settlement Allocation Committee. I have been married to my wife Stephanie since 1993, and we have a wonderful daughter Monroe.
