Three Types of Data Science Questions

On the blog Signal vs. Noise by the guys at 37signals, Noah Lorang gave an interesting synopsis of what data science is (or, in this instance, what Noah refers to as “business analytics”):

“At its foundation, business analytics is about converting a rather broad question (like “why do people cancel?”) into a set of specific questions that you can answer with data.”

I would say that this is true to a point, and it got me thinking about what data science and the development of analytics are really about in my experience.  Ultimately, I see data science and the development of analytics as being about answering questions with data.  How important these fields become depends on how hard the question you’re trying to answer is, and that in turn determines what you need to know in order to answer it.

To get started in making data science effective, there are a lot of easy techniques that can be applied and can provide quality results quickly.  However, I’ve seen companies I’ve worked with run into significant difficulties when the questions get harder, because quick solutions don’t cut it and these companies don’t apply a more disciplined approach.

From my perspective, there are three areas of questions, governing how successful one can be in using data to answer them.  The first is…

The “Easy”:  The post above clearly focuses on the “easy” data science problems.  If the question you’re asking can be easily answered by the data, then nearly any analytics, toolset, visualization technique or textbook solution that you come up with, even just simple arithmetic, will get you a good answer.  There is a multitude of textbooks on statistics and machine learning that can get you an answer, and for the “easy” problems, they will all succeed to your satisfaction.  The tough part is when your problem crosses into the next area…

The “Hard”:  These are the areas where you actually have enough data to answer your question effectively (even if you don’t know it…), but you must take care about how you generate your analytics.  There is a science to creating the right analytics on which we base decisions, and understanding this comprehensively can let us answer these “hard” problems quite well. 

However, blindly throwing textbook solutions at “hard” problems will not lead you anywhere.  Time and time again, I’ve seen a quick solution give a great initial result, only for the same group to flounder for years trying to improve upon their own work.  This is the world where the disciplines of data science and analytics engineering can be brought to bear.  So, that leaves the third area, which is…

The “Nearly Impossible”:  There are just some questions that we can’t answer well because we don’t have enough data.  Now, I do say “nearly” impossible, because in any statistical situation, there are always some very, very rare instances where the data will be “just right” to let you find what you’re looking for by using a simple technique.  The point is that you can’t count on it.

If the "nearly impossible" questions could be answered this easily...

I keep thinking about finding a needle in a haystack – the classic example of the “nearly impossible” problem.  In general, you’ll never find it if you don’t go about it systematically and with discipline.  However, on a sunny day, where you’re positioned just right, and the sun is shining at just the right angle, you might just get the rare metallic glimmer of that needle among all those strands of hay.  You certainly might see it then, but the heavens have to align just so.

In the “nearly impossible” situations, you can’t develop a reliable analytics solution – you’re going to need more data.  The disciplines of data science, however, can tell you which area (easy, hard, or nearly impossible) you’re in.
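
To make that last point a little more concrete, here is a rough sketch of how one might guess which area a question falls in.  This is only an illustration under my own assumptions, not a standard method: the question is framed as detecting a change in a rate (say, a cancellation rate), the noise is estimated with the usual normal-approximation margin of error, and the function names and the two cutoff ratios (`easy_ratio`, `hard_ratio`) are hypothetical values chosen for the example.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p observed over n samples.

    Uses the normal approximation, which is only reasonable when n is not tiny
    and p is not too close to 0 or 1.
    """
    return z * math.sqrt(p * (1 - p) / n)


def classify_question(effect, p, n, easy_ratio=5.0, hard_ratio=1.0):
    """Guess the area of a question from how the effect compares to the noise.

    effect     -- size of the change in the rate you hope to detect (e.g. 0.01)
    p          -- baseline rate (e.g. 0.2 for a 20% cancellation rate)
    n          -- number of observations available
    easy_ratio -- hypothetical cutoff: effect dwarfs the noise, anything works
    hard_ratio -- hypothetical cutoff: effect is near the noise, care is needed
    """
    ratio = effect / margin_of_error(p, n)
    if ratio >= easy_ratio:
        return "easy"
    if ratio >= hard_ratio:
        return "hard"
    # The effect is buried in the noise: more data is needed.
    return "nearly impossible"


if __name__ == "__main__":
    print(classify_question(effect=0.10, p=0.2, n=10_000))
    print(classify_question(effect=0.01, p=0.2, n=10_000))
    print(classify_question(effect=0.01, p=0.2, n=100))
```

With the same baseline rate, a large effect over many observations lands in “easy”, a small effect over many observations lands in “hard”, and that same small effect over only a hundred observations lands in “nearly impossible”.  The exact cutoffs matter much less than the habit of comparing the size of what you’re looking for against the noise in the data before picking a technique.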




© 2011 Mic Farris. All rights reserved.
