Data Science Defined

So, what is data science?  There has been increasing buzz about it lately: a Fortune article recently dubbed the “data scientist” “The Hot New Gig in Tech”.

This report is a good primer for the new field of data science.  In some ways (as with many fields), it’s not new at all.  In fact, I think my company has been raising the bar of data science within the defense and intelligence worlds for over 35 years – I’ve personally been involved for nearly 20 of those.  However, with the volumes of data now being generated, and the ease with which that data can be processed, the value of data science is now being recognized.

Another recent post, by DJ Patil, chief scientist at LinkedIn, describes how he built the data science team there.  He discusses at length what the roles of data scientists are and what to look for when building your own team.

In fact, in the O’Reilly data science report, Patil was referenced as believing that the best data scientists:

“tend to be “hard scientists”, particularly physicists, rather than computer science majors.  Physicists have a strong mathematical background, computing skills, and come from a discipline in which survival depends on getting the most from the data.  They have to think about the big picture, the big problem.”

That has been my experience as well – the best data scientists we find are those with a strong math background, whether they majored in physics or in computer science.  Strict coders find data science challenging if they don’t come armed with the strong math skills needed to pull information from the data…

Well, here’s my definition.  “Data science” is the general analysis of the creation of data.  This means a comprehensive understanding of where data comes from, what data represents, and how to turn data into actionable information (something upon which we can base decisions).  This encompasses statistics, hypothesis testing, predictive modeling, and understanding the effects of performing computations on data, among other things.  Science in general has long been armed with many of these tools, but data science pools the necessary tools together to bring scientific discipline to the analysis and productizing of data.

A complementary term currently in use is “analytics”.  I found an interesting and very appropriate definition on Wikipedia: “the process of obtaining an optimal or realistic decision based on existing data.”  Analytics, then, are useful metrics derived from data upon which decisions are made.

Combining data science and analytics gives a foundation upon which new computing paradigms and data products can be built.  Cool stuff!…

I currently serve as Director in the Advanced Risk & Compliance Analytics (ARCA) practice at PricewaterhouseCoopers (PwC). I've served as Director of Data Science & Analytics Engineering at Areté Associates and in leadership positions with Elanix, Inc. (now Agilent Technologies) and Mentor Graphics. I've served the public as Chair of the Thousand Oaks, CA Planning Commission and now work in New York City. I have been married to my wife Stephanie since 1993, and we have a wonderful daughter Monroe.
