Data science has exploded as a field in recent years, and whether your focus is machine learning, artificial intelligence, or citizen data science, the discipline is creating very high expectations.
There is indeed much promise for data science, where predictive models and decision engines can target skin cancer in patient imagery, presciently recommend a new product that piques your interest, or power your self-driving car to evade a potential accident.
However, realizing that promise requires much effort. It takes a lot of work and brand-new engineering disciplines that are not yet mature or even employed on a wide scale. As recognition of the value of data science grows and the generation of data increases at exponential rates, this engineering effort is getting under way and will soon grow beyond its adolescence.
This is why we are at the advent of a new engineering discipline that can truly realize the promise of data science – a discipline that I call “analytics engineering”.
I’ve been performing data science since before there was a field called “data science”, so I’ve had the opportunity to work with and hire a lot of great people. But if you’re trying to hire a data scientist, how do you know what to look for, and what should you consider in the interview process?
I’ve been doing what is now called “data science” since the early 1990s and have helped to hire numerous scientists and engineers over the years. The teams I’ve had the opportunity to work with are some of the best in the world, tackling some of the most challenging problems facing our country. These folks are also some of the smartest people I’ve ever had the opportunity to work with.
That said, not everyone is a good fit, and the discipline of data science requires several key elements. Hiring someone into your team is incredibly important to your business, especially if you’re a small startup or building a critical internal data science team; mistakes can be expensive in both time and money. This can be even more intimidating if you don’t have a background or experience in hiring scientists, especially someone responsible for this new discipline of working with data.
I read a couple of items in this month’s Fortune magazine that I thought were worth passing along.
The first was a small article by Brian Dumaine about the work being done at Applied Proteomics to identify cancer before it develops. At Applied Proteomics, they use mass spectroscopy to capture and catalog 360,000 different pieces of protein found in blood plasma, and then let supercomputers crunch on the data to identify anomalies associated with cancer. The company has raised $57 million in venture capital and is backed by Microsoft co-founder Paul Allen. You can read the first bit of the article here.
The second is from the Word Check callout, showing how access to information is making the world a better place:
wasa: Pronounced [wah-SUH]
(noun) Arabic slang: A display of partiality toward a favored person or group without regard for their qualifications. A system that drives much of life in the Middle East — from getting into a good school to landing a good job.
But on the Internet, there is no wasa.
– Adapted from Startup Rising: The Entrepreneurial Revolution Remaking the Middle East by Christopher M. Schroeder
Imagine a guy with glasses who used to model baseball stats and play online poker nailing the outcome of the 2012 elections. And when I say “nailing”, I mean that he correctly predicted the U.S. Presidential contest in every one of the 50 states (and nearly every U.S. Senate race, too). He even performed better than some of the most widely-used polling firms. Now imagine that he gives his thoughts on making these types of predictions. That’s exactly what Nate Silver does in his new book The Signal and the Noise.
I’ve worked in what’s now being called “data science” for nearly twenty years. The title of Silver’s book – The Signal and the Noise – presents an important and sometimes overlooked part of this science. The “signal” is what we’re looking for in the data, and the “noise” is all the stuff in the data that gets in the way of what we’re looking for.