Big Data and 1984?

As the data science and big data technology booms start accelerating, it’s worth noting how these technologies will change our lives – both positively and potentially negatively.

I posted previously about the ongoing discussion of privacy, but I’ve found another post on GigaOM about the same topic.  According to the article, the Supreme Court of the United States heard oral arguments on Tuesday in a case that could decide how connected the concept of big data is to constitutional expectations of privacy.

The case, United States v. Jones, is specifically about whether police needed a search warrant to place a GPS device on a suspect’s car and monitor his movements for 28 days.  Several justices, however, seized upon a very important question: How much data is too much before allowable surveillance crosses the line into an invasion of privacy?  This is a really nice post, and if you’re interested in the constitutional issues regarding privacy (for example, an appellate court has found that warrantless GPS tracking is a violation of the Fourth Amendment), I’d recommend that you take time to read the article.

These two posts do highlight interesting differences in privacy and who controls our data.  We sometimes have a knee-jerk reaction to institutions that keep data on us and then use it for other purposes (whether they benefit us or not).  George Orwell’s 1984 and the Big Brother metaphors with which we’re all familiar deal with government controlling the data and what it can do with it – that’s what the US v. Jones case is really all about.

However, in the private world where we interact with companies and people more directly, it’s not really a Big Brother issue, because we give up our privacy all the time – there’s no legal requirement to give up data; we do it by choice.  We willingly give up our privacy in order to benefit from technology – little bit by little bit.  If we want a website to provide us great recommendations (say Netflix), the company is going to have to know more about us – what we like, and what we don’t like.  

It seems a bit “Big Brother”, but even people store data about us all the time – they’re called memories.  Some are good and some are bad; people remember what we enjoy and what we hate.  People who become our friends are the ones that become great matches for us – they enjoy our humor, they know what we like to discuss, and they look out for us when we’re not around.

Companies will be trying to do that as well, but of course, it’s all about trust.  Just as we trust our friends with all that they know about us, we hope to trust companies with all the data they store about us.    That’s probably the biggest thing we need to wrestle with in the Age of Big Data – how to establish trust between people and the machines that will be keeping and using the data they have about us…

Kontagent closes $12M, Big Data Helps NextGen Science

A couple of interesting notes today….  On PRNewswire, Kontagent, a leading enterprise user analytics company, announced that it has closed $12 million in Series B financing with a consortium of investors including Battery Ventures, Altos Ventures, and Maverick Capital.  Kontagent focuses on social and mobile web applications with its kSuite product, which combines a proprietary database with customized analytics and real-time monitoring to help customers identify and react to usage patterns in real time.  This continues the pattern of heavy entry-level funding into analytics and big data startups – data science applications are becoming the next big technology boom…

There’s another interesting article snippet at Bloomberg Businessweek about how the oncoming avalanche of data could change the nature of astronomy and physics.  According to the article, Johns Hopkins will be building a 100 gigabit-per-second network to shuttle data from the campus to other large computing centers at national labs and even to Google.  Here’s what Dr. Alex Szalay, Alumni Centennial Professor at Johns Hopkins and head of the network project, thinks about what this could mean for the future of science itself:

In his mind, the new way of using massive processing power to filter through petabytes of data is an entirely new type of computing that will lead to advances in astronomy and physics, much as how the microscope’s creation in the 17th century led to advances in biology and chemistry. In that light, the creation of a 100 gigabit-per-second research network at Johns Hopkins becomes not just a fast network but also an essential tool for research and discovery, a basic component of the 21st century microscope.
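To put a rough number on why link speed matters at petabyte scale, here is a back-of-the-envelope sketch. It is not from the article: the function name, the decimal definition of a petabyte (10**15 bytes), and the assumption that the link runs at full line rate with no protocol overhead are all illustrative simplifications.

```python
# Rough transfer-time estimate for moving bulk data over a network link.
# Assumptions (not from the article): 1 PB = 10**15 bytes, and the link
# sustains `efficiency` times its nominal rate for the whole transfer.

PETABYTE = 10**15  # bytes, decimal definition
GBPS = 10**9       # bits per second


def transfer_hours(size_bytes: float, link_bits_per_second: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move size_bytes over the link at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Compare a 1, 10, and 100 gigabit-per-second link for one petabyte.
    for rate in (1, 10, 100):
        hours = transfer_hours(PETABYTE, rate * GBPS)
        print(f"{rate:>3} Gb/s: {hours:,.1f} hours per petabyte")
```

Under these idealized assumptions, one petabyte takes roughly 22 hours at 100 gigabits per second, and about ten times longer at 10 gigabits per second. Real transfers would be slower, since the sketch ignores overhead and contention.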

You can read about the Kontagent financing deal here, and Dr. Szalay’s effort to build big data networks here.

Cool Infographics on Big Data

On WebProNews, Chris Crum shows some interesting infographics about the big data onslaught.   The three topics that are covered include:

  • Big Data – just what the information deluge is going to look like (for example, by 2015, there will be 7.9 zettabytes of data created, enough to fill 1.8 million Libraries of Congress)
  • The Digital Promise – whether we’re doing enough to bring America’s classrooms into the 21st century
  • Saving Money – an argument that one might want to consider ending a relationship – a couple could save $743 per month

Interesting graphics (though I wouldn’t advocate breaking up just to save money…).  Take a look at the infographics here.

Dart-Throwing Chimp Talks Data Science

It’s interesting how you find things on the Internet: I ran across this post by Jay Ulfelder, an American political scientist who studies the survival and breakdown of political regimes (among other things) and a self-described dart-throwing chimp (yes, this is how I got here…).

Jay writes an interesting piece about a couple of experiences he had recently that got him thinking statistically – the possible link of the deaths of three children to a self-published book on spanking and corporal punishment, and a debate he was engaged in regarding the effectiveness of U.S. government support for pro-democracy movements in countries under authoritarian rule (pretty heady stuff!…).

In doing so, Jay reflects upon a recent book he’s read – Numbers Rule Your World by Kaiser Fung – about thinking statistically, and recounts his experiences in that light.  Jay’s post is lengthy, but if you’re interested in social and political dynamics (like me) and how data science plays a role in understanding these behaviors (like me!), then you might find Jay’s post worth the read.  You can read his piece here.

Analytics as a Lean Startup

I ran across this post on Analyst First by Stephen Samild, where he describes the world of Business Analytics as a prime area for the Lean Startup model.

I am a big fan of Eric Ries’ book The Lean Startup, where he advocates treating every new entrepreneurial venture, whether inside an existing company or as its own startup company, as a startup.  And further, since this is a startup venture, the uncertainty about whether this venture will succeed or fail is very high.

So, rather than put together detailed plans about building the product that the business will be based upon, a startup venture should be building the “minimum viable product” and getting feedback from customers quickly to see whether you’re on the right track.  The faster you get feedback, the better you’ll be able to build a business that is sustainable and meets the needs of your customers.

Effectively, Ries argues that you should treat the startup venture, every aspect of it, as a series of scientific experiments designed to inform you whether you are building a sustainable business consistent with your company’s vision.  It’s basically applying the scientific method to your business.

For one, I say, “Absolutely dead on!”  Most business activities, whether marketing or sales or even less-than-disciplined engineering, are performed via rules of thumb (“here’s what’s worked before…”) – there is no true “validated learning”, as Ries put it.  Generally, many businesses and engineering teams operate with the approach of “we made a number of changes last month, and our customers seem to like them, and our overall numbers are higher this month, so we must be on the right track”.  This might make a company feel good, but it gets them no closer to understanding why they might be succeeding, and what to do if things go south.
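As one illustration of what treating a change as an experiment could look like, here is a minimal sketch of a two-proportion z-test comparing conversion rates between a control group and a group that saw a single change. This is a standard statistical technique swapped in for illustration, not something Ries or the Analyst First post prescribes; the function name and the sample numbers are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A.

    conv_a / n_a: conversions and visitors in the control group.
    conv_b / n_b: conversions and visitors in the group that saw the change.
    Uses the pooled-proportion normal approximation, which is only
    reasonable when both groups are fairly large.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one visitor")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical numbers: 200 of 5,000 control visitors converted,
    # versus 250 of 5,000 visitors who saw the one change under test.
    z, p = two_proportion_z_test(200, 5000, 250, 5000)
    print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

The point of the design is that only one change differs between the two groups, so a difference in the numbers can be attributed to that change rather than to everything else that happened that month.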

And what is worse, the internal workings of the business may be driven by managers more motivated by preserving the current business enterprise than creating a new one.  This puts entrepreneurial ventures at risk of never getting started in the first place, or at least of not starting with the greatest possible chance of success.

And I like the way that Samild describes it in his post:

In the twenty-first century we can build almost anything that can be imagined. The challenge is not to build more stuff. It’s to build the right stuff. Most startups fail, says Ries, because they make the wrong things. The key activity of a startup should therefore be learning, not building. What creates value for a startup is it determining whether or not it’s on the path to a sustainable business.

If you’re interested in the Lean Startup approach to business (which, again, I highly recommend), you can find out more at Eric Ries’ website here, and you can buy his book The Lean Startup here…  Also, you can read more of Stephen Samild’s post here.

How Moneyball and Human Resources Fit Together

You might not think of a Brad Pitt movie and human resources as fitting together, but stay with me…

Josh Bersin of Bersin and Associates makes the case that human resources professionals need to exploit data science to support their businesses better.  Just as analyzing the statistics of baseball helped Billy Beane’s Oakland A’s beat their better-financed competition, so too can HR departments win the battles to attract and keep top talent for their organization through data science.

According to Bersin, in surveying 711 HR departments, “attracting and selecting the right talent” rates highest among HR skills and capabilities, yet even though better data analysis can more successfully achieve this top goal, these HR departments rate “developing workforce analytics for management” and “measuring HR program effectiveness” at the bottom.  Basically, measurement, analytics, and assessing performance objectively are ranked as the least important skills for HR departments.

Bersin is advocating that HR departments can raise the bar for their effectiveness by embracing data science (I agree!…).  Bersin’s presentation is on SlideShare and can be found here…  Whether you are an HR professional or a data science weenie, you’ll likely find the slides interesting.

WSJ: The King of Big Data

On the Wall Street Journal website today, Ben Rooney posts an interview with Hortonworks CEO Eric Baldeschwieler, co-creator of Hadoop.  For all those in the big data space, the Hadoop project develops open-source software for reliable, scalable, distributed computing, and Hortonworks is focused on accelerating the development and adoption of Hadoop.

In Rooney’s interview, Baldeschwieler describes the problem Hadoop is designed to solve:

At its base, it is just a way to take bulk data and storage in a way that is cheap and replicated and can pull up data very, very fast.

Hadoop is at one level much simpler than other databases. It has two primary components; a storage layer that lets you combine the local disks of a whole bunch of commodity computers, cheap computers. It lets you combine that into a shared file system, a way to store data without worrying which computer it is on. What that means is you can use cheap computers. That lets you strip a lot of cost out of the hardware layer.

The thing that people don’t appreciate when you drop a lower price point is that it is not about saving money, it is about being able to do an order of magnitude more on the same budget. That is revolutionary. You can score five to 10 times more data and you can process it in ways that you can’t imagine. A lot of the innovation it opens up is just the speed of innovation. You get to an answer faster, you move into production faster, you make revenue faster.
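The storage idea Baldeschwieler describes (many cheap disks pooled into one replicated file system) can be sketched as a toy model. This is not Hadoop's API or its actual placement policy: the function names, the round-robin placement, and the block size and replication factor used below are all assumptions chosen for illustration.

```python
# Toy model of replicated block storage across commodity machines.
# Assumptions: fixed-size blocks and simple round-robin placement. A real
# distributed file system uses a more sophisticated placement policy.

def place_blocks(file_size: int, block_size: int, nodes: list,
                 replication: int = 3) -> dict:
    """Map each block index to the list of distinct nodes holding a copy."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the node count")
    n_blocks = -(-file_size // block_size)  # ceiling division
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


def readable_blocks(placement: dict, failed: set) -> set:
    """Blocks that still have at least one copy on a node that has not failed."""
    return {
        block for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    }


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4", "node5"]
    # Hypothetical 1,000 MB file split into 128 MB blocks, three copies each.
    placement = place_blocks(1000, 128, nodes, replication=3)
    survivors = readable_blocks(placement, failed={"node1", "node2"})
    print(f"{len(survivors)} of {len(placement)} blocks readable after two failures")
```

Because every block lives on three distinct machines in this sketch, losing any two of them still leaves a readable copy, which is why inexpensive, failure-prone hardware can be acceptable.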

You can read Rooney’s WSJ interview here.

IW: Big Data Doesn’t Always Mean Better Insight

Shvetank Shah is executive director of the Corporate Executive Board, and recently wrote an article for InformationWeek about the need for scrutiny when launching into big data initiatives.  Here’s the first paragraph from Shah’s post, which sums things up pretty well:

Even as companies invest eight- and nine-figure sums to analyze the information streaming in from suppliers and customers, fewer than 40% of employees have the right processes and skills to make good use of the analysis. Think of this as a company’s insight deficit. To overcome it, “big data” needs to be complemented by “big judgment.”

Just because we now have lots of data to work with doesn’t mean that we will now make better decisions.  How we turn data into actionable information – the methods, the tools, the techniques – is incredibly important.  Also, as Shah points out, sometimes the data itself is just plain bad, and people don’t always trust it.  Gaining insight from data is the important factor, not merely getting more data.

You can read Shah’s post on InformationWeek here, and more from CEB on this topic is available at insightdeficit.com.

Strata Workshop: Street Fighting Data Science

The upcoming Strata conference, in late February 2012, will host workshops on applying better data science practices in the real world.

One of the workshops will be held by Pete Skomoroch, who is a Principal Data Scientist at LinkedIn and focuses on reputation systems, personalization, and creating data driven products like LinkedIn Skills.  Here’s a snippet from the workshop description that makes these types of workshops “dead on”:

New analysts or engineers are often lost when textbook approaches fail on real world data. Drawing inspiration from problem solving techniques in mathematics and physics, we will walk through examples that illustrate how come up with creative solutions and solve problems with big data.

You can find out more about the workshop here, and more about the Strata conference and other conferences here.

Popular Science: Wolfram on Big Data

I’m a big fan of Stephen Wolfram and his efforts in building Mathematica and pushing forward his approach to scientific discovery, A New Kind of Science.  In a recent post, Popular Science editor Mark Jannot talks to Wolfram about big data, human understanding, and the origin of the universe.

Here’s just one back-and-forth between Jannot and Wolfram:

Jannot:  A couple years ago at TED, Tim Berners-Lee led the audience in a chant of “More raw data now!” so he’s out there trying to create the data Web. And your project in Wolfram Alpha is not to deliver raw data but to deliver the interface that allows for raw data to turn into meaning.

Wolfram:  Yes, what we’re trying to do—as far as I’m concerned, the thing that’s great about all this data is that it’s possible to answer lots of questions about the world and to make predictions about things that might happen in the world, and so on. But possible how? Possible for an expert to go in and say, “Well, gosh, we have this data, we know what the weather was on such-and-such a date, we know what the economic conditions at such-and-such place were, so now we can go and figure something out from that.” But the thing that I’m interested in is, Can one just walk up to a computer and basically be able to say, “OK, answer me this question.”

This part of the Q&A is particularly interesting, since it highlights a difference of approach in what some want in technology.  Berners-Lee seems to want more “raw data”, while Wolfram is highlighting that the data isn’t really important unless you can turn the data into actionable information.  Wolfram|Alpha does just this – the technology uses Wolfram’s understanding of computation (what he built as part of his wildly successful Mathematica product line) and lets us answer questions. 

It’s an incredibly rich article – one worth reading (1) if you’re interested in data and where it’s taking us, and (2) if you’re interested in Wolfram and his take on science and technology.  I’m interested in both, so I think it’s very worth highlighting…

Here’s the Popular Science article, and another post to Wolfram|Alpha that highlights the history of computable knowledge (you can even order a poster of the timeline here…).  I’ve had a number of other posts on Wolfram and his scientific approach, which might be worth looking into as well…