Harnessing Big Data Identified as Key Health Care Innovation for 2012

The healthcare industry is hot on the big data bandwagon, based on a recent list of key medical innovations presented by the Cleveland Clinic.  “Harnessing Big Data to Improve Healthcare” was #8 on the top 10 list, which also included more obviously medical innovations such as catheter-based renal denervation to control resistant hypertension and implantable devices to treat complex brain aneurysms.

Of course, even the healthcare industry is recognizing the need to pull more information out of the ocean of data they have.  In explaining why this ranked so high on their list, the Cleveland Clinic said:

The amount of data collected each day dwarfs human comprehension and even brings most computing programs to a quick standstill. It is estimated that 2.5 quintillion bytes of data are created daily, so much that 90% of the data in the world has been created in the last two years. This is what’s called big data, and hospitals, medical centers, hospital systems, pharmaceutical, biotechnology and medical device companies that comprise the trillion-dollar healthcare industry in this country are awash in it. Together, they easily amass terabytes and oftentimes petabytes of structured and unstructured data. Unfortunately, not enough of this deluge of big data sets has been systematically collected and stored, and therefore this valuable information has not been aggregated, analyzed, or made available in a format to be readily accessed to improve healthcare.

You can read more about what the healthcare industry hopes to obtain by leveraging big data here, and see the complete Cleveland Clinic Top 10 Medical Innovations list here.

Basho Raises $5M

The Boston Business Journal is reporting that big data storage firm Basho Technologies raised $5 million in its latest round of financing.  According to the report, this brings Basho’s total to $12.5M raised in 2011, adding to the $7.5M total they raised back in May.

Basho makes highly distributable and scalable non-relational databases, which are needed for handling and managing the incredibly large datasets now available.  These types of technologies are some of the hottest offerings in the market right now, indicated by the ability of these firms to raise significant capital.  Basho, founded in 2008 by a group of software architects, engineers, and executives from Akamai, recently announced the licensing of their NoSQL database technology, Riak, to the National Board of E-Health in Denmark to operate their nationwide medical prescription card program.

NYT: The Future of Computing

Here’s a nice post from the New York Times about big data, speed, and the future of computing.  It talks a little bit about the technology that makes IBM’s Watson computer so fantastic at beating Jeopardy! champions (we first wrote about this last year…), and about how the need for speed will likely change computer architectures themselves.

There will likely be groundbreaking changes in hardware and software, where computation and decision-making both become part of the same technology.  This could be where many of the advances in analytics engineering come from over the next decade.  Read more from the NYT post here.

XGraph Acquired by Clearspring

AllThingsD is reporting that New York-based data science company XGraph is being acquired by Clearspring of McLean, VA, a social-sharing company.  XGraph, which focuses on modeling and monetizing the Web’s social graph, started about 3 years ago and raised about $3.75 million last year; its 15 employees would boost Clearspring’s headcount to 85.

In May of this year, Clearspring raised $20 million, saying at the time that it would spend the cash on acquisitions that would leverage data; XGraph seems to be just one of those targeted acquisitions.

Here’s the official press release from Clearspring announcing the acquisition (via AllThingsD):

Clearspring Acquires XGraph to Create Largest Multi-Graph on the Open Web

Company accelerates growth by deepening data team and technology

McLean, VA and New York, NY, November 1, 2011. Clearspring, provider of the largest social sharing and analytics platform, AddThis, announced today it has acquired XGraph, Inc., a leading data science company focused on modeling and monetizing the web-wide social graph. Clearspring’s massive reach and proprietary real-time data processing capability, coupled with XGraph’s audience technology, create the largest multi-graph platform on the web, mapping 1.2 billion users’ connections by brand affiliation, intent and social behavior.

The investment in XGraph’s data science capabilities marks another step on Clearspring’s rapid growth trajectory. XGraph’s team has deep data science expertise with applied backgrounds in advertising, sociology, mathematics and computer science. Their unique technology dynamically organizes users by shared connections and interests. XGraph’s team and platform will drive Clearspring’s existing efforts with publishers, advertisers and agencies forward while also setting the stage for new innovation.

“Clearspring is at the epicenter of two major shifts online — the web becoming social and personal, and advertising becoming data-driven and accountable. The common thread in both changes is data. To compete in this new world, companies will not only need the ability to access and process big data, but also have the ability to activate that data to create value for consumers, publishers and advertisers,” said Ramsey McGrory, Clearspring’s new Chief Executive. “The combined company has the people, technology and data to enable our clients to stay at the forefront of these changes. 2012 will be a breakout year for Clearspring.”

For advertisers, agencies and trading desks, Clearspring will immediately be able to provide the largest multi-graph audience targeting capabilities available on the open web. By using this technology to identify a brand’s core audiences and finding millions of other connected and like-minded people online, the company can now drive more efficient spending and increased campaign performance. Clearspring also plans to leverage this new capability to deliver publishers unique audience insights, monetization capabilities and actionable data products in the coming year.

“Most companies only capture one dimension of how we’re all connected, whether it be our friends or people we share with — a single graph approach. XGraph not only models these social connections, but also multiple other types of connections such as brand affiliations, intent and more — a multi-graph approach,” said Key Compton, XGraph’s CEO. “We’re truly excited to leverage our technology to unlock the value of Clearspring’s massive data set and help publishers and advertisers truly harness the power of the web-wide interest graph.”

XGraph is headquartered in New York with an office in Silicon Valley. All XGraph employees based in New York will join Clearspring’s office there. Clearspring plans to keep the office in Silicon Valley. The combined company will have 85 employees nationwide.

Popular Science: The Glory of Big Data

You know “big data” has gone mainstream when it shows up in Popular Science.  Today’s PopSci post describes the realization that the ocean of data is really here, and we’re all going to have to figure out how to swim in it.

An example of just how huge the data is comes from the callout quote from the article:

In 2011 the volume of available data is predicted to continue along its exponential growth curve to 1.8 zettabytes. (A zettabyte is a trillion gigabytes; that’s a 1 with 21 zeros trailing behind it.)

In the world of science, we have prefixes for how big things are:  a “megabyte” is 1,000,000 bytes*, a “gigabyte” is 1,000,000,000 bytes (or 1,000 megabytes), and so on.   A “zettabyte” is so large that there is only one other officially recognized prefix beyond it (the “yottabyte” or 1,000 zettabytes) – at some point, we’re going to need to approve more prefixes!

* – and yes, I know that for computer memory, a megabyte is 1,048,576 bytes, and for computer storage, it’s 1,000,000 bytes.
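For anyone who wants to check the prefix arithmetic, here is a minimal Python sketch.  The names SI_PREFIXES and bytes_in are made up for this illustration, and the sketch assumes the decimal (storage) convention described above, with the binary megabyte shown separately for comparison.

```python
# Decimal (SI) prefixes expressed as powers of ten.
# SI_PREFIXES and bytes_in are illustrative names, not from any library.
SI_PREFIXES = {
    "kilo": 3, "mega": 6, "giga": 9, "tera": 12,
    "peta": 15, "exa": 18, "zetta": 21, "yotta": 24,
}


def bytes_in(prefix: str) -> int:
    """Return the number of bytes in one <prefix>byte (decimal convention)."""
    return 10 ** SI_PREFIXES[prefix]


# The binary "megabyte" used for computer memory: 2**20 bytes.
BINARY_MEGABYTE = 2 ** 20

print(f"1 megabyte (storage) = {bytes_in('mega'):,} bytes")
print(f"1 megabyte (memory)  = {BINARY_MEGABYTE:,} bytes")
print(f"1 zettabyte          = {bytes_in('zetta') // bytes_in('giga'):,} gigabytes")
print(f"1 yottabyte          = {bytes_in('yotta') // bytes_in('zetta'):,} zettabytes")
```

Using integer powers keeps every value exact, which matters once the numbers pass what a floating-point value can represent precisely.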

According to the PopSci article, the amount of data is tracking Moore’s Law pretty well:  data sizes, memory capacity, processor speeds, etc. (basically all things computation) all seem to double every 2 years or so.  Here’s a quote from the article:

The amount of data available to us is increasingly vast. In 2010 we played, swam, wallowed, and drowned in 1.2 zettabytes of the stuff, and in 2011 the volume is predicted to continue along its exponential growth curve to 1.8 zettabytes. (A zettabyte is a trillion gigabytes; that’s a 1 with 21 zeros trailing behind it.) The IDC Digital Universe study from which I’ve plucked these numbers helpfully notes that if you were inclined to store all that data on the hard drives of 32-gigabyte iPads, doing so would require 57.5 billion devices—enough to erect a 61-foot-high wall 4,005 miles long, from Miami all the way to Anchorage.
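As a rough sanity check on those numbers, here is a small Python sketch.  It assumes steady exponential growth between the two quoted yearly figures and decimal units (1 zettabyte = 10^21 bytes, 1 gigabyte = 10^9 bytes); the function names are hypothetical and only for illustration.

```python
import math

ZETTABYTE = 10 ** 21  # decimal convention (assumption)
GIGABYTE = 10 ** 9


def doubling_time_years(start: float, end: float, years: float = 1.0) -> float:
    """Years needed to double, assuming constant exponential growth."""
    return years * math.log(2) / math.log(end / start)


def devices_needed(total_bytes: int, device_bytes: int) -> int:
    """Smallest whole number of devices that can hold total_bytes."""
    return -(-total_bytes // device_bytes)  # integer ceiling division


# Quoted figures: 1.2 zettabytes in 2010, 1.8 zettabytes predicted for 2011.
growth = doubling_time_years(1.2, 1.8)
ipads = devices_needed(18 * ZETTABYTE // 10, 32 * GIGABYTE)

print(f"Doubling time: {growth:.2f} years")
print(f"32-gigabyte iPads needed for 1.8 zettabytes: {ipads:,}")
```

Under these assumptions the doubling time comes out to about 1.7 years, in line with "every 2 years or so," and the device count is about 56 billion.  That is a little below the 57.5 billion quoted from the study; the gap presumably comes from rounding in the 1.8 figure, though the article doesn't say.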

 

Data Science Push by Columbia University

A recent post in the Columbia Spectator describes the university’s plans to expand its engineering and data science programs.  Columbia’s School of Engineering and Applied Sciences (SEAS) is making a big push to become a leader in the new data science field, planning a huge expansion that includes 1,000,000 square feet of additional space and would eventually stretch to a third building north of 131st Street, according to the Dean of SEAS, Feniosky Peña-Mora.

SEAS has been making moves to become one of the nation’s top engineering schools, making connections with New York City’s political leadership and also creating entrepreneurial residency programs to incubate startups.

Big Data Focused on Speed

A post today on ITBusiness.ca talks about the need for handling big data, and describes some of the efforts by the big-name “big data” players, such as SAS, EMC Greenplum, and Accenture.  According to the post, 72 percent of companies polled by the Accenture SAS Analytics Group say they will be spending more on business analytics in 2012.

There’s a big push to make analytics real-time, going from hours of runtime to merely minutes, and the big data technologies that are being created support this push.  The next step will be the intelligent analysis of all this data, not merely crude analysis done really fast.  As the questions we try to answer become harder, we’ll need a real engineering discipline in developing the right analytics, not merely making the current analytics faster…

Data Science Meets Insurance

It seems that data science is permeating every industry these days, and new industry reports by Forrester Research describe how data science will affect the insurance industry.

Ellen Carney, senior analyst at Forrester, indicates that insurance companies need to build a solid technology foundation to compete in the rapidly moving industry.  Here’s a bit from Insurance Networking News on the report:

So what are architects building toward? Carney foresees big data playing a big part in the future of insurance. She tells Insurance Networking News that carriers know what they’re supposed to be doing with big data but few currently have the data science resources/roles to do anything with it. “I know a lot of the big carriers are trying to build out the big data science skills and I even have talked with a smaller regional carrier in the SE that’s building a data science team,” she says.

Nerd Pride Friday: Space Monolith Action Figure

In an effort to spread geekiness to the world and make it catch on, I am starting a segment called Nerd Pride Friday.

I am an unapologetic nerd – I dig science and math and science fiction movies and getting to the nitty gritty on how stuff works.  So, I am on a mission (well, not so serious a mission, but at least a fun one!) to make sure people see what is cool, at least from a geek’s point of view.

This Friday, I start with the Space Monolith Action Figure.  For those of you who’ve seen 2001: A Space Odyssey, you’ll recognize this as “The Monolith” (according to the movie, it’s what got us humans over the hump from being apes!…).

ThinkGeek.com has the action figure for sale on its website (apparently, “Dave” is not included…).  Buy it and you get 300 geek points!  I’ve already asked for this (and Dave) as a stocking stuffer…  A couple of years ago, I posted some other great stuff from ThinkGeek…

In another post, I noted that Newsweek had listed the Robots Hall of Fame, where HAL 9000 from 2001 was called out.  Three cheers for HAL, Dave, and the Monolith!

Forbes to Tunkelang: What is a Data Scientist?

Dan Woods of Forbes is continuing to interview people in the field of data science to gain their understanding of this new discipline.

This post has his interview with Daniel Tunkelang, a principal data scientist at LinkedIn.  Prior to joining LinkedIn, Tunkelang led a local search quality team at Google, and was co-founder and Chief Scientist at Endeca, which was recently purchased by Oracle for $1.075 billion.

Here’s Daniel’s response to Woods on what he thinks a data scientist is:

I’m a big fan of Hilary Mason, chief scientist at bit.ly, so I’ll cite her definition: a data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics and machine learning. Data scientists not only are adept at working with data, but appreciate data itself as a first-class product. At LinkedIn, products pioneered by data scientists, such as People You May Know, harness the power of data to create value for users.

© 2011 Mic Farris. All rights reserved.
