Archives For February 2012

60 Minutes aired a piece last night about scientific fraud at Duke University, where data was fabricated in order to support alleged discoveries in individualized cancer therapies.  As a result of these investigations, a number of previously published scientific articles have been retracted.

Less than a week ago, I highlighted an infographic from Jen Rhee about the alarming statistics in science fraud.  I’m really disheartened that such a highly visible example came up so quickly… 

In the 60 Minutes piece, it seems clear that the fraud came from one scientist, Dr. Anil Potti, but the absence of some checks and balances helped create the circumstances for it.  When the research was published, many labs tried to reproduce the results, and two researchers at the University of Texas, Kevin Coombes and Keith Baggerly, began analyzing Dr. Potti’s data to verify his results.  What they found could only be explained by deliberate manipulation of the data, setting off a chain of events that led to retractions from Duke researchers, suspension of grants, and the eventual suspension of Dr. Potti from the Duke staff.

It took a dedicated newsletter, The Cancer Letter, to discover that Dr. Potti had even falsified his own credentials, claiming a Rhodes Scholarship that he had never received, and that discovery triggered a thorough examination.  Unfortunately, Duke did not have enough institutional checks in place to catch this on its own.

It was nice to see that the primary researcher in charge of the lab, Dr. Joseph Nevins, came out and took responsibility for the episode.  When Dr. Nevins was asked, after he had reviewed the original data, whether it had been fabricated, he said it was “abundantly clear” that it had.

Look – people make mistakes, even scientists when they are trying to analyze data and draw conclusions.  The scientific process is all about trying to find the truth, and being willing to accept the truth, even if it’s different than you’d like the truth to be.

But, as I said in my previous post on this type of fraud,

Real scientists… care about what the data is actually saying and discovering the truth.  When someone cares about something else other than the truth (money, celebrity, fame, etc.), then bad science is what you get.  Of course, when there are people involved, sometimes the truth isn’t the top priority.

The real tragedy is that people were affected and possibly harmed as a result of this fraud.   The fabricated data was used to validate a theory, which led to medical therapies that went through clinical trials, meaning that real people could have been given medicine that very well may have done harm to them.  As Dr. Coombes said during the 60 Minutes piece:

… you would be giving patients drugs that would definitely not benefit them.  So there’s clear potential for harm there.

Bad science should be rooted out, and good science needs to be advocated everywhere.  The truth is important and worth finding…

 

Stephen Wolfram is doing it again.  I’m a big fan of Wolfram (you can read some of my other posts here, here, and here…), and am always intrigued by what he comes up with.  A couple of days ago, Wolfram launched his latest contribution to data science and computational understanding: Wolfram|Alpha Pro.

Here’s an overview of what the new Pro version of Wolfram|Alpha can provide:

With Wolfram|Alpha Pro, you can compute with your own data. Just input numeric or tabular data right in your browser, and Pro will automatically analyze it—effortlessly handling not just pure numbers, but also dates, places, strings, and more.

Upload 60+ types of data, sound, text, and other files to Wolfram|Alpha Pro for automatic analysis and computation. CSV, XLS, TXT, WAV, 3DS, HDF, GXL, XML…

Zoom in to see the details of any output—rendering it at a larger size and higher resolution.

Perform longer computations as a Wolfram|Alpha Pro subscriber by requesting extra time on the Wolfram|Alpha compute servers when you need it.

Licenses for prototyping and analysis software (Matlab, IDL, even Mathematica) go for several thousand dollars.  Student versions can be had for a few hundred dollars, but you can’t leverage data science for business purposes on student licenses.

Wolfram|Alpha Pro lets anyone with a computer, an internet connection, and a small budget leverage the power of data science.  Right now, you can get a free trial subscription, and from there, the cost is $4.99/month.  This price is introductory, but it could be seductive enough to attract a lot of users (I’ve already signed up – all you need for the free trial is an e-mail address…)

One option that I find really interesting is Wolfram’s Computable Document Format (CDF), whose interactivity lets you get dynamic versions of existing Wolfram|Alpha output, as well as access to new content, using interactive controls, 3D rotation, and animation.  It’s like having Wolfram|Alpha embedded in the document.

I attended a Wolfram Science Conference back in 2006 and saw the potential for such a document format back then.  A number of presenters later wrote up their work as papers, published by the journal Complex Systems.  Since many of the presentations relied on real interactivity with the data, I could see how much of the insight would be lost when people tried to write things down and limit their visualizations to simple, static graphs and figures.

I remember contacting Jean Buck at Wolfram Research, and recommending such a format.  Who knows whether that had any impact, but I’m certainly glad to see that this is finally becoming a reality.  I actually got the opportunity to meet Wolfram at the conference (he even signed a copy of his Cellular Automata and Complexity for me… – Jean was kind enough to arrange that for me – thanks, Jean!)

If you’re interested in data science and have a spare $5 this month, try out Wolfram|Alpha Pro!

Bad Science

2012/02/07 — 1 Comment

Jen Rhee has done some great homework on bad science and put it into a cool infographic that’s worth looking at.  Here are some of the highlights from her research into bad science:

  • 1 in 3 scientists admit to using questionable research practices
  • 1 in 50 admits falsifying or fabricating data outright
  • Among biomedical researcher trainees at UC-San Diego, 81% said they would modify or fabricate results to win a grant or publish a paper

This is obviously disturbing, and worth highlighting to try to root these things out.  Science is about finding the truth – no matter what it is – and as more businesses start using data science to drive business outcomes, we need to make sure that science is about being honest – with the truth and with ourselves.

The scientific method was developed to provide the best way to figure out what the truth is, given the data we’ve got.  It doesn’t make perfect decisions (no method can), but it’s the best method available.

Real scientists (the ones not highlighted in Jen’s research) care about what the data is actually saying and discovering the truth.  When someone cares about something else other than the truth (money, celebrity, fame, etc.), then bad science is what you get.  Of course, when there are people involved, sometimes the truth isn’t the top priority.

Great infographic, Jen!  You can find it here