Ever wonder what your own personal network looks like?  You are likely connected to many different groups (family, friends, community, work), but do you know how they are connected?  Or are they connected at all?  Are you the glue that connects these various groups?

This is a great age we’re living in, and I’m glad to be involved with developing lots of really advanced technologies.  One of the technology areas that I’m really fascinated with has been pushed forward by Stephen Wolfram.  He created the industry standard computing environment Mathematica, which now serves as the engine behind his company’s newest creation, Wolfram|Alpha.  (I’ve written a few posts on Wolfram|Alpha in the past, and you can read them here and here).

One feature that they’ve recently added to Wolfram|Alpha is the ability to analyze your Facebook data.  Usually, if you use Facebook, you only focus on the posts your friends make – pictures from their great vacations, LOLcats, or articles shared from other websites (like this one!).  However, here are three reasons why it might be worth it for you to unlock these insights from Facebook:

  • It gives you insight into your connections and their connections.   For example, I happen to have a number of groups that I’m connected with.  Some are work-related (Areté and Mentor Graphics), some are community-related (Thousand Oaks), some are from where I grew up (Brillion and Virden), and others are politically-related (Ross Perot).  With this view of what’s called your social graph, you can see a view of who you are, based on looking at who you’re connected with.
  • You learn about yourself.  Getting your Facebook report through Wolfram|Alpha is kind of like looking in a different type of mirror.  You get to see yourself through your own data; it can help you improve in areas where you want to see improvement – I even wrote a post about why it can be good to collect data on yourself.
  • It’s fun.  Viewing yourself in different ways can be interesting and fun!  Sometimes it takes these different views to really understand who you are and how you got here.
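
To make the idea of a social graph a little more concrete, here is a minimal sketch in Python. The names and friendships are made up for illustration, and this is not how Wolfram|Alpha computes its report; it only shows one simple way to ask whether you are the glue between your groups: remove yourself from the graph and see whether it splits into more pieces.

```python
from collections import deque

# Hypothetical friend graph: each person maps to the set of people they know.
graph = {
    "me": {"ann", "bob", "cat", "dan"},
    "ann": {"me", "bob"},   # ann and bob: one group (say, work)
    "bob": {"me", "ann"},
    "cat": {"me", "dan"},   # cat and dan: another group (say, hometown)
    "dan": {"me", "cat"},
}

def components(graph, skip=None):
    """Return the connected groups in the graph, ignoring the person in `skip`."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start == skip or start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            person = queue.popleft()
            group.add(person)
            for friend in graph[person]:
                if friend != skip and friend not in seen:
                    seen.add(friend)
                    queue.append(friend)
        groups.append(group)
    return groups

def is_glue(graph, person):
    """True if removing `person` splits the network into more groups."""
    return len(components(graph, skip=person)) > len(components(graph))

print(is_glue(graph, "me"))   # True: without "me", the two groups fall apart
print(is_glue(graph, "ann"))  # False: everyone else stays connected
```
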

If you’re interested in unlocking your Facebook data using Wolfram|Alpha, here are some simple steps:

  • Go to www.wolframalpha.com.  It looks very much like the Google search page, with a single bar for entering text.
  • Type in “facebook report”, or click on the stylized Facebook icon.
  • Wolfram|Alpha will then ask you to click “Analyze My Facebook Data”.
Once you’ve done this, Wolfram|Alpha will generate a long report, giving you many views on your data and yourself. If you’re interested, there is a post from the Wolfram|Alpha blog that explains these new features and another good article to read from NBCNews.com.

New technology is allowing us to see more views of ourselves for self-improvement and for entertainment.  Take some time and use Wolfram|Alpha to learn a little more about yourself.

Question:  Have you ever used Wolfram|Alpha?  Are there any other tools you find interesting in looking at your own social network?  You can leave a comment below.

This is a technical post about what I’ve discovered in creating my own custom URL shortener.  Hopefully, you can learn to do the same things I did, and my experience will save you some headaches if it’s something you’re interested in trying.

On my website, I focus a lot on decisions and discovery.  I love finding out how the world works and then applying what I’ve learned to make better decisions, and I also try to share what I can along the way.  I hope that it helps others.

When it came to URL shortening, I was interested by a post from Michael Hyatt, who has a tutorial on how he created his own custom URL shortener.  I was intrigued, since he has a great how-to guide on his website.  However, I didn’t want to pay for an additional service, so I tried to find out how to create one myself and do it inexpensively.  Luckily, I discovered how to do it, and I’m happy with the results, so I thought I’d write this post to share my experience.

A couple things first, in case you’re new to the whole concept.

What is a URL?  URL stands for Uniform Resource Locator, but you can think of it as the address of the web page you are trying to reach on the Internet.  http://www.nytimes.com, http://www.micfarris.com/about/, and http://espn.go.com/ are all examples of URLs that will take you places on the Web.

Next question, what is a URL shortener and why would I want to use one?  It’s a service that allows you to create a link to one of these pages, maybe one to somewhere on your website or anywhere else on the Internet, using a much shorter link name.  Bit.ly is such a service, and it’s the one that I use.  Here’s an example of why I like to shorten my URLs.

I have a post on my website about Stephen Hawking and his amazing ability to communicate, even though he suffers from ALS.  The link itself, as you can see below, is very long:

This link itself is 109 characters, so it takes up a lot of room.  Also, if I wanted to share this link on Twitter, I’d use up 109 of the 140 characters that Twitter allows.  If the link were longer, I might not be able to pass along the link at all!  It would be nice to shorten this link and let people reach this article while still providing a message to interest them in following the link.

I can do this by going to bit.ly and entering the URL itself.  Bit.ly will then create a shorter URL that will take me to exactly the same place:

and with my custom URL address, this link becomes:

There are a number of reasons why creating a custom short URL can be a benefit for you:

  • Using a URL shortener allows you to share more with others on Twitter and Facebook. When the links you share don’t take up so much space, you can focus on your message to your readers. And it becomes especially important when using Twitter, since you are limited to only 140 characters for your tweet.
  • You can keep track of the number of clicks your shared posts get, allowing you to better understand your readership. Bit.ly keeps track of the number of times people click your shortened links, so you can get a sense of which links are more popular and when people choose to click on them.
  • Making a custom URL shortener allows for more consistent branding. It’s great to use a URL shortener for sharing links with, say, your Twitter followers. However, if you’re able to do this while continuing to promote your website or business, then sharing this information with your followers becomes even more effective.
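
The character budget behind the first benefit is simple arithmetic, and a short Python sketch makes it easy to check for any link. The 140-character limit and the 109-character length come from this post; the long link is represented by a stand-in string of the same length (an assumption for illustration), and `room_left` is a made-up helper name.

```python
TWEET_LIMIT = 140  # Twitter's per-tweet limit when this post was written

def room_left(link, limit=TWEET_LIMIT):
    """Characters left for your message after the link itself."""
    return limit - len(link)

long_link = "x" * 109               # stand-in for the 109-character link
short_link = "micfarris.us/WGtCik"  # example short link used in this post

print(room_left(long_link))   # 31
print(room_left(short_link))  # 121

# Length of the short domain itself, for the "keep it short" rule of thumb.
print(len("micfarris.us"), len("nyti.ms"))  # 12 7
```
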

So, now that you know the benefits, are you ready to learn how to create one for yourself?  Great!

Here are the steps that I took in getting and setting up my custom short URL:

  • Buy the URL you’d like to use for URL shortening.  You will have one for your blog or website, but you’ll want to use a different URL for URL shortening.  Select one that brands yourself, your website, or your business well, but keep it short (otherwise it defeats the purpose!).
  • Twitter allows for 140 characters, so keep your shortening URL to 12 characters or fewer.  I use micfarris.us for my URL shortening, but the New York Times uses nyti.ms.  This way, you can tweet a shortened URL link (such as micfarris.us/WGtCik) and still have enough room to tweet a helpful message.
  • Check out available web addresses. You can go to http://domai.nr/ to check out available URLs.  I eventually chose micfarris.us for two reasons:
    • It contains “micfarris” to further the branding for myself and my website
    • The .us domains are a lot cheaper to purchase.  GoDaddy sells .us domains for $3.99 for the first year, so it’s an inexpensive way to get started.  I looked at getting micfarr.is (which is a domain from Iceland), but it cost $99/year, so I decided against it (for now!)

  • Connect with a URL shortening service to let them know your new domain name.  As I mentioned before, I use bit.ly – it’s free, and they do a great job with their URL shortening service.  Here are the steps for performing this step using bit.ly:
    • Sign in to your bit.ly account (or if you don’t have one, just create one)
    • Go to “Settings” from the upper right pull-down menu (or click here), click the “Advanced” tab, and then click the “Add a custom short domain” link.
    • Enter your new domain to assign it to your account

  • Connect your new domain to the website for your URL shortening service.  Here are the steps for performing this step using GoDaddy.com (the place that keeps my web address) and bit.ly (that performs the URL shortening):
    • Log into GoDaddy.com, click on “My Account”, then click on “Domains”
    • Click on the new domain you want to connect with bit.ly
    • Under the “DNS Manager” heading, click on the Launch link
    • The “A (Host)” record should be the first on the page, so we’ll want to change this to point to the bit.ly website.  You’ll want to change the IP address (the four-number sequence in this record) to 69.58.188.49.  You should double check with bit.ly to make sure this is the right IP address (don’t just take my word for it!).  To double check the IP address and for more specific information from bit.ly, you can go here.
  • Wait patiently.  It can take up to 48 hours for the new settings to propagate through the DNS servers.  When I set up my custom short URL, it took less than an hour or so, but sometimes it can take longer.
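
If you’d rather not guess whether the change has propagated, a small script can check for you. This is a minimal sketch using Python’s standard `socket` module; `points_to` is a made-up helper name, and the IP address is the one quoted above, which you should still confirm with bit.ly before relying on it.

```python
import socket

# IP address quoted in this post; confirm the current value with bit.ly.
EXPECTED_IP = "69.58.188.49"

def points_to(domain, expected_ip=EXPECTED_IP, resolve=socket.gethostbyname):
    """Return True once `domain` resolves to `expected_ip`."""
    try:
        return resolve(domain) == expected_ip
    except OSError:
        # The lookup failed, for example because the record is not visible yet.
        return False
```

Calling `points_to("micfarris.us")` (substituting your own short domain) performs a real DNS lookup from your machine, so a `False` result may only mean the new record hasn’t reached your DNS server yet.
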
Setting up your own custom short URL for branding yourself, your website, and your business is easy.  It was surprisingly painless for me to set this up, and I’m sure that you can follow these simple steps to create your own, just like I did.

Question:  Have you ever thought about creating your own custom URL for Twitter?  Are there other short URL services that you like?  You can leave a comment below.

It’s a complex world, and we are constantly making decisions.  Just imagine the number of decisions we make about breakfast:  How big a breakfast should I have?  Should I have coffee?  If so, how much?  Should I have toast?  Should I use butter?  Should I have one piece or two?  Should I cut the toast?  If so, should they be cut into rectangles or triangles?  Should I keep the crust? Should I have juice?  Should it be apple juice or orange juice?  How about milk?  I haven’t even gotten to the pancakes, waffles, syrup, sausage, cereal, bacon… (mmm, bacon…)

And these aren’t the really important ones!  How do we know we’re making good decisions, and can we make better ones?

In my professional life, I’ve spent decades understanding and applying the theory of making decisions.  Our teams have worked to teach computers to make decisions automatically from tons and tons of data.  In fact, these disciplines are now incredibly important for new technology development.

But understanding how decisions are made doesn’t only apply to technology.  There are definitely things we can learn from this understanding to help us make better decisions ourselves.

Here are three things that are important to recognize about making decisions:

  • We don’t know everything.  We may not have all the information we might like in order to make our decisions.  For example, if you’re playing a card game like poker or bridge, you don’t know what cards the other players have.  This lack of knowledge is called uncertainty.  Recognizing uncertainty is the first key to making better decisions, since uncertainty is all around us.
  • We can’t know everything.  The sheer number of possibilities for what we see in life makes it impossible to know things with certainty.  (In fact, if you can believe it, quantum physics tells us we aren’t able to know everything, at least through our observations, but that’s another story…).  There are things that we can get to the bottom of, but don’t sweat trying to get to the bottom of everything; you actually can’t.
  • There are likely many possible explanations for what we see.  Since we don’t (and can’t) know everything, there might be multiple reasons why the information we have came to us.  This doesn’t mean that we should get overwhelmed and be afraid of making a wrong decision.  Our job is to figure out the most likely explanation and then make our decision with that knowledge.
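
One common way to pick “the most likely explanation” is Bayes’ rule: weigh how plausible each explanation was beforehand by how well it accounts for what you observed. The sketch below is only an illustration of that idea; the poker scenario and every number in it are made up, and `most_likely` is a hypothetical helper name.

```python
def most_likely(explanations):
    """explanations maps a name to (prior, likelihood of what we observed)."""
    # Unnormalized weight for each explanation: prior times likelihood.
    weights = {name: prior * like for name, (prior, like) in explanations.items()}
    total = sum(weights.values())
    # Normalize so the posterior probabilities add up to 1.
    posterior = {name: w / total for name, w in weights.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior

# Made-up example: an opponent makes a big bet at the poker table.
best, posterior = most_likely({
    "strong hand": (0.3, 0.8),  # less common, but usually leads to a big bet
    "bluff": (0.7, 0.2),        # more common, but rarely leads to a big bet
})
print(best)  # strong hand
```
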

Making better decisions means first recognizing that life is filled with uncertainty and we’re never getting rid of it.  However, we can take steps to reduce this uncertainty and learn how to make better decisions as a result.

P.S. If you’re interested in a good book on uncertainty and how to make better predictions in light of this uncertainty, I have a review of Nate Silver’s book here.

Question:  Have you ever been uncomfortable making decisions because you felt you didn’t know enough?  You can leave a comment below.

You might think that it’s a bit odd, treating yourself like a science experiment.  However, the best way to achieve your goals may be to do just that – be committed to collecting data on yourself.

In science, we’re always collecting data and analyzing it to find out more about the world.  However, collecting data isn’t only for people with pocket protectors (although we don’t all wear those!).  It is something that any of us can use to help us achieve any goal we set for ourselves.

Several years ago, I used to weigh a lot more than I do now.  At some point, I just decided that I wanted to get to a healthier weight.  I was concerned about my long-term health if I stayed at this higher weight, and I knew if I didn’t take this seriously, I wouldn’t be able to enjoy much of life later on.

I decided to collect data on myself so that I could see how I was doing over time.  I weighed myself every morning and recorded it in an iPhone app.  I even kept track of how many calories I ate each day. This forced me to see what every handful of snacks and bowl of ice cream was costing me toward my goal of a lower target weight.  Eventually I lost 40 pounds from my peak weight, and I’ve kept (most of) it off ever since.

Here are five reasons why you should consider collecting data on yourself to achieve your goals:

  • Looking at your data shows how you’re trending.  If you have a goal in mind, such as losing twenty pounds, you need to know how you’re doing.  This can only happen if you are committed to collecting data every day, and watching how the data changes.  If you’re getting closer to your goal, you’ll see your weight drop over time.
  • Not taking data can trick you into thinking you’re on track.  It’s far easier to convince yourself you are on track if there is nothing to counter you.  However, in science, data is king.  If you’re serious about achieving your goal, then you’ll be happy to collect data on yourself to know you’ll get there.
  • It works for anything.   Keeping track of your weight is an easy example, but it truly helps with any goal you set for yourself.  Collecting data on yourself is good for your personal development and growth.  It can also work for your business (keeping track of new customer contacts and new sales) and even for your community (funds raised for local charities or scholarships for worthy students).  It even works for gaining a general understanding of how the world works, which is the ultimate goal of science.
  • It keeps you honest.  You can’t fool the data.  If your goal is to lose twenty pounds and you haven’t lost a single pound for an entire week, you know that you haven’t made progress.  The data will tell you that something needs to change, and you can make that change to keep you moving forward.  Keeping on track requires you to be honest with yourself, and collecting data on yourself helps you do just that.
  • You learn more about yourself. As you collect your own data and take a look at how you’re doing, you’ll learn new things about yourself.  Am I focused enough on my goals?  Is it getting any easier?  What can I do to achieve my goals faster?  Can I even set a new goal, surpassing what I first thought I could achieve?
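
Seeing how you’re trending doesn’t require anything fancy. As a rough sketch, the Python below fits a straight line through a week of daily readings and reports the slope; the readings are made up, and `trend_per_day` is a hypothetical helper, not something taken from the app I used.

```python
def trend_per_day(readings):
    """Least-squares slope of daily readings; negative means the numbers are falling."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2            # days are numbered 0, 1, 2, ...
    mean_y = sum(readings) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# Made-up morning weights in pounds, one per day.
week = [201.0, 200.4, 200.8, 199.9, 200.1, 199.5, 199.2]
slope = trend_per_day(week)
print(f"{slope:+.2f} lb/day")       # negative slope: trending down

print(trend_per_day([200.0] * 7))   # 0.0: a flat week means no progress
```

A single day can bounce up or down, which is why the slope over several days is more honest than comparing today with yesterday.
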
We can always do more to keep ourselves on track.  While the first thing we need is the goal itself, we also need to collect the information that keeps us honest about achieving that goal.  Be committed to collecting data on yourself and your achievements will start piling up before you know it.

Question:  Have you ever tried collecting data on yourself?  If so, what did you learn?  If not, do you know where to start?  You can leave a comment below.

Imagine a guy with glasses who used to model baseball stats and play online poker nailing the outcome of the 2012 elections. And when I say “nailing”, I mean that he correctly predicted the U.S. Presidential contest in every one of the 50 states (and nearly every U.S. Senate race, too). He even performed better than some of the most widely-used polling firms. Now imagine that he gives his thoughts on making these types of predictions. That’s exactly what Nate Silver does in his new book The Signal and the Noise.

I’ve worked in what’s now being called “data science” for nearly twenty years. The title of Silver’s book – The Signal and the Noise – presents an important and sometimes overlooked part of this science. The “signal” is what we’re looking for in the data, and the “noise” is all the stuff in the data that gets in the way of what we’re looking for.

With companies like Facebook, Twitter, LinkedIn and Netflix delivering new products based on data, more attention is now being focused on what we can learn from all this new information. Political polling has been around for a while, but Silver managed to take these scientific principles and apply them in a new way, leading to results that are astonishing to most people. Silver ended up beating some of the oldest and most storied polling firms, such as Gallup, highlighting real biases in their polling (for example, Gallup performed poorly for the third straight national election, and Silver noted that Gallup polls were biased toward Republicans by as many as 7 percentage points).

In his book, Silver focuses on how these techniques can be applied in nearly every area of forecasting from baseball to poker to weather forecasting to earthquake predictions. He does focus on some technical things (such as Bayesian reasoning), but does a good job of not letting that get in the way of his story. More broadly, here are four points that I thought came out of Silver’s book:

  1. You can make better decisions if you get more information. Silver pooled together the predictions from over 20 polls into one larger and more accurate prediction for the presidential election. He also points out in his book how new information can be used to update our own predictions.

  2. There’s a lot we don’t know, but don’t let that stop you. When we get information and then make decisions, there’s always a chance we’re going to be wrong because we don’t know everything. Many people, including Silver, call this uncertainty. We have to learn to live with uncertainty, and make the best decision possible.

  3. Be aware of your own bias. Gallup didn’t recognize that their polling techniques led to errors in their own predictions. Now they have to regroup in the wake of Silver’s successes. We need to be open to the information that’s in front of us and be aware of what information we may not be getting.

  4. Humility leads to better decisions. If we are humble, we will be aware of any unintended biases and we will recognize the uncertainty before us. As it turns out, this is the best anyone can do in making decisions.
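
To illustrate the first point, here is a minimal Python sketch of pooling several polls into one estimate. It uses a plain sample-size-weighted average, which is a deliberate simplification and not Silver’s actual model; the poll numbers are made up, and `pool` is a hypothetical helper name.

```python
def pool(polls):
    """polls is a list of (share_for_candidate, sample_size) pairs."""
    total = sum(size for _, size in polls)
    # Larger polls count for more, in proportion to the people surveyed.
    return sum(share * size for share, size in polls) / total

# Made-up polls: (fraction supporting the candidate, number of respondents).
polls = [(0.52, 1000), (0.49, 500), (0.51, 1500)]
print(round(pool(polls), 3))  # 0.51
```

The pooled figure rests on 3,000 respondents rather than 500 or 1,000, which is why it tends to be steadier than any single poll.
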
So if we’re honest with ourselves and the information we are gathering, we can make better decisions and learn from any missed predictions. In life, we have to be willing to learn, try, and learn again.

If you’re interested in learning more about Silver’s take on statistical reasoning, I would highly recommend reading his new book. I received the book as a Christmas gift from my wife, and I’m glad I got the chance to read it.

Question: Have you read The Signal and the Noise? If so, what did you learn from Silver’s book? You can leave a comment below.

60 Minutes aired a piece last night about scientific fraud at Duke University, where data was fabricated in order to support alleged discoveries in individualized cancer therapies.  As a result of these investigations, a number of previously published scientific articles have been retracted.

Less than a week ago, I highlighted an infographic from Jen Rhee about the alarming statistics in science fraud.  I’m really disheartened that such a highly visible example came up so quickly… 

In the 60 Minutes piece, it seems clear that the fraud came from one scientist, Dr. Anil Potti, but missing checks and balances created the circumstances that allowed it.  When the research was published, many labs tried to reproduce the results, and two researchers at the University of Texas, Kevin Coombes and Keith Baggerly, began analyzing Dr. Potti’s data to verify his results.  What they found could only be explained through deliberate manipulation of the data, starting off a chain of events that led to retractions from Duke researchers, suspension of grants, and the eventual suspension of Dr. Potti from the Duke staff.

It took a dedicated newsletter, The Cancer Letter, to discover that Dr. Potti even falsified his own credentials, stating that he received a Rhodes Scholarship when he in fact did not, and to trigger a thorough examination.  Unfortunately, Duke did not have enough institutional checks in place to catch this on their own.

It was nice to see that the primary researcher in charge of the lab, Dr. Joseph Nevins, came out and took responsibility for the episode.  When Dr. Nevins was asked, after he had reviewed the original data, whether it had been fabricated, he said it was “abundantly clear” that it had.

Look – people make mistakes, even scientists when they are trying to analyze data and draw conclusions.  The scientific process is all about trying to find the truth, and being willing to accept the truth, even if it’s different than you’d like the truth to be.

But, as I said in my previous post on this type of fraud,

Real scientists… care about what the data is actually saying and discovering the truth.  When someone cares about something else other than the truth (money, celebrity, fame, etc.), then bad science is what you get.  Of course, when there are people involved, sometimes the truth isn’t the top priority.

The real tragedy is that people were affected and possibly harmed as a result of this fraud.   The fabricated data was used to validate a theory, which led to medical therapies that went through clinical trials, meaning that real people could have been given medicine that very well may have done harm to them.  As Dr. Coombes said during the 60 Minutes piece:

… you would be giving patients drugs that would definitely not benefit them.  So there’s clear potential for harm there.

Bad science should be rooted out, and good science needs to be advocated everywhere.  The truth is important and worth finding…

 

Stephen Wolfram is doing it again.  I’m a big fan of Wolfram (you can read some of my other posts here, here, and here…), and am always intrigued by what he comes up with.  A couple of days ago, Wolfram launched his latest contribution to data science and computational understanding – Wolfram|Alpha Pro.

Here’s an overview of what the new Pro version of Wolfram|Alpha can provide:

With Wolfram|Alpha Pro, you can compute with your own data. Just input numeric or tabular data right in your browser, and Pro will automatically analyze it—effortlessly handling not just pure numbers, but also dates, places, strings, and more.

Upload 60+ types of data, sound, text, and other files to Wolfram|Alpha Pro for automatic analysis and computation. CSV, XLS, TXT, WAV, 3DS, HDF, GXL, XML…

Zoom in to see the details of any output—rendering it at a larger size and higher resolution.

Perform longer computations as a Wolfram|Alpha Pro subscriber by requesting extra time on the Wolfram|Alpha compute servers when you need it.

Licenses for prototyping and analysis software (Matlab, IDL, even Mathematica) go for several thousand dollars.  Student versions can be had for a few hundred dollars, but you can’t leverage data science for business purposes on student licenses.

Wolfram|Alpha Pro lets anyone with a computer, an internet connection, and a small budget leverage the power of data science.  Right now, you can get a free trial subscription, and from there, the cost is $4.99/month.  This price is introductory, but it could be seductive enough to attract a lot of users (I’ve already signed up – all you need for the free trial is an e-mail address…).

One option that I find really interesting is Wolfram’s creation of the Computable Document Format (CDF), whose interactivity lets you get dynamic versions of existing Wolfram|Alpha output as well as access to new content using interactive controls, 3D rotation, and animation.  It’s like having Wolfram|Alpha embedded in the document.

I had attended a Wolfram Science Conference back in 2006 and saw the potential for such a document format back then.  There were a number of presenters who later wrote up their work into a paper, published by the journal Complex Systems.  Since many of the presentations utilized a real interactivity with the data, I could see where much of the insight would be lost when people tried to write things down and limit their visualizations to simple, static graphs and figures.

I remember contacting Jean Buck at Wolfram Research, and recommending such a format.  Who knows whether that had any impact, but I’m certainly glad to see that this is finally becoming a reality.  I actually got the opportunity to meet Wolfram at the conference (he even signed a copy of his Cellular Automata and Complexity for me… – Jean was kind enough to arrange that for me – thanks, Jean!)

If you’re interested in data science and have a spare $5 this month, try out Wolfram|Alpha Pro!

Bad Science

2012/02/07

Jen Rhee has done some great homework on bad science and put it into a cool infographic that’s worth looking at.  Here are some of the highlights from her research into bad science:

  • 1 in 3 scientists admit to using questionable research practices
  • 1 in 50 admits falsifying or fabricating data outright
  • Among biomedical researcher trainees at UC-San Diego, 81% said they would modify or fabricate results to win a grant or publish a paper

This is obviously disturbing, and worth highlighting to try and root these things out.  Science is about finding the truth – no matter what it is – and as more businesses start using data science in order to drive business outcomes, we need to make sure that science is about being honest – with the truth and with ourselves.

The scientific method was developed to provide the best way to figure out what the truth is, given the data we’ve got.  It doesn’t make perfect decisions (no method can), but it’s the best method available.

Real scientists (the ones not highlighted in Jen’s research) care about what the data is actually saying and discovering the truth.  When someone cares about something else other than the truth (money, celebrity, fame, etc.), then bad science is what you get.  Of course, when there are people involved, sometimes the truth isn’t the top priority.

Great infographic, Jen!  You can find it here.

Here are some data science nuggets that I thought were interesting for a mid-January day…

The first comes from TechMASH about data science being the next big thing.  The primary nugget of note is that the supply of employees with the needed skills as data scientists – those people who really understand how to pull relevant information out of data reliably – is going to have a tough time meeting demand.  Here’s an interesting infographic on the current disconnects – for example, while 37% of “business intelligence” professionals studied business in school, 42% of today’s “data scientists” studied computer science, engineering, and natural sciences.  This highlights the increasing demand for students who have solid mathematics backgrounds – it’s becoming more about knowing how you pull information from data, regardless of application.

Don’t get me wrong – to be effective applying data science, you need two things:  a subject matter expert that understands what makes sense and what doesn’t, and someone who really understands data to pull out the information.  Sometimes that can reside within one person, but it’s rare and takes many years of training to acquire the necessary excellence in both fields.   And as the demands for data analysis grow, these two areas will likely form into distinct disciplines with interesting partnership opportunities being created.

Data science is still being defined, but I’m convinced it will have a huge impact in the next five years.  And while the science aspects of data are starting to take shape, the engineering aspects of data and analytics are truly in their infancy…

On the same thread, here’s a Forbes article by Tom Groenfeldt on the need for data scientists, or Excel jockeys, or whatever they will be called in the future.  For some companies, the move to “data science” is quite apparent, but for others, the current assemblage of business professionals who have figured out the ins-and-outs of Excel spreadsheets works quite well.  This is likely a snapshot of where things are today, but I do believe that as the questions we ask of the data get more complicated, we will clearly see the need for a more rigorous science-based discipline to data wrangling…

The last tidbit is from the Wall Street Journal about the healthcare field being the next big area for Big Data.  I do think that healthcare is ripe for leveraging data, and I’ve written other posts on the subject.  One former Chief Medical Officer that I spoke with mentioned that one of the big problems is just getting the data usable in the first place.  He said that, as of today, 85% of all medical records are still in paper form.  The figure seems a bit high to me, but I don’t really know how many patient records in individual doctors’ offices are still sitting in folders on shelves.

There has been a big push lately, spurred by financial support from the U.S. government, for upgrading to electronic health records (EHR).   This will help to solve the data collection problem – if you can’t get data into an electronic format, you can’t utilize information technologies to pull information out of the data.

I ran across this article from the Independent today about the impacts of data algorithms, the ethics of data mining, and the future of our lives in an automated, data-crunching world.  Below is a quote from the article by Jaron Lanier, musician, computer scientist and author of the bestseller You Are Not a Gadget.

Algorithms themselves are a form of creativity. The problem is the illusion that they’re free-standing. If you start to think that information isn’t just a mask behind which people are hiding, if you forget that, you’ll pay a price for that way of thinking. It will cause you to be less creative.

“If you show me an algorithm that dehumanises, impoverishes, manipulates or spies upon people,” he continues, “that same core maths can be applied differently. In every case. Take Facebook’s new Timeline feature [a diary-style way of displaying personal information]. It’s an idea that has been proposed since the 1980s [by Lanier himself]. But there are two problems with it. One, it’s owned by Facebook; what happens if Facebook goes bankrupt? Your life disappears – that’s weird. And two, it becomes fodder for advertisers to manipulate you. That’s creepy. But its underlying algorithms, if packaged in a different way, could be wonderful because they address a human cognitive need.”

I think this is a really great read for anyone who’s interested in data, algorithms, and their impact on society – there’s a lot of really good stuff to take in.  You can read the entire article here.