
Even doctors will be Data Scientists

23rd Jun '15, 06:37 PM in Data Science


Rob Thomas, Contributor

We all know how it works. You walk into a doctor’s office complaining of pain in your leg or elsewhere. They take your temperature, get you on the scale, check your blood pressure, and perhaps even get out the rubber hammer. These measurements are simply snapshots of a single moment in time and may be subject to error. This limited dataset fails to capture variation over time or the many other factors needed to assess a patient’s health. Once the few measurements have been reviewed, the consultation between patient and doctor begins. Based on this rudimentary physical examination and the conversation with the patient, the physician states the condition they believe is present and recommends a treatment.
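To make the snapshot problem concrete, here is a minimal Python sketch using invented daily body-temperature readings. The values, the 38.0 °C fever cutoff, and the choice of which day counts as the office visit are all assumptions for illustration, not clinical data. A single reading taken on the day of the visit can look normal even when the series as a whole shows recurring fever.

```python
from statistics import mean

# Hypothetical daily body-temperature readings (degrees Celsius) over two weeks.
# These numbers are invented for illustration only.
daily_temps_c = [36.8, 36.9, 38.4, 36.7, 36.8, 38.6, 36.9,
                 36.8, 38.5, 36.7, 36.9, 36.8, 38.7, 36.8]

FEVER_THRESHOLD_C = 38.0  # assumed rule-of-thumb fever cutoff

def summarize(readings, threshold=FEVER_THRESHOLD_C):
    """Summarize a series: average, peak, and number of days at or above the threshold."""
    return {
        "mean": round(mean(readings), 2),
        "max": max(readings),
        "days_above_threshold": sum(1 for r in readings if r >= threshold),
    }

# The office visit captures only one day (hypothetically day 10, i.e. index 9).
office_visit_day = 9
snapshot = daily_temps_c[office_visit_day]

print("Office snapshot:", snapshot, "fever" if snapshot >= FEVER_THRESHOLD_C else "normal")
print("Two-week series:", summarize(daily_temps_c))
```

The snapshot reports a normal temperature, while the two-week summary shows several days above the cutoff: the same patient, seen through more data, tells a different story.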

This approach, which is common throughout the world, relies much more on instinct and gut feeling than on scientific analysis of data. As a result, most decisions appear to rest on the physician’s opinion rather than on data-proven truth. This kind of opinion-based medicine is a problem both in patient care and in medical research. It is a symptom of a lack of data, as well as of years spent training physicians to work without complete data.

The data collected in a typical office visit is only a fraction of the data that could be collected if health were viewed as a data problem. And, if health were redefined as a data problem, physicians would likely need different skills to process and analyze the data.


Vinod Khosla is one of the most successful venture capitalists in the history of Silicon Valley. He co-founded Sun Microsystems and has since financed a wide variety of start-up companies as a venture capitalist. While he is not a medical expert, he is a data expert. In his speech at Stanford Medicine X, Khosla highlights three major issues in medicine today:

1) Doctors are human: Doctors, like everyone else, have cognitive limitations. Some are naturally smarter than others, and some have deeper knowledge of particular topics. These differences lead to biases in how they think, act, and prescribe. Most shockingly, Khosla points out that doctors often settle on a diagnosis within the first 30 seconds of seeing a patient. Said another way, they base their diagnosis on a gut reaction to the symptoms they can see or that are described to them.

2) Opinions dominate medicine: Khosla asserts that medicine is based much more on opinion than on data. He cites the Cleveland Clinic Doctors’ Review of Initial Diagnosis study, noting that Cleveland Clinic doctors disagree with the initial diagnosis 11 percent of the time. In 22 percent of cases, they recommend minor changes to treatment, and in a startling 18 percent of cases, major changes. As Khosla states, “This means it’s not medical science.”

3) Disagreement is common among physicians: Doctors disagree a lot. The disagreement is so pronounced that, as Khosla puts it, “whether or not you have surgery is a function of whom you ask.”

Medicine is currently a process of trial and error, coupled with professional opinion.


The data era in medicine will be defined by a shift from intuition and opinion to data. We can now collect more data in a day than we could in a year not long ago. Collecting data and applying it to healthcare problems will transform both the cost and the effectiveness of medicine. The question is how quickly we can get there.

Medical schools must evolve as technology advances. Most technology-driven advances in medical schools have focused on using advanced tools and equipment, rather than on the core knowledge a physician needs in the data era.

The curriculum for the first two years of medical school varies by school, but it is heavy on the sciences, the human body, and the human condition. This has been typical since the first medical schools of the 1200s. Yet despite all that time, investment, and history, the newly minted physician is unprepared to practice in the data era.

The data era requires augmenting the curriculum with the key skills needed for data-based analysis:

* Data Analysis and Tools

The skills of physicians will necessarily evolve in the data era, and that evolution has to begin in medical schools. This focus will speed the move away from opinion-based medicine toward a future that patients would prefer: prescriptions based on rigorous data analysis.


This week, IBM is announcing a set of tools, technology, and processes to bring data science to the masses. Said another way, armed with IBM technology, everyone can be a data scientist. We are democratizing access to data in your organization.

Many organizations see Hadoop as an open-source, rapidly evolving platform capable of collecting and economically storing a large corpus of data that is waiting to be tapped. Yet most organizations are not yet realizing the full value of Hadoop, because they lack the skilled data scientists and developers needed to extract valuable insight. IBM will make everyone a data scientist. We take the first steps this week by:

1) Introducing new modules for In-Hadoop analytics including SQL, Machine Learning, and R.

2) Confirming our commitment to open source with IBM BigInsights Open Platform with Apache Hadoop, to include new innovations like Apache Spark. We are excited to be a founding member of the Open Data Platform.

3) Rolling out expanded data science training for Machine Learning and Apache Spark via BigDataUniversity. Today, over 230,000 professionals and students are being trained at BigDataUniversity and we are on our way to 1 million trained.


Imagine how things will be in 15 years. You walk into a doctor’s office, and the physician immediately knows why you are there. In fact, she had already discussed with you some irregularities she spotted in your data at your annual physical six months earlier. She doesn’t need to take your temperature, because she receives that data directly from your home every day. You also take your own blood pressure monthly, and those readings are transmitted directly to your physician. So the discussion immediately turns to the possible treatments, along with the probability of success for each one. Recent data from other patients with a similar history and physiology indicate that regular medication will solve the issue 95 percent of the time. With this quick diagnosis, involving no opinions, you are on your way after ten minutes, confident that the problem has been solved. This is medicine in the data era, administered by a physician steeped in mathematics and statistics. In the data era, even doctors become data scientists.
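The idea of quoting a probability of success for each treatment, based on patients with a similar history, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method described in the book: each past patient is reduced to a tiny hypothetical feature vector (age, systolic blood pressure), "similar" is taken to mean the k nearest neighbors by Euclidean distance (a swapped-in technique chosen for simplicity), and the records and treatment labels are invented.

```python
import math
from collections import defaultdict

# Hypothetical historical records: (features, treatment, succeeded).
# Features are (age in years, systolic blood pressure in mmHg); all values are invented.
HISTORY = [
    ((54, 142), "medication", True),
    ((57, 150), "medication", True),
    ((52, 138), "medication", True),
    ((55, 145), "medication", False),
    ((56, 148), "surgery", True),
    ((53, 140), "surgery", False),
    ((31, 118), "medication", False),
    ((29, 115), "surgery", True),
]

def nearest_neighbors(patient, history, k):
    """Return the k records whose features are closest to the patient (Euclidean distance)."""
    return sorted(history, key=lambda rec: math.dist(patient, rec[0]))[:k]

def success_rates(patient, history, k=6):
    """Estimate each treatment's success rate among the k most similar past patients."""
    tally = defaultdict(lambda: [0, 0])  # treatment -> [successes, total]
    for _, treatment, succeeded in nearest_neighbors(patient, history, k):
        tally[treatment][0] += int(succeeded)
        tally[treatment][1] += 1
    return {t: s / n for t, (s, n) in tally.items()}

new_patient = (55, 144)
for treatment, rate in sorted(success_rates(new_patient, HISTORY).items()):
    print(f"{treatment}: {rate:.0%} success among similar patients")
```

For brevity the features are compared raw; in practice age and blood pressure would be scaled to comparable ranges so that one does not dominate the distance, and the estimate would only be as good as the volume and quality of the historical data behind it.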

This post is adapted from my book Big Data Revolution: What farmers, doctors and insurance agents teach us about discovering big data patterns. Find more on the web at BIG DATA REVOLUTION.