We hear a lot about Big Data these days. For many people, “big data” simply means a flood of data, but what exactly is it? What is the difference between “a lot of data” and “big data”? According to Gartner, information becomes big data when its volume can no longer be managed with normal database tools. Let’s take a fresh look at some interesting facts and findings about Big Data.
Data is everywhere
The digital universe will grow from 3.2 zettabytes to 40 zettabytes in only six years. Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few (SOURCE). The volume of data created by U.S. companies alone each year is enough to fill ten thousand Libraries of Congress (SOURCE).
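The growth figure above implies a steep compound rate. As a rough sketch (assuming simple compound annual growth between the two endpoints quoted, 3.2 ZB and 40 ZB over six years), the implied rate works out to roughly 52% per year:

```python
# Rough sketch: implied compound annual growth rate (CAGR) of the
# digital universe, using the 3.2 ZB -> 40 ZB over-six-years figure
# quoted above. The endpoints are from the text; the smooth-compounding
# assumption is ours.

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

rate = cagr(3.2, 40.0, 6)
print(f"Implied annual growth: {rate:.1%}")  # roughly 52% per year
```

In other words, the digital universe would have to grow by about half again every single year to hit that forecast.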
Zuckerberg noted that 1 billion pieces of content are shared via Facebook’s Open Graph daily (SOURCE). Facebook users upload over 10 million photographs every hour, and around 3 billion ‘like’ buttons are pushed every day (SOURCE). Google processes more than 24 petabytes of data every day (SOURCE). 48 hours of video are uploaded to YouTube every minute, resulting in nearly 8 years of content every day (SOURCE). 70% of data is created by individuals, but enterprises are responsible for storing and managing 80% of it (SOURCE).
88% of data is ignored
According to a recent study by Forrester Research, most companies analyze a mere 12% of the data they have. These firms might be missing out on data-driven insights hidden in the 88% of data they’re ignoring. A lack of analytics tools and “repressive” data silos are two reasons companies ignore the vast majority of their own data, says Forrester, along with the simple fact that it is often hard to know which information is valuable and which is best left ignored. @SOURCE
Structured vs Unstructured Data
In classifying data, Tata Consultancy Services Limited (TCS) looked at how much of companies’ data was structured versus unstructured, as well as how much was generated internally versus externally. It found that 51% of data is structured, 27% is unstructured, and 21% is semi-structured. A much higher than anticipated share of the data was not structured (either unstructured or semi-structured), and a little less than a quarter of the data was external. @SOURCE
Booming jobs, but a lack of talent
By 2015, 4.4 million IT jobs will be created globally to support big data, 1.9 million of them in the United States. Every big data-related role in the US will create employment for three people outside of IT, so over the next four years a total of 6 million US jobs will be generated by the information economy. The challenge? There isn’t enough talent in the industry. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use big data analysis to make effective decisions. @SOURCE @SOURCE
According to Dice, a career site for tech and engineering professionals, job postings for NoSQL experts were up 54% year over year, and those for “big data talent” rose 46%, the site reported in April. Similarly, postings for Hadoop and Python pros were up 43% and 16%, respectively. @SOURCE
Big Data means big money
According to Modis, a global IT staffing services provider, data scientists remain in “high demand but short supply,” which translates into generous six-figure salaries for some PhDs with relevant big data experience. @SOURCE
According to a study by Burtch Works, the base salary for a staff data scientist is $120,000, and $160,000 for a manager. The estimates are based on interviews with more than 170 data scientists from a Burtch Works employment database. @SOURCE
However, the Data Scientists Salary Survey shows that data scientists’ salaries in Europe and Asia are significantly lower. @SOURCE
Quality of Data
More than half of IT leaders (57%) and IT professionals (52%) report that they don’t always know who owns the data. If no one knows who owns the data, no one can be held accountable for its quality. As different sources and varieties of data are fused together for big data projects, ensuring the accuracy and quality of that data will be critical to success. @SOURCE
Big Data drives software growth
In its latest Worldwide Semiannual Software Tracker, International Data Corporation (IDC) predicts that the worldwide software market will grow 5.9% year over year in current US dollars (USD). IDC believes that the compound annual growth rate (CAGR) for the 2013-2018 forecast period will remain close to 6%. The average 2013-2018 CAGR for Asia/Pacific (excluding Japan), Latin America, and Central and Eastern Europe, Middle East, and Africa (CEMA) is 8.5%, while the average CAGR for the mature regions (North America, Western Europe, and Japan) is 5.9%. @SOURCE
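Those CAGR figures compound over the forecast window. As a back-of-the-envelope sketch (assuming five compounding years across 2013-2018, with the regional rates quoted above), the cumulative growth looks like this:

```python
# Sketch: cumulative market growth implied by IDC's regional CAGRs
# over the 2013-2018 forecast window. The 5.9% and 8.5% rates are
# from the text; the five-year compounding window is our assumption.

def cumulative_growth(annual_rate: float, years: int) -> float:
    """Total growth multiple implied by a compound annual growth rate."""
    return (1 + annual_rate) ** years

mature = cumulative_growth(0.059, 5)    # North America, Western Europe, Japan
emerging = cumulative_growth(0.085, 5)  # APAC (ex. Japan), Latin America, CEMA
print(f"Mature regions: ~{mature - 1:.0%} total; emerging: ~{emerging - 1:.0%} total")
```

Under those assumptions, a 2.6-point difference in annual rate widens to roughly a 17-point gap in total growth by the end of the period, which is why the regional split matters.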
Visualization is in demand
Visualization is hot because it makes data analysis easier. According to the InformationWeek Business Intelligence, Analytics and Information Management Survey, nearly half (45%) of the 414 respondents cited “ease-of-use challenges with complex software/less-technically savvy employees” as the second-biggest barrier to adopting BI/analytics products. The online dating giant Match.com started using Tableau Software because it wanted to put analysis capabilities “in the hands of our users, not elite analytics or BI experts.” @SOURCE
To read more interesting facts about Big Data, click here.