I am starting a series called "Big Data and Cloud Storage" on this blog, sponsored by Cloudian, a provider of cloud storage software. The first installment covers the historical trend of the data explosion and the need for cloud storage.
Era of Data Explosion and Big Data
- Analog Data and Digital Data
Humankind has accumulated a dazzling pile of analog data over thousands of years of history. From Buddhist scriptures and printed Gutenberg Bibles to the enormous number of modern-day books, photos, music and videos, it continues to accumulate day by day.
Digital data is not far behind. Now that major media and personal communications have all moved to digital formats, digital data is bursting at the seams.
So here is a question. Which do you think is larger, analog data or digital data?
The answer depends on “when”. The consulting firm McKinsey published a report on “big data” in May 2011, which includes an estimate of digital's share of all accumulated data. In 2000, analog data, with thousands of years of history behind it, accounted for 75% of the total. By 2007, however, digital had overwhelmed analog with a 94% share, overtaking it in a mere seven years.
Digital technology took off in the 1980s with the arrival of the personal computer, and by the time of the Net bubble in the 1990s, most media, such as mail, photos, music and video, were already available in digital format. Yet in 2000 we still had far more analog data, and only in the years after that did digital explode. How, then, did it happen?
- Bubble Burst and User-Generated Content
In the 1990s, e-mail emerged as an alternative to snail mail. Then came e-commerce as an alternative to catalogs, and news portals as an alternative to newspapers and magazines. Back then, transmission speeds and technology were still limited, so a relatively small number of providers produced catalogs and articles and delivered this content to users over the Net in a one-way manner.
Throughout the bubble period, Internet infrastructure was built on a huge scale, but after the bubble burst in 2000, demand suddenly shrank and the prices of over-supplied fiber optics and datacenters plummeted.
Some time later, a new type of Internet company, such as Google and Facebook, rose from the ashes of the bubble. This new species of the Web industry was later named “Web 2.0”. These companies provided an “interactive” flow of information on the Net, created platforms for “user-generated content” and revolutionized the Net business. They were not alternatives to something that already existed, but were unique to Web technology and had a totally different cost structure. People started to share their thoughts and photos on blogs, and their videos on YouTube. All of this user-generated content has been published and accumulated on the Internet.
- Data gathers on “cloud” and becomes “brain”
Google’s then-CEO Eric Schmidt used the phrase “cloud computing” in a speech in 2006, popularizing the term “cloud”. Cloud computing means keeping data and applications on the Internet rather than on a desktop computer. The term comes from the “cloud” symbol used to represent the Internet in network diagrams. The idea had been advocated before, but it finally became reality around this time, as the network environment caught up thanks to broadband penetration.
As data has transformed from analog to digital and been published on the cloud, we can now easily gather many different kinds of data there, sort it and extract meaning from it. Individual computers started off as isolated single cells in the 1980s, were connected by the nerves of the Internet in the 1990s to form an earthworm, and in the 2000s evolved into a human brain.
This highly intelligent brain activity on the Internet is what is called “big data”. The more information is stored, the better the brain works, and the better it works, the more interesting it becomes to learn new things, so the brain autonomously takes in more and more new data.
In summary, the digital data explosion and the subsequent trend toward big data were triggered by the Web industry’s move into the “cloud”.