“Big Data and Cloud Storage” Trend 2: “Big Data and Cloud” Vol. 3
What is Cloud Storage?
◆ Memory of the cloud brain
In my previous article, I wrote that the “cloud” is becoming the “brain” of the Internet world and that its “thinking” activity corresponds to “big data”. This time, I will talk about another brain function, “memory”, which corresponds to “cloud storage”. The term “STaaS (Storage as a Service)” is used interchangeably.
Dropbox is an easy-to-understand example. To be precise, Dropbox is an end-user application, whereas cloud storage is infrastructure for applications, but it serves as a useful metaphor for understanding the role of cloud storage.
Documents are stored on the Dropbox servers in the cloud. It gained popularity as a tool for sharing documents between desktop and mobile devices, as part of the web world's transition to the “mobile and cloud” era that I mentioned in the first article. It is also used as groupware to share files among team members, and a similar service, Box, is widely used by enterprise users.
These are particularly storage-centered services, but virtually all web services need storage, such as mail storage in Gmail and photo storage in Facebook.
◆ “Kanban system” cloud storage
Cloudian distinguishes between Dropbox-like upper-layer file sharing, which it calls “online storage”, and the lower-layer infrastructure that app providers build their applications on, which it calls “cloud storage”. The following discussion is about the latter.
Major players such as Facebook and Google own and operate in-house storage infrastructure. However, many other online service providers strategically choose to outsource it. The major online movie streaming provider Netflix, which holds a huge amount of video and customer data, is a good example of a company that relies on such “cloud storage”.
The specialized consulting firm 451 Group forecasts that the global cloud storage market will grow from $1.3 billion in 2011 to $6.0 billion in 2015. Storage-centric services make up the majority ($750M → $4.7B), with backup and archiving ($550M → $1.3B) accounting for the rest.
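As a rough check on these figures, the implied annual growth rate can be worked out from the numbers quoted above. The following is only a minimal sketch: the helper name `cagr` is my own, and the segment figures are taken as-is from the forecast cited in this article.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end / start) ** (1 / years) - 1

# Figures quoted in the article, in millions of US dollars (2011 -> 2015).
segments = {
    "storage-centric services": (750, 4700),
    "backup and archiving": (550, 1300),
}
YEARS = 2015 - 2011

# The two segments should add up to the quoted market totals.
total_2011 = sum(start for start, _ in segments.values())
total_2015 = sum(end for _, end in segments.values())

for name, (start, end) in segments.items():
    print(f"{name}: {cagr(start, end, YEARS):.1%} per year")
print(f"total market: {cagr(total_2011, total_2015, YEARS):.1%} per year")
```

The two segments add up to the quoted totals ($1.3 billion and $6.0 billion), and the forecast for the whole market corresponds to growth of roughly 47% per year.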
451 Group defines cloud storage by two factors, as follows:
1) Storage capacity can be obtained on an on-demand basis.
2) Data is in a hosted environment and can be accessed via the Internet.
If the amount of data fluctuates drastically over time, it is too expensive to own enough storage capacity for the peak, much like a highway in the countryside that sits empty most of the time. Instead, cloud storage (STaaS) can work like the Kanban system, supplying just the capacity that is needed at the time it is needed. Of the two items above, (1) is the defining characteristic of cloud storage, whereas (2) also applies to traditional hosting services. This Kanban-like scalability is called “scale-out” in the cloud industry.
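The cost argument can be illustrated with a small calculation. This is only a sketch under stated assumptions: the monthly demand figures and the unit price below are hypothetical numbers made up for illustration, and the same price per terabyte is assumed for both models so that only the effect of peak provisioning shows up.

```python
# Hypothetical monthly storage demand in terabytes (not data from the article).
monthly_demand_tb = [10, 12, 11, 40, 15, 13, 12, 55, 14, 12, 11, 13]

# Hypothetical price per terabyte per month, assumed equal for both models.
PRICE_PER_TB_MONTH = 100

def owned_cost(demand, price):
    """Own enough capacity for the peak month and pay for it every month."""
    return max(demand) * len(demand) * price

def on_demand_cost(demand, price):
    """Pay only for the capacity actually used in each month."""
    return sum(demand) * price

owned = owned_cost(monthly_demand_tb, PRICE_PER_TB_MONTH)
on_demand = on_demand_cost(monthly_demand_tb, PRICE_PER_TB_MONTH)

# Share of the owned capacity that is actually used over the year.
utilization = sum(monthly_demand_tb) / (max(monthly_demand_tb) * len(monthly_demand_tb))

print(f"owned for peak: {owned}")
print(f"on demand:      {on_demand}")
print(f"utilization of owned capacity: {utilization:.0%}")
```

With these made-up numbers, capacity sized for the peak is used only about a third of the time, which is the “empty highway” in the analogy. In practice the unit prices of owned and on-demand storage differ, so the real comparison depends on how spiky the demand is.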
As mentioned in my last article, Amazon is the giant in this world. There are practically no start-ups in Silicon Valley that don’t use the Amazon cloud service. Amazon’s cloud storage is ideal for them, because their capacity requirements are hard to predict over time and their budgets are tight.
Amazon’s customers include large enterprises like Netflix as well as those start-ups, and it is the only cloud storage vendor whose annual revenue exceeds $100M. According to the 451 report, Amazon holds almost 50% of the market, although I do not have the exact data at hand. Salesforce.com, Rackspace, Microsoft and HP follow.
◆ Storage system of Amazon
Amazon’s cloud storage, S3 (Simple Storage Service), is a part of Amazon Web Services (AWS). “S3” has become the de facto standard for cloud storage.
S3 uses the technology called Object Storage, one of the three storage methods:
(1) Block Storage:
Data is cut into blocks of a fixed size and stored mechanically as 1s and 0s. It is used in a SAN (Storage Area Network), which requires fast access over a very short distance.
(2) File Storage:
A collection of data is stored as a file, which carries metadata such as the file name and file format and sits in a hierarchical structure of directories or folders, much like on the PC desktop. It is used in NAS (Network Attached Storage).
(3) Object Storage:
A big chunk of data is packaged together with its metadata, like a box; this package is called an object. Each box is given an OID (Object ID), and all objects are saved in a flat manner.
File storage is easy to understand by analogy with paper folders, but it is inefficient because of several problems. Accessing data requires following the folder structure from the top to the bottom, and moving to a different folder means going back to the top. Metadata is located outside of the folder, and concurrent operation is problematic because the name of the upper folder is shared by multiple files.
In contrast, with object storage, the OID is the only key necessary to access an object, much like pulling out a whole box by looking at the tag attached to it. It is not necessary to go up and down a hierarchy, and all metadata is stored in the box as well.
Only one object is tied to each OID, so parallel data access is easy. This higher efficiency results in lower cost and high scalability, as long as the contents of the box are not changed.
With these characteristics, object storage is the preferred method for cloud storage, which needs to store massive amounts of static data such as images, videos and e-mails, and for which cost efficiency and scale-out ability are quite important.
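The difference between hierarchical and flat access described above can be sketched in a few lines. This is a toy, in-memory illustration only: the class names `FileStore` and `ObjectStore`, the sample path and the metadata fields are all hypothetical, and it does not represent how S3 or any other product is actually implemented.

```python
import uuid

class FileStore:
    """Hierarchical store: nested dicts play the role of folders."""

    def __init__(self):
        self.root = {}

    def write(self, path, data):
        *folders, name = path.strip("/").split("/")
        node = self.root
        for folder in folders:  # walk down from the top folder, creating levels
            node = node.setdefault(folder, {})
        node[name] = data

    def read(self, path):
        *folders, name = path.strip("/").split("/")
        node = self.root
        for folder in folders:  # one lookup per level of the hierarchy
            node = node[folder]
        return node[name]

class ObjectStore:
    """Flat store: one OID maps directly to one object (data plus metadata)."""

    def __init__(self):
        self.objects = {}

    def put(self, data, **metadata):
        oid = uuid.uuid4().hex  # the "tag" attached to the box
        self.objects[oid] = {"data": data, "metadata": metadata}
        return oid

    def get(self, oid):
        return self.objects[oid]  # single lookup, no hierarchy to traverse

# Usage: the same photo stored both ways.
files = FileStore()
files.write("/photos/2012/trip/beach.jpg", b"raw image bytes")
photo = files.read("/photos/2012/trip/beach.jpg")

store = ObjectStore()
oid = store.put(b"raw image bytes", filename="beach.jpg", content_type="image/jpeg")
obj = store.get(oid)
print(obj["metadata"]["filename"])
```

In the file store, every read walks the folder levels one by one, while in the object store the OID alone is enough and the metadata travels inside the object. Because the mapping is flat, objects can be spread over many servers by OID, which is one reason this model lends itself to scale-out.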
Not many players challenge Amazon’s dominance at the moment. In the US, companies such as Microsoft and HP serve their existing enterprise customers, a slightly different customer base. Google is sometimes mentioned as a direct competitor to Amazon, but it targets small and medium-sized customers and its market share is still small. In Europe, LunaCloud has emerged as an Amazon-style competitor.
In Japan, Nifty Cloud and Yahoo! Cloud have been providing similar services, and recently NTT Communications entered this field. Please see below for more details.