I have been thinking about writing a series of posts comparing Google Apps with its Microsoft rival, Office 365.

I am sure you can find plenty of information with some basic searching; however, most of it is pure marketing buzz with few technical details attached. In contrast, I want to write purely technical articles that will help professional CIOs and CTOs decide which product better matches their needs.

So today I would like to start with the datacenter location aspect and see how both products deal with it. Why data location? Because, in my view, it is the top differentiator between Google and Microsoft when it comes to cloud computing.

As a consultant, the question I am asked all the time when meeting customers about cloud technologies is "Where is my data stored?". This is no coincidence: people are not comfortable with the idea that their data is stored outside their control.

Multi-Tenant Design (Google's Approach)

Google arguably invented the cloud more than a decade ago. It was the first company that had to deal with volumes of data so large that no company had needed to handle them before. I see that as the primary reason it developed a complete set of technologies designed to be a solid infrastructure for its business.

Within Google's datacenters, it's all about Google's own technologies: Google's servers (did you know that Google is reportedly the 4th-largest server manufacturer in the world, after HP, IBM and Dell?), the BigTable database, its own storage technology, security software and so on. Google had to develop those technologies because nothing available off the shelf at the time could support such a large volume of processing and data (even today very few companies have built something similar; Facebook is one of the few examples).

So how does it really work? Each piece of data that you store on Google's infrastructure (an email, a file, a calendar entry or whatever) is split into small chunks and distributed over a large number of servers spanning multiple datacenters. Why? Data security.

Let's take it as an axiom that everything can be hacked (there have been many examples in the past, such as the Pentagon, NASA and other highly secured organizations). So Google can be hacked too, even if that is really hard to achieve given the scale of Google's security operations. Google therefore needs to prevent data from leaking even when a server or a complete datacenter is compromised.

By splitting your data into very small chunks and distributing them over a really large number of servers and datacenters, Google ensures that a potential hacker will only be able to get a small, obfuscated portion of your data. Basically, it will give the hacker nothing.
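To make the idea concrete, here is a minimal Python sketch of splitting a piece of data into small chunks and scattering them over several servers. It is only an illustration under my own assumptions, not Google's actual implementation: the server names, the tiny chunk size and the hash-based placement rule are all hypothetical, and the obfuscation step is left out entirely.

```python
import hashlib

CHUNK_SIZE = 8  # bytes; tiny on purpose so the example is easy to follow

# Hypothetical server names, not real Google infrastructure.
SERVERS = ["dc-a/srv-1", "dc-a/srv-2", "dc-b/srv-1", "dc-b/srv-2", "dc-c/srv-1"]


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the payload into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def place_chunk(object_id: str, index: int, servers: list[str] = SERVERS) -> str:
    """Pick a server for a chunk by hashing the object id and the chunk index."""
    digest = hashlib.sha256(f"{object_id}:{index}".encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def distribute(object_id: str, data: bytes) -> dict[str, list[tuple[int, bytes]]]:
    """Return a mapping of server -> list of (chunk index, chunk bytes)."""
    placement: dict[str, list[tuple[int, bytes]]] = {}
    for index, chunk in enumerate(split_into_chunks(data)):
        placement.setdefault(place_chunk(object_id, index), []).append((index, chunk))
    return placement


def reassemble(placement: dict[str, list[tuple[int, bytes]]]) -> bytes:
    """Only someone who can read every server can put the data back together."""
    chunks = sorted(pair for pairs in placement.values() for pair in pairs)
    return b"".join(chunk for _, chunk in chunks)
```

Calling `distribute("mail-42", some_bytes)` returns a dictionary in which each server holds only a few short fragments; `reassemble` needs the fragments from every server to rebuild the original, which is the point of the design.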

In addition to splitting the data, Google replicates it across many additional datacenters to achieve high availability. The data is replicated across geographically dispersed locations to make it even safer. After all, Google's effective SLA for 2011 was 99.998%.
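To put that percentage in perspective, the short Python calculation below converts an availability figure into minutes of downtime per year. The function name and the comparison figures of 99.9% and 99.99% are mine, added only for contrast; the 99.998% figure is the one quoted above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year implied by an availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for figure in (99.9, 99.99, 99.998):
    print(f"{figure}% -> {allowed_downtime_minutes(figure):.1f} minutes per year")
```

At 99.998%, that works out to roughly ten and a half minutes of downtime over the whole year.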

So what do I answer when my customers ask me where their data is stored? My answer: it's stored nowhere and everywhere simultaneously.

Of course, storing the data in that many locations solves many other issues. Take latency, for example. Latency is the primary counterargument against the cloud. As customers, we all want to access our services and data as fast as when they sit inside our own LAN.

By distributing the data, Google lets its customers always work with the servers closest to them, thus minimizing latency. Imagine a global company with multiple locations all over the world. It cannot "host" its data in a single location, no matter which one it picks: someone will always suffer.

In Google's cloud, if you're an American employee, you will work with Google's datacenters in the US. If you're connecting from Europe, you will be working with European datacenters. The same is true for other continents. So everyone gets the lowest possible latency.
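The routing idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the office names, the region names and the latency numbers are invented, and real traffic steering is far more involved than a table lookup. The sketch only shows why serving each user from the closest region beats any single hosting location.

```python
# Hypothetical round-trip times in milliseconds from each office to each region.
LATENCY_MS = {
    "new-york": {"us": 20, "europe": 90, "asia": 210},
    "london": {"us": 85, "europe": 15, "asia": 180},
    "singapore": {"us": 220, "europe": 170, "asia": 25},
}


def closest_region(office: str, latency_ms: dict = LATENCY_MS) -> str:
    """Return the region with the lowest latency for the given office."""
    regions = latency_ms[office]
    return min(regions, key=regions.get)


def worst_latency_single_site(region: str, latency_ms: dict = LATENCY_MS) -> int:
    """Worst latency any office sees if everything is hosted in one region."""
    return max(per_office[region] for per_office in latency_ms.values())


def worst_latency_distributed(latency_ms: dict = LATENCY_MS) -> int:
    """Worst latency any office sees when each one uses its closest region."""
    return max(min(per_office.values()) for per_office in latency_ms.values())
```

With these made-up numbers, hosting everything in the US leaves the Singapore office with the worst experience, while the distributed setup keeps every office on its nearest region.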

Look for my next post on how Microsoft deals with data location. Stay tuned.
Author: Vadim Solovey