William's blog: BigTable

Brief introduction

Bigtable is not a relational database; it is a sparse, distributed, persistent, multidimensional sorted map. Bigtable is designed to reliably handle petabytes (PB) of data and can be deployed across thousands of machines. It has achieved the following goals: wide applicability, scalability, high performance, and high availability. Bigtable is used by more than 60 Google products and projects, including Google Analytics, Google Finance, Orkut, Personalized Search, Writely, and Google Earth. These products place very different demands on Bigtable: some need high-throughput batch processing, while others need to return data to end users with low latency. The Bigtable clusters they use also vary widely in configuration: some have only a few servers, while others have thousands of servers storing hundreds of terabytes (TB) of data.

Function

In many ways Bigtable resembles a database: it shares many implementation strategies with databases. Parallel databases [14] and main-memory databases [13] have achieved scalability and high performance, but Bigtable provides a different interface from such systems. Bigtable does not support a full relational data model; instead, it gives clients a simple data model that supports dynamic control over data layout and format (Alex note: that is, Bigtable imposes no format on the data; in database terms, the data has no fixed schema and the user defines it), and lets clients reason about the locality properties of the data in the underlying storage (Alex note: locality can be understood like this: as in a tree structure, data sharing the same prefix is stored close together, so it can be read back together). Data is indexed by row and column names, which can be arbitrary strings. Bigtable stores data as strings but does not interpret them itself; clients usually serialize various kinds of structured or semi-structured data into these strings. By choosing their schema carefully, clients can control the locality of their data. Finally, Bigtable schema parameters let clients control whether data is served from memory or from disk.
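As a rough illustration of the data model described above, here is a minimal Python sketch of a sorted map indexed by row and column names, with values kept as uninterpreted bytes. It is a toy in-memory stand-in, not Bigtable's API: the class name TinySortedTable, its put, get, and scan_prefix methods, and the sample row keys are all assumptions made for this example. It shows how rows that share a key prefix end up adjacent in sort order and can be read back in one scan.

```python
import bisect
import json


class TinySortedTable:
    """Toy in-memory stand-in for a sorted (row, column) -> bytes map.

    Hypothetical helper for illustration only; not Bigtable's client API.
    """

    def __init__(self):
        self._keys = []    # sorted list of (row, column) tuples
        self._cells = {}   # (row, column) -> uninterpreted bytes

    def put(self, row, column, value):
        key = (row, column)
        if key not in self._cells:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._cells[key] = value

    def get(self, row, column):
        # Sparse: a cell that was never written simply does not exist.
        return self._cells.get((row, column))

    def scan_prefix(self, row_prefix):
        """Yield (row, column, value) for rows starting with row_prefix."""
        start = bisect.bisect_left(self._keys, (row_prefix, ""))
        for row, column in self._keys[start:]:
            if not row.startswith(row_prefix):
                break  # sorted order: no later row can match the prefix
            yield row, column, self._cells[(row, column)]


table = TinySortedTable()
# Row keys are arbitrary strings; a shared prefix keeps related rows adjacent.
table.put("com.example.www/index", "contents", b"<html>home</html>")
table.put("com.example.www/about", "contents", b"<html>about</html>")
table.put("org.example/", "contents", b"<html>org</html>")
# The table stores bytes; the client decides how to serialize structure.
table.put("com.example.www/index", "meta", json.dumps({"lang": "en"}).encode())

rows = [row for row, _, _ in table.scan_prefix("com.example.")]
```

The table never parses the stored values; the client serializes the metadata to JSON on write and would decode it on read. Choosing row keys with a common prefix is the client-side decision that controls locality.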

Characteristics

1, suitable for massive data sets, at petabyte (PB) scale;

2, distributed, with concurrent data processing and high efficiency;

3, easy to scale out, with support for dynamic expansion;

4, runs on inexpensive commodity hardware;

5, suited to high-throughput reads and scans; writes are atomic only within a single row, with no general multi-row transactions;

6, not a drop-in replacement for a traditional relational database.

Applications

Bigtable provides the storage backend for Google's search, Maps, Finance, and Print services, as well as the social networking site Orkut, the video sharing site YouTube, and the blogging service Blogger.

In September 2010, Google announced that its new indexing system would abandon MapReduce and move to the Bigtable platform. The new platform is based on Colossus, also known as GFS2.

Thursday, December 22, 2011
