Friday, September 2, 2011

GFS (Google File System)


GFS is a scalable distributed file system for large, distributed, data-intensive applications. It runs on inexpensive commodity hardware yet provides fault tolerance, and it delivers high aggregate performance to a large number of clients. GFS, the Google File System, was designed by Google specifically to store its massive volumes of data.
1. Design overview
(1) Design assumptions
GFS shares many goals with earlier distributed file systems, but its design was driven by observations of Google's current and anticipated application workloads and technical environment, which depart markedly from the assumptions behind earlier file systems. This required re-examining traditional choices and exploring radically different points in the design space.
The assumptions that set GFS apart from earlier file systems are as follows:
1. Component failures are treated as the norm rather than the exception. The file system consists of hundreds or thousands of storage machines built from inexpensive commodity parts, accessed by a comparable number of clients. The quantity and quality of the components virtually guarantee that some machines are not working at any given time and that some will never recover. Constant monitoring, error detection, fault tolerance, and automatic recovery are therefore essential.
2. Files are huge by traditional standards. Files several GB in size are common, and each file typically contains many application objects. When routinely working with fast-growing data sets of many TB comprising huge numbers of objects, it is unwieldy to manage billions of KB-sized files, even when the underlying file system could support them. As a result, design parameters such as I/O operation and block sizes must be reconsidered. Large files must be managed efficiently; small files must be supported, but need not be optimized for.
3. Most files are mutated by appending new data rather than overwriting existing data. Random writes within a file are practically nonexistent. Once written, files are only read, and many kinds of data share these characteristics. Some constitute large repositories that data-analysis programs scan through. Some are data streams generated continuously by running applications. Some are archival data; others are intermediate results produced on one machine and processed on another. Given this access pattern on huge files, appending becomes the focus of performance optimization and atomicity guarantees, while caching data blocks in the client loses its appeal.
4. The workload consists mainly of two kinds of reads: large streaming reads and small random reads. In a streaming read, an individual operation typically reads several hundred KB, and often 1 MB or more; successive operations from the same client read through a contiguous region of a file. A small random read typically reads a few KB at an arbitrary offset. Performance-conscious applications often batch and sort their small reads so that they advance steadily through the file instead of seeking back and forth.
5. The workload also includes many large, sequential writes that append data to files. Typical write sizes are similar to those of reads. Once written, files are seldom modified again. Small writes at random positions within a file are supported, but need not be efficient.
6. The system must efficiently implement well-defined semantics for many clients appending concurrently to the same file.
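The batching idea in assumption 4 can be sketched in a few lines of Python. This is purely illustrative and not part of any real GFS client API; the function name and the file-like interface are assumptions.

```python
# Sketch of assumption 4's optimization: a performance-conscious client
# sorts its small random reads by offset so it advances steadily through
# the file in one direction instead of seeking back and forth.

def batched_read(f, requests):
    """requests: list of (offset, length) pairs.
    Returns a dict mapping (offset, length) -> bytes read."""
    results = {}
    for offset, length in sorted(requests):  # advance monotonically through the file
        f.seek(offset)
        results[(offset, length)] = f.read(length)
    return results
```

The sort is the whole trick: the reads themselves are unchanged, but the seek pattern becomes sequential, which matters when each seek is expensive.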
(2) System interface
GFS provides a familiar file-system interface, although it does not implement a standard API such as POSIX. Files are organized hierarchically in directories and identified by path names.
(3) System architecture
A GFS cluster consists of a single master and multiple chunkservers, and is accessed by many clients, as shown in Figure 1. The master and the chunkservers typically run as user-level server processes on Linux machines. A chunkserver and a client can run on the same machine, provided that resources and reliability permit.
Files are divided into fixed-size chunks. Each chunk is identified by an immutable, globally unique 64-bit chunk handle, assigned by the master when the chunk is created. Chunkservers store chunks as Linux files on local disk, and read or write chunk data specified by a chunk handle and a byte range. For reliability, each chunk is replicated on multiple chunkservers: three replicas are stored by default, though users can specify a different replication level.
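Because the chunk size is fixed, translating a file byte offset into a chunk is simple arithmetic. A rough sketch (the 64 MB chunk size is the value used in the GFS paper; the function name is invented for illustration):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size used in the GFS paper

def locate(byte_offset):
    """Translate a file byte offset into (chunk_index, offset_within_chunk).
    A client does this translation locally before asking the master
    which chunkservers hold that chunk."""
    return divmod(byte_offset, CHUNK_SIZE)
```

For example, byte 100,000,000 of a file falls 32,891,136 bytes into chunk 1 (the second chunk).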
The master maintains all file-system metadata, including the namespace, access-control information, the mapping from files to chunks, and the current locations of chunks. It also controls system-wide activities such as chunk lease management, garbage collection of orphaned chunks, and chunk migration between chunkservers. The master periodically communicates with each chunkserver through HeartBeat messages, giving it instructions and collecting its state.
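A minimal sketch of the metadata the master keeps might look like the following. The class and field names are invented for illustration, the dictionaries stand in for the master's real in-memory structures, and real chunk handles are unique 64-bit values rather than a simple counter.

```python
import itertools

class Master:
    """Toy model of the GFS master's metadata (illustrative names only)."""

    def __init__(self):
        self.namespace = {}        # path -> list of chunk handles (file-to-chunk mapping)
        self.chunk_locations = {}  # chunk handle -> set of chunkserver addresses
        self._next_handle = itertools.count(1)  # stand-in for unique 64-bit handles

    def create_chunk(self, path, replicas):
        """Allocate a unique handle for a new chunk of `path` and record
        which chunkservers hold its replicas."""
        handle = next(self._next_handle)
        self.namespace.setdefault(path, []).append(handle)
        self.chunk_locations[handle] = set(replicas)
        return handle
```

Note that chunk locations are the one piece of state the real master does not persist: it rebuilds them by polling chunkservers at startup and via HeartBeat messages.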
GFS client code linked into each application implements the file-system API and communicates with the master and the chunkservers to read or write data on behalf of the application. Clients interact with the master only for metadata operations; all data-bearing communication goes directly to the chunkservers.
Neither clients nor chunkservers cache file data. Client-side caching offers little benefit, because applications stream through huge files or have working sets too large to cache. Not caching data simplifies the client and the system as a whole by eliminating cache-coherence issues. Clients do, however, cache metadata. Chunkservers likewise need not cache file data, because chunks are stored as ordinary local files, so Linux's own buffer cache already keeps frequently accessed data in memory.
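Putting the pieces together, the read path described above can be sketched as follows. The in-memory dictionaries stand in for RPCs to the master and to a chunkserver, and all names and data are illustrative; the point is that the master is consulted once per chunk for metadata (which the client caches), while the bytes themselves flow directly from a chunkserver.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # chunk size from the GFS paper

# Stand-ins for the master and the chunkservers (illustrative data).
master = {("/logs/a", 0): ("handle-1", ["cs1", "cs2", "cs3"])}  # (path, chunk index) -> metadata
chunkservers = {("cs1", "handle-1"): b"hello world"}            # (server, handle) -> chunk bytes

metadata_cache = {}  # clients cache metadata, never file data

def read(path, offset, length):
    index, within = divmod(offset, CHUNK_SIZE)
    if (path, index) not in metadata_cache:            # one metadata lookup, then cached
        metadata_cache[(path, index)] = master[(path, index)]
    handle, replicas = metadata_cache[(path, index)]
    data = chunkservers[(replicas[0], handle)]         # bytes come straight from a chunkserver
    return data[within:within + length]
```

A second read of the same chunk skips the master entirely and goes straight to the chunkserver, which is what keeps the single master from becoming a bottleneck.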
