GFS Architecture
Google organized the GFS into clusters of computers. A cluster is simply a network of computers. Each cluster might contain hundreds or even thousands of machines. Within GFS clusters there are three kinds of entities: clients, master servers and chunkservers. In the world of GFS, the term "client" refers to any entity that makes a file request. Requests can range from retrieving and manipulating existing files to creating new files on the system. Clients can be other computers or computer applications. You can think of clients as the customers of the GFS.

The master server acts as the coordinator for the cluster. The master's duties include maintaining an operation log, which keeps track of the activities of the master's cluster. The operation log helps keep service interruptions to a minimum -- if the master server crashes, a replacement server that has monitored the operation log can take its place. The master server also keeps track of metadata, which is the information that describes chunks. The metadata tells the master server to which files the chunks belong and where they fit within the overall file. Upon startup, the master polls all the chunkservers in its cluster. The chunkservers respond by telling the master server the contents of their inventories. From that moment on, the master server keeps track of the location of chunks within the cluster.

There's only one active master server per cluster at any one time (though each cluster has multiple copies of the master server in case of a hardware failure). That might sound like a good recipe for a bottleneck -- after all, if there's only one machine coordinating a cluster of thousands of computers, wouldn't that cause data traffic jams? The GFS gets around this sticky situation by keeping the messages the master server sends and receives very small. The master server doesn't actually handle file data at all. It leaves that up to the chunkservers.

Chunkservers are the workhorses of the GFS. They're responsible for storing the 64-MB file chunks. The chunkservers don't send chunks to the master server. Instead, they send requested chunks directly to the client. The GFS copies every chunk multiple times and stores it on different chunkservers. Each copy is called a replica. By default, the GFS makes three replicas per chunk, but users can change the setting and make more or fewer replicas if desired.
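To make the replication default concrete, here is a minimal sketch in Python (all names invented; the real GFS placement policy also weighs disk utilization and spreads replicas across racks) of a master picking distinct chunkservers for a new chunk:

```python
import random

def place_replicas(chunkservers, replication_factor=3):
    """Pick distinct chunkservers to hold copies of a new chunk.

    A toy stand-in for GFS's real placement policy, which also weighs
    disk utilization and spreads replicas across racks.
    """
    if len(chunkservers) < replication_factor:
        raise ValueError("not enough chunkservers for requested replication")
    return random.sample(chunkservers, replication_factor)

servers = ["cs-01", "cs-02", "cs-03", "cs-04", "cs-05"]
print(place_replicas(servers))          # e.g. ['cs-03', 'cs-01', 'cs-05']
```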
Things to Remember
- Minimal state is stored persistently on the master; chunk-location state is gathered from the chunkservers in near real time (at startup and via heartbeats), allowing faults to be detected and addressed faster.
- Large chunk sizes are used, which fits the design goal of handling huge files. Communication between clients and the master is kept minimal: the master just directs clients to the appropriate chunkservers. This improves scalability.
- Chunk replication and chunk checksumming improve reliability and fault tolerance.
- While the operation log is maintained on the master, it is periodically checkpointed to improve performance and minimize replay overhead during recovery.
- The master is a single point of failure in this design. While master replication is available, the model doesn't look like a hot-standby one, so some downtime seems to exist.
- The consistency model looks flaky to me. A lot seems to be left to the applications to figure out.
- The filesystem will perform rather badly for small files, and if there's a huge volume of small files, metadata pressure on the master can build up.
- The directory structuring of filenames seems rather artificial given the flat namespace.
- A lazy strategy is adopted in many places to improve performance: in the operation log, in the chunk-to-chunkserver mapping, in garbage collection, and so on.
- A copy-on-write model is employed for snapshots. But the main technique is chunking and the distribution of chunks across the cluster.
- A major tradeoff is in selecting the chunk size. Larger chunks reduce metadata volume and improve master performance, whereas smaller chunks would have supported small files better (see the rough calculation below).
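To put rough numbers on that tradeoff (my own illustration; the GFS paper quotes less than 64 bytes of master metadata per chunk), compare the master's metadata footprint for one petabyte of data at two chunk sizes:

```python
PiB = 2**50
MiB = 2**20
META_PER_CHUNK = 64            # approx. bytes of master metadata per chunk

for chunk_mb in (64, 1):       # GFS default vs. a hypothetical small chunk
    chunks = (1 * PiB) // (chunk_mb * MiB)
    meta_mib = chunks * META_PER_CHUNK / MiB
    print(f"{chunk_mb:>2} MB chunks -> {chunks:,} chunks, "
          f"~{meta_mib:,.0f} MiB of master metadata")
# 64 MB chunks -> 16,777,216 chunks, ~1,024 MiB of master metadata
#  1 MB chunks -> 1,073,741,824 chunks, ~65,536 MiB of master metadata
```

With 64 MB chunks the metadata for a petabyte fits comfortably in one machine's RAM; at 1 MB chunks it would not.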

Architecture
On a single-machine file system:
- An upper layer maintains the metadata.
- A lower layer (i.e., the disk) stores data in units called blocks.
In GFS:
- A master process maintains the metadata.
- A lower layer (i.e., a set of chunkservers) stores data in units called "chunks".

- A GoogleFS cluster consists of a single master and multiple chunkservers.
- The basic entities of GFS are the master, clients, and chunkservers.
- A single GFS master and multiple GFS chunkservers are accessed by multiple GFS clients.
- GFS files are divided into fixed-size chunks (of 64 MB); see the sketch after this list.
- Each chunk is identified by a globally unique 64-bit "chunk handle".
- Chunkservers store chunks on local disks as Linux files.
- The master maintains all file system metadata.
- For reliability, each chunk is replicated on multiple chunkservers (three by default).
- Clients interact with the master for metadata operations only.
- Chunkservers need not cache file data.
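As a small illustration of the fixed-size chunking mentioned above (function name invented), the client-side translation from a byte offset to a chunk index is simple integer arithmetic:

```python
CHUNK_SIZE = 64 * 2**20        # the fixed 64 MB GFS chunk size

def to_chunk_coords(offset: int) -> tuple[int, int]:
    """Translate a file byte offset into (chunk index, offset within chunk)."""
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE

# A read at byte 200,000,000 of a file falls in chunk 2 (the third chunk):
print(to_chunk_coords(200_000_000))    # (2, 65782272)
```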

Master
A single process, running on a separate machine, that maintains all file system metadata:
- namespace, access control, chunk mapping and locations (files -> chunks -> replicas)
It exchanges periodic heartbeat messages with the chunkservers:
- instructions + state monitoring; see the sketch below
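A toy sketch of that heartbeat monitoring (the names and the 60-second timeout are my assumptions, not values from the paper): the master tracks when each chunkserver last replied and presumes silent servers dead, so their chunks can be re-replicated.

```python
import time

def heartbeat_round(last_reply, timeout_s=60.0, now=time.monotonic):
    """One monitoring pass over the chunkservers.

    `last_reply` maps chunkserver id -> timestamp of its last heartbeat
    reply. Servers silent for longer than `timeout_s` are presumed dead,
    so the master can re-replicate the chunks they held.
    """
    cutoff = now() - timeout_s
    alive = [cs for cs, t in last_reply.items() if t >= cutoff]
    dead = [cs for cs, t in last_reply.items() if t < cutoff]
    return alive, dead

replies = {"cs-01": time.monotonic(), "cs-02": time.monotonic() - 300}
print(heartbeat_round(replies))        # (['cs-01'], ['cs-02'])
```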
Client
Contacts the master to get the metadata needed to reach the chunkservers (control plane)
Contacts the chunkservers for read/write operations (data plane)
Chunk
- Similar to the concept of a block in ordinary file systems.
- Compared to typical file system blocks, a chunk is large: 64 MB.
- Fewer chunks mean less chunk metadata in the master.
- Chunks are stored on chunkservers as files, named by their chunk handle.
- A chunk file is extended only as needed, which avoids wasted space (no internal fragmentation); see the sketch below.
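A sketch of that storage convention (the path layout and file naming here are my own invention): each chunk is an ordinary local file named by its handle, grown only as data is actually written.

```python
import os

def write_into_chunk(storage_dir: str, handle: int, offset: int, data: bytes):
    """Write bytes into the local Linux file backing one chunk.

    The file is named after the 64-bit chunk handle and is created and
    extended only as writes arrive, so an almost-empty chunk costs
    almost no disk space (lazy allocation).
    """
    path = os.path.join(storage_dir, f"{handle:016x}.chunk")
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(offset)
        f.write(data)

write_into_chunk("/tmp", 0x2EF0, 0, b"first bytes of the chunk")
```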
Why a large chunk size?
Advantages
- Reduces client interaction with the GFS master for chunk-location information
- Reduces the size of the metadata stored on the master (which is kept entirely in memory)
- Reduces network overhead by keeping a persistent TCP connection to the chunkserver over an extended period of time
Disadvantages
- A chunk holding a small but popular file can become a hotspot, since many clients hit the same few chunkservers
Metadata
The master stores three types of metadata (sketched below):
- the file and chunk namespaces (in memory + operation log)
- the mapping from files to chunks (in memory + operation log)
- the location of each chunk's replicas (in memory only)
The first two types are kept persistent via an operation log stored on the master's local disk.
Because all metadata is kept in the GFS master's memory (i.e., RAM), master operations are fast, and it is easy and efficient for the master to scan its entire state periodically.
- Periodic scanning is used to implement chunk garbage collection, re-replication, and chunk migration.
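The three metadata types map naturally onto a few in-memory structures. A minimal sketch (field names invented), highlighting that only the first two are ever written to the operation log:

```python
from dataclasses import dataclass, field

@dataclass
class MasterMetadata:
    """Toy layout of the master's in-memory state (field names invented)."""
    # 1. File and chunk namespaces -- persisted via the operation log.
    namespace: set[str] = field(default_factory=set)
    # 2. File -> ordered list of chunk handles -- also persisted via the log.
    file_chunks: dict[str, list[int]] = field(default_factory=dict)
    # 3. Chunk handle -> chunkserver ids -- in memory only; rebuilt by
    #    polling the chunkservers, never written to the log.
    locations: dict[int, list[str]] = field(default_factory=dict)

meta = MasterMetadata()
meta.namespace.add("/logs/web.0")
meta.file_chunks["/logs/web.0"] = [0xABC]
meta.locations[0xABC] = ["cs-01", "cs-02", "cs-03"]
```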
Operation Log
- The logical timeline that defines the order of concurrent operations
- Contains only metadata (namespaces + chunk mapping)
- Is kept on the GFS master's local disk and replicated on remote machines
- Has no persistent record of chunk locations (the master polls chunkservers at startup)
- The GFS master checkpoints its state (in a compact B-tree form) whenever the log grows beyond a certain size; see the sketch below
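A compressed sketch of that discipline (the class and the threshold are my assumptions; real GFS checkpoints in a compact B-tree-like form and replicates the log remotely before acknowledging clients). Mutations are logged before being applied, and a checkpoint lets recovery replay only the short log tail:

```python
import json

class OperationLog:
    """Toy write-ahead log with size-triggered checkpointing."""

    def __init__(self, checkpoint_threshold=4096):
        self.entries = []                       # stand-in for the on-disk log
        self.checkpoint_threshold = checkpoint_threshold
        self.snapshot = {}                      # stand-in for the B-tree dump

    def record(self, op: dict, state: dict):
        # Log the mutation first, then apply it to in-memory state.
        self.entries.append(json.dumps(op))
        state[op["path"]] = op["chunks"]
        # When the log grows past the threshold, checkpoint and truncate,
        # so recovery only replays the short log tail.
        if len(self.entries) >= self.checkpoint_threshold:
            self.snapshot = dict(state)
            self.entries.clear()

log, state = OperationLog(), {}
log.record({"path": "/logs/web.0", "chunks": [0xABC]}, state)
```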
Read Algorithm
- The application issues a read request.
- The GFS client translates the request from (filename, byte range) to (filename, chunk index) and sends it to the master.
- The master responds with the chunk handle and replica locations (i.e., the chunkservers where replicas are stored).
- The client picks a location and sends a (chunk handle, byte range) request to that chunkserver.
- The chunkserver sends the requested data to the client.
- The client forwards the data to the application. (The toy code below walks through these steps.)
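The whole read path, compressed into a runnable toy (every class and method name here is invented; real GFS clients also cache chunk locations and pick the nearest replica). Note the master only ever answers metadata lookups:

```python
CHUNK_SIZE = 64 * 2**20

class Chunkserver:
    """Toy replica holding raw chunk bytes keyed by chunk handle."""
    def __init__(self):
        self.chunks: dict[int, bytes] = {}
    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

class Master:
    """Toy metadata-only master; it never touches file data."""
    def __init__(self):
        self.file_chunks: dict[str, list[int]] = {}
        self.locations: dict[int, list[Chunkserver]] = {}
    def lookup(self, filename, chunk_index):
        handle = self.file_chunks[filename][chunk_index]
        return handle, self.locations[handle]

def gfs_read(master, filename, offset, length):
    # Steps 1-3: translate the byte range to a chunk index, ask the master.
    handle, replicas = master.lookup(filename, offset // CHUNK_SIZE)
    # Steps 4-6: fetch the bytes directly from one replica.
    return replicas[0].read(handle, offset % CHUNK_SIZE, length)

cs, m = Chunkserver(), Master()
cs.chunks[0xABC] = b"hello from chunk 0"
m.file_chunks["/logs/web.0"] = [0xABC]
m.locations[0xABC] = [cs]
print(gfs_read(m, "/logs/web.0", 0, 5))    # b'hello'
```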

Write Algorithm
- The application issues the write request.
- The GFS client translates the request from (filename, data) to (filename, chunk index) and sends it to the master.
- The master responds with the chunk handle and (primary + secondary) replica locations.
- The client pushes the write data to all locations.
- The data is stored in the chunkservers' internal buffers.
- The client sends the write command to the primary.
- The primary determines the serial order for the data instances stored in its buffer and writes them in that order to the chunk.
- The primary sends that order to the secondaries and tells them to perform the write; the secondaries respond to the primary.
- The primary responds back to the client.
- If the write fails at one of the chunkservers, the client is informed and retries the write. (See the toy code below.)
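The same flow as runnable toy code (all names invented; real GFS pipelines the data push along a chain of chunkservers and grants the primary its role via a lease from the master). The `serial` value stands in for the primary-chosen ordering that all replicas follow:

```python
class Replica:
    """Toy chunk replica: a staging buffer plus the chunk contents."""
    def __init__(self):
        self.chunk = bytearray()
        self.staged = None
        self.serial = 0

    def push(self, data):                  # steps 4-5: buffer pushed data
        self.staged = data

    def apply(self, offset, serial):       # steps 7-8: write in serial order
        end = offset + len(self.staged)
        if len(self.chunk) < end:
            self.chunk.extend(b"\x00" * (end - len(self.chunk)))
        self.chunk[offset:end] = self.staged
        self.staged = None
        return True

def gfs_write(primary, secondaries, offset, data):
    for r in [primary] + secondaries:      # push data to every replica
        r.push(data)
    primary.serial += 1                    # primary picks the serial order
    primary.apply(offset, primary.serial)  # ... and applies the write
    acks = [s.apply(offset, primary.serial) for s in secondaries]
    if not all(acks):                      # any failure -> client retries
        raise RuntimeError("write failed at a replica; retry")

p, s1, s2 = Replica(), Replica(), Replica()
gfs_write(p, [s1, s2], 0, b"ordered bytes")
print(bytes(p.chunk) == bytes(s1.chunk) == bytes(s2.chunk))  # True
```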

Record Append Algorithm
Record append is an important operation at Google, for example:
- using a file as a producer-consumer queue
- merging results from multiple machines into one file
The steps:
- The application issues a record append request.
- The GFS client translates the request and sends it to the master.
- The master responds with the chunk handle and (primary + secondary) replica locations.
- The client pushes the record data to all locations; the primary then checks whether the record fits in the specified chunk.
- If the record does not fit, the primary:
  - pads the chunk,
  - tells the secondaries to do the same,
  - and informs the client.
- The client then retries the append with the next chunk.
- If the record fits, the primary:
  - appends the record,
  - tells the secondaries to do the same,
  - receives responses from the secondaries,
  - and sends the final response to the client. (See the sketch below.)
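The fit-or-pad decision is the subtle part; here is a toy sketch of the primary's side (names invented, the 64 MB limit taken from the chunk size above):

```python
CHUNK_SIZE = 64 * 2**20

def primary_record_append(chunk: bytearray, record: bytes):
    """Toy primary-side decision for record append.

    Returns the offset the record landed at, or None when the chunk was
    padded out and the client must retry on the next chunk.
    """
    used = len(chunk)
    if used + len(record) > CHUNK_SIZE:
        # Does not fit: pad to the chunk boundary (secondaries are told
        # to do the same) and make the client retry on a fresh chunk.
        chunk.extend(b"\x00" * (CHUNK_SIZE - used))
        return None
    chunk.extend(record)                   # fits: append at the end
    return used

chunk = bytearray(b"existing records...")
print(primary_record_append(chunk, b"|new record"))   # 19
```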
Consistency Model
- Write: data is written at an application-specified offset.
- Record append: data is appended atomically, at least once, at an offset of GFS's choosing (a regular append is a write at the offset the client believes to be the end of file).
- GFS:
  - applies mutations to a chunk in the same order on all replicas
  - uses chunk version numbers to detect stale replicas; see the sketch below
  - stale replicas are garbage collected and are updated the next time they contact the master
- Additional safeguards: regular handshakes between the master and chunkservers, and checksumming.
- Data is lost only if all replicas are lost before GFS can react.
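A sketch of the version-number check (the structure is invented): the master bumps a chunk's version whenever it grants a new mutation lease, so any replica reporting an older version must have missed writes:

```python
def find_stale_replicas(master_versions, replica_reports):
    """Flag replicas whose chunk version lags the master's record.

    `master_versions`: chunk handle -> latest version the master granted.
    `replica_reports`: (handle, chunkserver id) -> version the replica
    reported in its last heartbeat. Stale entries are later garbage
    collected and re-replicated elsewhere.
    """
    return [(h, cs)
            for (h, cs), v in replica_reports.items()
            if v < master_versions.get(h, 0)]

reports = {(7, "cs-01"): 3, (7, "cs-02"): 2}    # cs-02 missed a mutation
print(find_stale_replicas({7: 3}, reports))     # [(7, 'cs-02')]
```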

References:
- "Google FIlesystem : Google Research" , S Ghemawat , 2003
- "Distributed file system", Books,LLC
- "How google file system works?" at computer.howstuffswork.com
- "Colossus: Successor to the Google FIle System(GFS) at www.systutorials.com