Wednesday, January 4, 2012

HDFS - Write Anatomy

Create and write of an HDFS file

Creation and writing of a file is more involved than reading an HDFS file. Here too the NameNode (NN) never writes any data directly to the DataNodes (DN); as per its role, it only manages the namespace and inodes. The client has to write directly to a datanode. However, each datanode has to acknowledge receipt of each block back to the client and the namenode. Each datanode also passes the block on to the next datanode in the pipeline, which means the client has to transmit a block only to the first datanode; the rest of the block movement is handled inside the cluster. Here is the flow of a file create and write on HDFS.



Other facts about HDFS write:
  • Interestingly, the client and the NN do not wait till all the replicas of a block are acknowledged; they only ensure that at least one complete copy of the file is on the cluster.
  • Another fact is that the client does not start transmitting data till it has one full block with it, i.e. the DFSClient keeps buffering data locally till one block of data is accumulated or end-of-file is reached.
  • The above illustration assumes that the replication factor of the file is set to three. If the replication factor is more than or less than three, steps 6-10 are repeated accordingly. BTW, the minimum replication factor of a file can be 1 and the maximum can be 512. The default value is three.
  • Also, this illustration applies to versions before Hadoop 0.23. I still have to look into the federated NN architecture.
  • While the lease is with a particular client, no other client can write to or delete the file. However, other clients can still read it.
  • Writes can happen only at the end of the file, i.e. only append is allowed. Even that is available only on version 0.20.205 and beyond. The feature was there in earlier versions as well, but it was not well tested and was disabled by default.
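
To make the client side of this flow concrete, here is a minimal write sketch against the Hadoop FileSystem API. It is only an illustration: the NameNode URI and the file path are made-up placeholders, not values from this post.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // placeholder URI; normally picked up from fs.default.name in the site config
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

            // create() asks the NN to add the file to the namespace and returns a stream;
            // DFSClient buffers the bytes and ships them to the first datanode, which
            // forwards them along the pipeline to the remaining replicas
            Path file = new Path("/user/demo/sample.txt"); // hypothetical path
            FSDataOutputStream out = fs.create(file);
            try {
                out.write("hello HDFS\n".getBytes("UTF-8"));
            } finally {
                // close() flushes the last partial block and completes the file on the NN
                out.close();
            }
        }
    }

The whole pipeline dance described above happens behind fs.create() and out.close(); the client code never has to deal with individual block acknowledgements itself.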

There are some other concepts like block and replica management, which I will cover in my next post. 

Monday, January 2, 2012

HDFS - Read Anatomy

Following is the read anatomy of HDFS file:

                    
1. The client requests the file.
2. The NN checks the permissions and sends back the list of blocks and, for each block, the list of datanodes (including the port to talk to).
3-6. The "DFSClient" class on the client side picks the first block and requests it from the first datanode on the list. It tries twice, and if there is no response it adds that datanode to its "deadnodes" list and requests the block from the next datanode on the list.
7-8. After a successful read of all the blocks, "DFSClient" sends the deadnodes list back to the NN so that it can take action.
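
For comparison, here is an equally minimal read sketch (the NameNode URI and file path are placeholders I made up). The NN lookup in step 2 and the per-datanode retries in steps 3-6 all happen inside fs.open() and the stream it returns.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // placeholder URI; normally picked up from fs.default.name in the site config
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

            // open() fetches the block/datanode list from the NN; the bytes themselves
            // are streamed from the datanodes by DFSClient
            FSDataInputStream in = fs.open(new Path("/user/demo/sample.txt")); // hypothetical path
            try {
                IOUtils.copyBytes(in, System.out, 4096, false);
            } finally {
                IOUtils.closeStream(in);
            }
        }
    }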

I will talk about write anatomy in my next post. Please keep reading...






Wednesday, December 28, 2011

HDFS Architecture

HDFS architecture has three key components:

1. NameNode
2. Data Nodes
3. Secondary NameNode

NameNode is the master node that controls the whole cluster and the division of files into blocks. The typical block size is 64 MB, however it can be configured through a parameter in <HADOOP_INSTALL>/conf/hdfs-site.xml. How many copies of each block are to be kept in the cluster is also a configurable parameter in the same file. The namenode keeps two data structures:
            1. Namespace (filename to blocks mapping)
            2. Inodes (block to datanode mapping)
Namespace and inodes are always kept in memory so that they can be referenced quickly. However, only the namespace is persisted to hard disk; on a restart of the cluster, the inodes are rebuilt by the namenode from the information it gets from each datanode periodically. The namenode never initiates an interaction with a datanode; instead, the datanodes keep sending heartbeats to the namenode, and in response the namenode sends back the tasks to be performed by each datanode. The communication between the namenode and the datanodes happens over RPC.
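
As an illustration, a bare-bones hdfs-site.xml that sets both of these parameters could look roughly like this (the property names are the ones used in the pre-0.23 releases, and the values shown are simply the defaults):

    <!-- minimal hdfs-site.xml sketch; the values shown are the defaults -->
    <configuration>
      <property>
        <name>dfs.block.size</name>   <!-- block size in bytes: 64 MB -->
        <value>67108864</value>
      </property>
      <property>
        <name>dfs.replication</name>  <!-- copies kept of each block -->
        <value>3</value>
      </property>
    </configuration>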

DataNodes act as slaves in the cluster and only store the file blocks. A datanode has no knowledge of the block-to-file mapping; it just stores blocks and acts on the commands it receives from the namenode, such as replicating or deleting under/over-replicated blocks. It also has to send a heartbeat at regular intervals to the namenode to keep participating in the cluster, and it sends a block report to the namenode periodically. Datanodes talk to each other directly to move data blocks.
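
Since the namenode is the one that holds the block-to-datanode mapping, a client can ask it which datanodes store the blocks of a given file. A small sketch (the path is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListBlockLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // ask the NN which datanodes hold each block of the file
            FileStatus status = fs.getFileStatus(new Path("/user/demo/sample.txt"));
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println(block.getOffset() + " -> "
                        + java.util.Arrays.toString(block.getHosts()));
            }
        }
    }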

Secondary Namenode is a bit of a misnomer: from the name one might conclude that it is a hot standby, however it is really just a checkpointing helper for the edit log. It keeps collecting the changes the namenode makes to the namespace (the edit log) and periodically folds them into the namespace image, which takes that overhead off the namenode.

I will talk about write and read anatomy in my next post.

Monday, December 5, 2011

HDFS - An Introduction

HDFS (Hadoop Distributed File System) is an Apache Foundation project, a sub-project of the Apache Hadoop project, and is well-suited for the storage of large data, in the range of petabytes. It runs on commodity hardware and scales out simply by adding more nodes/machines.

Some of the key features/advantages of HDFS are:
1. Built-in fault tolerance with automatic failure detection.
2. Portability and scalability across heterogeneous hardware and operating systems.
3. Economic efficiency by using commodity hardware.
4. Practically "infinite" scalability, giving a large amount of storage space.
5. Extreme reliability by keeping replicas.

So WHAT IS IT GOOD FOR... "Absolutely Everything" that has anything to do with processing and storage of  large volumes and variety of data.

Here are some of the problems that HDFS can solve for you.
1. Manage and store large amounts of data and still have ready access to it, unlike when you put data on tapes and getting to it becomes a big challenge.
2. Process this large amount of data with the MapReduce framework and generate the business insights you always wanted but could not get, because the available tools were not capable of handling that volume of data.
3. Not worry about storage anymore, as you can simply add another commodity machine to the cluster and get more storage.
4. Store any kind of data without having to worry about a pre-defined schema, or waiting for your DBA to allocate table space and create a schema for you.
5. Run searches and queries against semi-structured and unstructured data.

In my next post I will dive deeper into the architecture and other details of HDFS.