With Hadoop's growing popularity and its adoption as the near-universal platform for Big Data, the Hadoop development team has turned its attention to the architectural deficiencies of the current design. Out of that effort a new version of Hadoop MapReduce has been born: MapReduce has undergone a complete overhaul in hadoop-0.23, and we now have MapReduce 2.0 (MRv2), also known as YARN.
Let me take this opportunity to give a brief introduction to YARN. The basic change in MRv2 is that the two major functions of the JobTracker are split into separate daemons. They are:
- Resource Management
- Job Scheduling/Monitoring
To achieve this, new components have been introduced, namely:
- ResourceManager (RM)
- ApplicationsManager
- NodeManager (NM)
- ApplicationMaster (AM)
- Container
I) ResourceManager (RM)
The ResourceManager (RM) is the key service offered in YARN. Clients interact with the framework through the ResourceManager, which is the master for all other daemons in the framework.
The ResourceManager has two major components:
- Scheduler
- ApplicationsManager
a) Scheduler
1. The Scheduler is responsible for allocating resources to the various running applications, subject to the familiar constraints of capacities, queues, etc.
2. It is a pure scheduler: it does not monitor or track the status of applications. It performs its scheduling function based purely on the resource requirements of the applications.
3. It schedules resources in terms of the abstract notion of a resource "Container".
4. The Scheduler has a pluggable policy plug-in, which is responsible for partitioning the cluster resources among the various queues, applications, etc. For example:
a) CapacityScheduler
b) FairScheduler
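To make the "pure scheduler" idea concrete, here is a minimal Python sketch of a scheduler that partitions cluster memory among queues by capacity and grants or refuses requests without tracking application status. This is a toy model, not the CapacityScheduler's actual algorithm; the names `Queue`, `ToyScheduler`, `limit_mb` and `allocate` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """Hypothetical queue that owns a fixed share of cluster memory."""
    name: str
    capacity_fraction: float  # share of the cluster, e.g. 0.25 for 25%
    used_mb: int = 0


class ToyScheduler:
    """Toy pure scheduler: hands out memory, never tracks application status."""

    def __init__(self, cluster_mb, queues):
        self.cluster_mb = cluster_mb
        self.queues = {q.name: q for q in queues}

    def limit_mb(self, queue_name):
        """Memory the queue may use, derived from its capacity share."""
        q = self.queues[queue_name]
        return int(self.cluster_mb * q.capacity_fraction)

    def allocate(self, queue_name, request_mb):
        """Grant the request if the queue still has room, otherwise refuse."""
        q = self.queues[queue_name]
        if q.used_mb + request_mb > self.limit_mb(queue_name):
            return False
        q.used_mb += request_mb
        return True


scheduler = ToyScheduler(
    cluster_mb=10240,
    queues=[Queue("prod", 0.75), Queue("dev", 0.25)],
)
scheduler.allocate("dev", 2048)  # fits inside dev's share
scheduler.allocate("dev", 1024)  # refused: dev's share would be exceeded
```

Swapping the policy (capacity-based, fair-share, and so on) would only change the body of `allocate`, which is roughly what a pluggable policy means in this picture.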
b) ApplicationsManager
1. Accepting job submissions
2. Negotiating the first container for executing the application-specific ApplicationMaster
3. Providing the service for restarting the ApplicationMaster container on failure
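The three duties above can be sketched as a small state machine. This is only an illustration under assumed names: `ToyApplicationsManager`, the `max_am_attempts` limit and the state strings are hypothetical and are not taken from the Hadoop code base.

```python
import itertools


class ToyApplicationsManager:
    """Toy model: accept submissions, start the AM, restart it on failure."""

    def __init__(self, max_am_attempts=2):
        self.max_am_attempts = max_am_attempts  # assumed retry limit
        self._ids = itertools.count(1)
        self.apps = {}  # app_id -> {"name", "attempts", "state"}

    def submit(self, name):
        """Accept a job submission and launch its ApplicationMaster."""
        app_id = f"app_{next(self._ids)}"
        self.apps[app_id] = {"name": name, "attempts": 0, "state": "ACCEPTED"}
        self._launch_am(app_id)
        return app_id

    def _launch_am(self, app_id):
        """Stand-in for allocating the first container and starting the AM in it."""
        app = self.apps[app_id]
        app["attempts"] += 1
        app["state"] = "AM_RUNNING"

    def on_am_failure(self, app_id):
        """Restart the AM container, or give up once attempts are exhausted."""
        app = self.apps[app_id]
        if app["attempts"] < self.max_am_attempts:
            self._launch_am(app_id)
        else:
            app["state"] = "FAILED"
        return app["state"]
```

For example, with `max_am_attempts=2`, the first call to `on_am_failure` relaunches the ApplicationMaster and the second marks the application as failed.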
II) NodeManager (NM)
The NodeManager plays a role similar to that of the TaskTracker in the old MapReduce.
1. NodeManager is responsible for Containers
2. Monitoring container resource usage (CPU, memory, disk, network)
3. Reporting to the ResourceManager/Scheduler
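As a rough picture of "monitor and report", the following Python sketch tracks memory usage per container and builds the report a node might send upstream. The class `ToyNodeManager`, the `heartbeat` method and the report fields are assumptions made for this example; the real NodeManager protocol is not shown here.

```python
class ToyNodeManager:
    """Toy per-node agent: tracks container memory usage and reports it."""

    def __init__(self, node_id, total_mb):
        self.node_id = node_id
        self.total_mb = total_mb
        self.containers = {}  # container_id -> {"limit_mb", "used_mb"}

    def start_container(self, container_id, limit_mb):
        """Register a container with the memory it was granted."""
        self.containers[container_id] = {"limit_mb": limit_mb, "used_mb": 0}

    def record_usage(self, container_id, used_mb):
        """Stand-in for sampling the container's real memory usage."""
        self.containers[container_id]["used_mb"] = used_mb

    def heartbeat(self):
        """Build the (hypothetical) report sent to the ResourceManager/Scheduler."""
        reserved = sum(c["limit_mb"] for c in self.containers.values())
        over_limit = sorted(
            cid for cid, c in self.containers.items()
            if c["used_mb"] > c["limit_mb"]
        )
        return {
            "node_id": self.node_id,
            "available_mb": self.total_mb - reserved,
            "usage_mb": {cid: c["used_mb"] for cid, c in self.containers.items()},
            "over_limit": over_limit,
        }
```

Only memory is modelled here; CPU, disk and network would be extra fields in the same report.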
III) ApplicationMaster (AM)
1. ApplicationMaster is responsible for negotiating appropriate resource containers from the Scheduler
2. Tracking the status and monitoring progress for applications running under this ApplicationMaster
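The negotiate-then-track loop can be illustrated with a small sketch. Everything here is hypothetical: `ToyApplicationMaster` asks a stand-in scheduler function for one container per task and counts finished tasks to compute progress. It does not use the real YARN client APIs.

```python
def make_fixed_pool(free_mb):
    """Stand-in scheduler: grants requests until a fixed memory pool runs out."""
    state = {"free": free_mb}

    def ask(request_mb):
        if request_mb > state["free"]:
            return False
        state["free"] -= request_mb
        return True

    return ask


class ToyApplicationMaster:
    """Toy AM: negotiates one container per task and tracks progress."""

    def __init__(self, num_tasks, task_mb, ask_scheduler):
        self.num_tasks = num_tasks
        self.task_mb = task_mb
        self.ask_scheduler = ask_scheduler
        self.pending = num_tasks
        self.running = 0
        self.completed = 0

    def negotiate(self):
        """Request containers until the scheduler refuses or nothing is pending."""
        granted = 0
        while self.pending > 0 and self.ask_scheduler(self.task_mb):
            self.pending -= 1
            self.running += 1
            granted += 1
        return granted

    def on_task_finished(self):
        """Record a finished task (this toy does not return memory to the pool)."""
        self.running -= 1
        self.completed += 1

    def progress(self):
        """Fraction of tasks completed so far."""
        return self.completed / self.num_tasks
```

With four tasks of 1024 MB each and a 2048 MB pool, the first `negotiate()` call is granted two containers and leaves two tasks pending.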
IV) Container
A resource Container incorporates elements such as memory, CPU, disk, network, etc. Only memory is supported in the first version.
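One way to picture a Container is as a plain record of resource amounts, where only the memory field takes part in the fit check for now. The sketch below is illustrative only; `ContainerSpec`, its field names and `fits` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSpec:
    """Hypothetical resource record; only memory_mb is honoured in this sketch."""
    memory_mb: int
    cpu_cores: int = 0      # modelled but ignored
    disk_mb: int = 0        # modelled but ignored
    network_mbps: int = 0   # modelled but ignored


def fits(request, free):
    """Memory-only check, mirroring the 'only memory is supported' limitation."""
    return request.memory_mb <= free.memory_mb


request = ContainerSpec(memory_mb=1024, cpu_cores=8)
free = ContainerSpec(memory_mb=2048, cpu_cores=1)
fits(request, free)  # the CPU mismatch is not considered
```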
As far as I know, in the current version, hadoop 0.23, only the CapacityScheduler is supported so far. There is no support for the FairScheduler yet.