Hadoop 1 vs Hadoop 2 – How many slots do I have per node ?


This is a topic that always rise a discussion…

In Hadoop 1, the number of tasks launched per node was specified via the settings mapred.map.tasks.maximum and mapred.reduce.tasks.maximum.

But this is ignored when set on Hadoop 2.

In Hadoop 2 with YARN, we can determine how many concurrent tasks are launched per node by dividing the resources allocated to YARN by the resources allocated to each MapReduce task, and taking the minimum of the two types of resources (memory and CPU).

This approach is an improvement over that of Hadoop 1, because the administrator no longer has to bundle CPU and memory into a Hadoop-specific concept of a “slot”.

The number of tasks that will be spawned per node:

min(
    yarn.nodemanager.resource.memory-mb / mapreduce.[map|reduce].memory.mb
    ,
    yarn.nodemanager.resource.cpu-vcores / mapreduce.[map|reduce].cpu.vcores
    )

Obtained value will be set on the variable ‘mapreduce.job.maps‘ on the ‘mapred-site.xml‘ file.

Of course, YARN is more dynamic than that, and each job can have unique resource requirements — so in a multitenant cluster with different types of jobs running, the calculation isn’t as straightforward.

More information:
http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/

Anuncios

Acerca de hvivani

sysadmin, developer, RHCSA
Esta entrada fue publicada en Uncategorized y etiquetada , , . Guarda el enlace permanente.

Responder

Introduce tus datos o haz clic en un icono para iniciar sesión:

Logo de WordPress.com

Estás comentando usando tu cuenta de WordPress.com. Cerrar sesión / Cambiar )

Imagen de Twitter

Estás comentando usando tu cuenta de Twitter. Cerrar sesión / Cambiar )

Foto de Facebook

Estás comentando usando tu cuenta de Facebook. Cerrar sesión / Cambiar )

Google+ photo

Estás comentando usando tu cuenta de Google+. Cerrar sesión / Cambiar )

Conectando a %s