Category Archive: Uncategorized

S3 and Parallel Processing – DirectFileOutputCommitter


The problem: while a Hadoop job is writing output, each task writes to its own temporary directory:
Task1 -> /unique/temp/directory/task1/file.tmp
Task2 -> /unique/temp/directory/task2/file.tmp
When the tasks finish executing, their temporary files are moved (committed) to the final location. This scheme …
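A minimal local-filesystem sketch of that write-then-commit flow, assuming a single POSIX filesystem where `os.replace` performs an atomic rename. The `_temporary` and `output` directory names, the `part-<task>` file names, and both helper functions are illustrative assumptions, not Hadoop's actual committer code. What it shows: nothing appears in the final location until a task commits.

```python
import os
import tempfile
from pathlib import Path


def write_task_output(temp_root: Path, task_id: str, data: str) -> Path:
    """Each task writes only into its own private temporary directory."""
    task_dir = temp_root / task_id
    task_dir.mkdir(parents=True, exist_ok=True)
    tmp_file = task_dir / "file.tmp"
    tmp_file.write_text(data)
    return tmp_file


def commit_task(tmp_file: Path, final_dir: Path, task_id: str) -> Path:
    """Commit: move the finished temporary file to its final location.

    os.replace is a rename, which is atomic within one POSIX filesystem.
    The "part-<task>" naming is a hypothetical choice for this sketch.
    """
    final_dir.mkdir(parents=True, exist_ok=True)
    final_file = final_dir / f"part-{task_id}"
    os.replace(tmp_file, final_file)
    return final_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        temp_root = root / "_temporary"  # hypothetical temp directory name
        final_dir = root / "output"
        pending = {t: write_task_output(temp_root, t, f"data from {t}\n")
                   for t in ("task1", "task2")}
        # Before any commit, the final directory does not even exist yet.
        print(sorted(os.listdir(final_dir)) if final_dir.exists() else [])  # → []
        for task_id, tmp_file in pending.items():
            commit_task(tmp_file, final_dir, task_id)
        print(sorted(os.listdir(final_dir)))  # → ['part-task1', 'part-task2']
```

On HDFS or a local disk the commit is a cheap metadata rename. S3 has no native rename operation, so on S3 the same "move" has to be done as a copy followed by a delete.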

Posted in Uncategorized

God does not play dice


Niels Bohr (left) and Albert Einstein (right) discussing quantum mechanics.

Posted in Uncategorized

Copy Data with Hive and Spark


These are two examples of how to copy data from one S3 location to another. The same operation can be done from S3 to HDFS and vice versa. I’m assuming that you are able to launch the Hive client or …

Posted in Uncategorized

Valencian Buñuelos (Pumpkin Fritters)


Ingredients: 1 medium pumpkin (approx. 800 g), 500 g flour, 100 g fresh yeast, 1/2 glass of gaseosa (soda), water, oil for frying (sunflower/corn/olive). Steps: Peel the pumpkin, remove the seeds, and boil it to get a smooth purée. Set aside the …

Posted in Cooking, Uncategorized

HBase and ZooKeeper debugging


I came across some scenarios where an application communicating with HBase through YARN (e.g. a MapReduce job) could fail silently with a timeout like the following:

2017-01-30 19:42:03,657 DEBUG [main] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=hbase:meta, metaLocation=, attempt=9 of 35 failed; retrying after sleep of …
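Because the failure is silent, it can help to scan the DEBUG log for the retry message above and see how far the client has got through its retry budget. A hedged sketch, assuming the log lines keep the `locateRegionInMeta parentTable=..., attempt=N of M failed` shape shown in the excerpt; the regex and the `meta_retry_progress` name are illustrative and not part of HBase.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches the client retry message seen in the DEBUG log, e.g.
# "... locateRegionInMeta parentTable=hbase:meta, metaLocation=, attempt=9 of 35 failed; ..."
RETRY_RE = re.compile(
    r"locateRegionInMeta parentTable=(?P<table>[^,\s]+),.*?"
    r"attempt=(?P<attempt>\d+) of (?P<total>\d+) failed"
)


def meta_retry_progress(lines: Iterable[str]) -> Optional[Tuple[str, int, int]]:
    """Return (parent table, failed attempt, total attempts) from the last
    matching log line, or None if no such retry was logged."""
    last = None
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            last = (match.group("table"),
                    int(match.group("attempt")),
                    int(match.group("total")))
    return last
```

For example, `with open(path) as log: print(meta_retry_progress(log))` on a container log file; an attempt number close to the total suggests the client is about to exhaust its retries rather than hanging for some other reason.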

Posted in Uncategorized

Checking the YARN child execution environment


Never go out without this. First find the PID of the YarnChild JVM with jps, then dump that process's environment from /proc:

$ sudo -u yarn jps
27343 YarnChild
4156 NodeManager
27292 Jps
$ sudo strings -f /proc/27343/environ
/proc/27343/environ: STDERR_LOGFILE_ENV=/var/log/hadoop-yarn/containers/application_1485807340469_0019/container_1485807340469_0019_01_000003/stderr
/proc/27343/environ: SHELL=/bin/bash
/proc/27343/environ: TERM=linux
/proc/27343/environ: HADOOP_HOME=/usr/lib/hadoop
/proc/27343/environ: YARN_PID_DIR=/var/run/hadoop-yarn
/proc/27343/environ: NM_HOST=ip-172-31-5-156.us-west-2.compute.internal
/proc/27343/environ: HADOOP_PREFIX=/usr/lib/hadoop
/proc/27343/environ: YARN_OPTS= -XX:OnOutOfMemoryError='kill -9 %p' …
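The same check can be scripted. `/proc/<pid>/environ` stores the process environment as NUL-separated `KEY=VALUE` entries, which is why `strings` is used above to print it. A minimal Python sketch, assuming a Linux host and enough privileges to read another user's `/proc` entry (as with `sudo` above); `parse_environ` and `read_environ` are hypothetical helper names.

```python
from typing import Dict


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Split the NUL-separated KEY=VALUE entries of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # the data normally ends with a trailing NUL
        key, sep, value = entry.partition(b"=")  # split on the first "=" only
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_environ(pid: int) -> Dict[str, str]:
    """Read the environment of a running process (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as f:
        return parse_environ(f.read())
```

Usage: `read_environ(27343)["YARN_OPTS"]`, with the YarnChild PID reported by jps. Unlike `strings`, which skips very short printable runs, this keeps every entry, and a value that itself contains `=` (such as `YARN_OPTS` above) stays intact.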

Posted in Uncategorized

Create multiple files at once with ‘touch’


Sometimes we need to create thousands or even millions of files at once. Using brace expansion, this touch command creates one file for each number in the range:

touch bspl{00001..70000}.c
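Brace expansion is done by the shell (zero padding like `00001` needs Bash 4 or later), so every generated name is passed to a single `touch` invocation; for very large ranges that list can exceed the kernel's argument-size limit and fail with "Argument list too long". A sketch of the same effect that builds no argument list at all, creating each file directly from Python; the `touch_range` name and its parameters are illustrative assumptions.

```python
import tempfile
from pathlib import Path


def touch_range(directory: Path, prefix: str, start: int, end: int,
                width: int = 5, suffix: str = ".c") -> int:
    """Create empty files <prefix><n><suffix> for n in start..end (inclusive),
    with n zero-padded to `width` digits, like bspl{00001..70000}.c.
    Returns the number of files created or touched."""
    directory.mkdir(parents=True, exist_ok=True)
    count = 0
    for n in range(start, end + 1):  # inclusive, like {start..end}
        (directory / f"{prefix}{n:0{width}d}{suffix}").touch(exist_ok=True)
        count += 1
    return count


if __name__ == "__main__":
    # touch_range(Path("."), "bspl", 1, 70000) mirrors the command above;
    # here a small range is used in a throwaway directory.
    with tempfile.TemporaryDirectory() as d:
        touch_range(Path(d), "bspl", 1, 3)
        print(sorted(p.name for p in Path(d).iterdir()))
        # → ['bspl00001.c', 'bspl00002.c', 'bspl00003.c']
```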

Posted in Uncategorized