Author Archive: hvivani

About hvivani

sysadmin, developer, RHCSA

s3:// vs s3n:// vs s3a:// vs EMRFS


s3:// The Apache Hadoop implementation of a block-based filesystem backed by S3. Apache Hadoop deprecated this filesystem as of May 2016.
s3n:// A native filesystem for reading and writing regular files on S3. S3N allows Hadoop to access … Continue reading

Posted in Uncategorized

S3 and Parallel Processing – DirectFileOutputCommitter


The problem: While a Hadoop job is writing output, it writes to a temporary directory:

    Task1 --> /unique/temp/directory/task1/file.tmp
    Task2 --> /unique/temp/directory/task2/file.tmp

When the tasks finish executing, they move (commit) the temporary file to a final location. This schema … Continue reading
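The write-then-commit pattern described in this excerpt can be sketched on a local filesystem. This is only an illustration, not Hadoop's committer code: the function names (`run_task`, `commit_task`, `run_job`), the `final` directory and the `part-…` file names are hypothetical, and the move is done with `os.replace`, which is a plain rename when source and destination are on the same local filesystem.

```python
import os
import tempfile


def run_task(temp_root, task_id, data):
    """Write a task's output under its own temporary directory."""
    task_dir = os.path.join(temp_root, task_id)
    os.makedirs(task_dir, exist_ok=True)
    tmp_path = os.path.join(task_dir, "file.tmp")
    with open(tmp_path, "w") as f:
        f.write(data)
    return tmp_path


def commit_task(tmp_path, final_dir, final_name):
    """Commit: move the temporary file to its final location."""
    os.makedirs(final_dir, exist_ok=True)
    final_path = os.path.join(final_dir, final_name)
    # On one local filesystem this is a rename, not a copy.
    os.replace(tmp_path, final_path)
    return final_path


def run_job(base_dir):
    """Run two toy tasks, then commit each task's output."""
    temp_root = os.path.join(base_dir, "unique", "temp", "directory")
    final_dir = os.path.join(base_dir, "final")  # hypothetical final location
    committed = []
    for task_id in ("task1", "task2"):
        tmp_path = run_task(temp_root, task_id, f"output of {task_id}\n")
        committed.append(commit_task(tmp_path, final_dir, f"part-{task_id}"))
    return committed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        for path in run_job(base):
            print(path)
```

After `run_job` returns, each `file.tmp` is gone from the temporary tree and the output exists only at its final path, which is the point of committing by move.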

Posted in Uncategorized

God does not cast dice / Dios no juega a los dados


Niels Bohr (left) and Albert Einstein (right) discussing quantum mechanics.

Posted in Uncategorized

Copy Data with Hive and Spark / Copiar Datos con Hive y Spark


These are two examples of how to copy data from one S3 location to another. The same operation can be done from S3 to HDFS and vice versa. I'm assuming that you are able to launch the Hive client or … Continue reading

Posted in Uncategorized

Buñuelos Valencianos (Valencian Pumpkin Fritters)


Ingredients:
1 medium pumpkin (approx. 800 g)
500 g flour
100 g fresh yeast
1/2 glass of gaseosa (soda)
Water
Oil for frying (sunflower/corn/olive)

Steps: Peel the pumpkin, remove the seeds, and boil it to obtain a fine purée. Set aside the … Continue reading

Posted in Cooking, Uncategorized

HBase and Zookeeper debugging


I came across some scenarios where an application (e.g. MapReduce) communicating with HBase through YARN could silently fail with a timeout like the following:

    2017-01-30 19:42:03,657 DEBUG [main] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=hbase:meta, metaLocation=, attempt=9 of 35 failed; retrying after sleep of …

Continue reading

Posted in Uncategorized

Checking Yarn child execution environment


Never go out without this:

    $ sudo -u yarn jps
    27343 YarnChild
    4156 NodeManager
    27292 Jps
    $ sudo strings -f /proc/27343/environ
    /proc/27343/environ: STDERR_LOGFILE_ENV=/var/log/hadoop-yarn/containers/application_1485807340469_0019/container_1485807340469_0019_01_000003/stderr
    /proc/27343/environ: SHELL=/bin/bash
    /proc/27343/environ: TERM=linux
    /proc/27343/environ: HADOOP_HOME=/usr/lib/hadoop
    /proc/27343/environ: YARN_PID_DIR=/var/run/hadoop-yarn
    /proc/27343/environ: NM_HOST=ip-172-31-5-156.us-west-2.compute.internal
    /proc/27343/environ: HADOOP_PREFIX=/usr/lib/hadoop
    /proc/27343/environ: YARN_OPTS= -XX:OnOutOfMemoryError='kill -9 %p' …

Continue reading
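The same inspection can be scripted. On Linux, `/proc/<pid>/environ` holds the process environment as `KEY=VALUE` records separated by NUL bytes, which is why `strings` prints one variable per line. The sketch below assumes that layout; `parse_environ` and `read_environ` are hypothetical helper names, and reading another user's process (such as a `YarnChild` owned by `yarn`) still needs the privileges that `sudo` provides in the commands above.

```python
def parse_environ(raw: bytes) -> dict:
    """Split NUL-separated KEY=VALUE records into a dict."""
    env = {}
    for record in raw.split(b"\0"):
        if not record:
            continue  # skip the empty piece after the trailing NUL
        key, sep, value = record.partition(b"=")
        if sep:  # ignore records that have no '=' at all
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_environ(pid="self") -> dict:
    """Read and parse /proc/<pid>/environ (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as f:
        return parse_environ(f.read())
```

Calling `read_environ(27343)` with enough privileges would return the variables listed above as a dictionary, so a single value such as `HADOOP_HOME` can be looked up directly. Values are split on the first `=` only, so options like `YARN_OPTS` keep any later `=` characters intact.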

Posted in Uncategorized