Tag archive: BigData

HBase and Zookeeper debugging


I came across some scenarios where an application (e.g. MapReduce) communicating with HBase through YARN could silently fail with a timeout like the following:

    2017-01-30 19:42:03,657 DEBUG [main] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=hbase:meta, metaLocation=, attempt=9 of 35 failed; retrying after sleep of …


Checking Yarn child execution environment


Never go out without this:

    $ sudo -u yarn jps
    27343 YarnChild
    4156 NodeManager
    27292 Jps
    $ sudo strings -f /proc/27343/environ
    /proc/27343/environ: STDERR_LOGFILE_ENV=/var/log/hadoop-yarn/containers/application_1485807340469_0019/container_1485807340469_0019_01_000003/stderr
    /proc/27343/environ: SHELL=/bin/bash
    /proc/27343/environ: TERM=linux
    /proc/27343/environ: HADOOP_HOME=/usr/lib/hadoop
    /proc/27343/environ: YARN_PID_DIR=/var/run/hadoop-yarn
    /proc/27343/environ: NM_HOST=ip-172-31-5-156.us-west-2.compute.internal
    /proc/27343/environ: HADOOP_PREFIX=/usr/lib/hadoop
    /proc/27343/environ: YARN_OPTS= -XX:OnOutOfMemoryError='kill -9 %p' …
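If `strings` is not installed on the node, the same information can be read directly, because `/proc/<pid>/environ` is a list of NUL-separated `VAR=value` entries. The following is a minimal sketch; `print_environ` is a hypothetical helper name, not part of Hadoop or YARN, and it takes the file path as an argument so it can be tried on any file.

```shell
# print_environ FILE
# Hypothetical helper: prints one VAR=value entry per line from a
# NUL-separated environ file such as /proc/<pid>/environ.
print_environ() {
    tr '\0' '\n' < "$1"
}

# Example (run as a user allowed to read the file, e.g. root or yarn;
# 27343 is the YarnChild PID from the jps output above):
#   print_environ /proc/27343/environ | grep '^HADOOP_'
```

Piping through `grep` keeps only the variables of interest, which helps when the container environment is long.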


Create multiple files at once with ‘touch’


Sometimes we might need to create thousands or millions of files at once. This command uses touch with a brace-expansion range to create one file per number in the range:

    touch bspl{00001..70000}.c
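For very large ranges, a single `touch` invocation can fail with "Argument list too long", because the shell expands the whole range into one command line. One possible workaround, sketched below, generates the names with `seq` and lets `xargs` split them into batches. `make_files` is a hypothetical helper name, and the prefix is assumed not to contain `/` or other characters special to `sed`.

```shell
# make_files DIR PREFIX COUNT
# Hypothetical helper: creates COUNT empty files named PREFIX<n>.c in DIR,
# where <n> is zero-padded to the width of COUNT (seq -w).
# The body runs in a subshell so the cd does not affect the caller.
make_files() (
    dir=$1 prefix=$2 count=$3
    mkdir -p "$dir" && cd "$dir" || exit 1
    # xargs batches the names so no single touch call exceeds the
    # system limit on argument list length.
    seq -w 1 "$count" | sed "s/^/$prefix/; s/\$/.c/" | xargs touch
)

# Example: the same 70000 files as the brace-expansion command, in ./out
#   make_files out bspl 70000
```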


Creating Bigtop patches


To contribute to the Bigtop project, we need to submit a patch. We should follow this process for managing our proposed contributions: Create a Jira ticket with a description of the problem. (Note: the ticket should be Minor priority for most …


FileInputFormat vs. CombineFileInputFormat


When you put a file into HDFS, it is split into blocks of 128 MB (the default block size for HDFS on EMR). Consider a file big enough to consume 10 blocks. When you read that file from HDFS as an input …
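The block arithmetic above can be sketched in shell. `hdfs_block_count` is a hypothetical helper, not an HDFS command; it assumes 128 MB means 128 × 1024 × 1024 bytes and that a trailing partial block counts as one block.

```shell
# hdfs_block_count SIZE_BYTES [BLOCK_BYTES]
# Hypothetical helper: number of blocks a file of SIZE_BYTES occupies,
# rounding up. BLOCK_BYTES defaults to 128 MB (134217728 bytes).
hdfs_block_count() {
    size=$1
    block=${2:-134217728}
    echo $(( (size + block - 1) / block ))
}

# A file of exactly 10 * 128 MB occupies 10 blocks:
hdfs_block_count 1342177280
```

On a real cluster the configured block size may differ, so it can be passed as the second argument.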


Start Hive in Debug Mode


Never go out without it:

    hive --hiveconf hive.root.logger=DEBUG,console


Consider boosting spark.yarn.executor.memoryOverhead


This is a very specific error related to the coexistence of the Spark executor and its YARN container. You will typically see errors like this one in the application container logs:

    15/03/12 18:53:46 WARN YarnAllocator: Container killed by YARN for exceeding memory …
