Tag archive: BigData

s3:// vs s3n:// vs s3a:// vs EMRFS


s3:// is the Apache Hadoop implementation of a block-based filesystem backed by S3. Apache Hadoop deprecated this filesystem as of May 2016. s3n:// is a native filesystem for reading and writing regular files on S3. S3N allows Hadoop to access …


Copy Data with Hive and Spark


These are two examples of how to copy data from one S3 location to another. The same operation can be done from S3 to HDFS and vice versa. I'm assuming that you are able to launch the Hive client or …


HBase and Zookeeper debugging


I came across some scenarios where an application (e.g. MapReduce) communicating with HBase through YARN could silently fail with a timeout like the following:

2017-01-30 19:42:03,657 DEBUG [main] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=hbase:meta, metaLocation=, attempt=9 of 35 failed; retrying after sleep of …


Checking Yarn child execution environment


Never go out without this:

$ sudo -u yarn jps
27343 YarnChild
4156 NodeManager
27292 Jps
$ sudo strings -f /proc/27343/environ
/proc/27343/environ: STDERR_LOGFILE_ENV=/var/log/hadoop-yarn/containers/application_1485807340469_0019/container_1485807340469_0019_01_000003/stderr
/proc/27343/environ: SHELL=/bin/bash
/proc/27343/environ: TERM=linux
/proc/27343/environ: HADOOP_HOME=/usr/lib/hadoop
/proc/27343/environ: YARN_PID_DIR=/var/run/hadoop-yarn
/proc/27343/environ: NM_HOST=ip-172-31-5-156.us-west-2.compute.internal
/proc/27343/environ: HADOOP_PREFIX=/usr/lib/hadoop
/proc/27343/environ: YARN_OPTS= -XX:OnOutOfMemoryError='kill -9 %p' …
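As a minimal sketch of the same check without `strings`: on Linux, `/proc/<pid>/environ` holds the process environment as NUL-separated `VAR=value` entries, so `tr` can turn it into one entry per line. The helper name `show_env` is hypothetical, and the demo reads the current shell's own environment so it runs without a YARN cluster.

```shell
#!/usr/bin/env bash
# Print the environment a process was started with, one VAR=value per line.
# /proc/<pid>/environ is NUL-separated, so each NUL is turned into a newline.
# show_env is a hypothetical helper name, not part of Hadoop or YARN.
show_env() {
  local pid="$1"
  tr '\0' '\n' < "/proc/${pid}/environ"
}

# Demo: inspect this shell's own startup environment.
# For a YARN container, pass the YarnChild PID reported by jps instead,
# and run as root or as the yarn user so the file is readable.
show_env "$$" | sort | head -n 5
```

Note that the file reflects the environment at the time the process was started, not variables it exported afterwards.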


Create multiple files at once with ‘touch’


Sometimes we need to create thousands or millions of files at once. This command uses shell brace expansion to create one file for each number in the range with touch:

touch bspl{00001..70000}.c
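A minimal sketch of how this behaves, assuming bash 4 or later (needed for zero-padded ranges). The shell expands the braces before `touch` runs, so `touch` receives one argument per file. For very large ranges the expanded argument list can exceed the kernel's limit and fail with "Argument list too long"; streaming the names through `xargs` is one way around that. The scratch directory and the `big` prefix are assumptions made for the example.

```shell
#!/usr/bin/env bash
set -eu

# Work in a throwaway directory so the example does not litter the current one.
workdir="$(mktemp -d)"
cd "$workdir"

# Brace expansion happens in the shell: touch is called once with 100 arguments.
touch bspl{001..100}.c

# For much larger ranges, let the printf builtin emit the names (a builtin is
# not subject to the exec argument-size limit) and let xargs batch them into
# as many touch invocations as needed.
printf '%s\n' big{00001..05000}.c | xargs touch

# Count what was created.
ls | wc -l
```

The second form scales to millions of names at the cost of a few extra `touch` processes.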


Creating Bigtop patches


To contribute to the Bigtop project, we need to submit a patch. We should follow this process for managing our proposed contributions: Create a Jira ticket with a description of the problem. (Note: the ticket should be Minor priority for most …


FileInputFormat vs. CombineFileInputFormat


When you put a file into HDFS, it is split into blocks of 128 MB (the default value for HDFS on EMR). Consider a file big enough to consume 10 blocks. When you read that file from HDFS as an input …
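The block arithmetic can be sketched as follows: the number of blocks is the file size divided by the block size, rounded up, since the last block may be only partly full. The function name `block_count` and the 1200 MB file size are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Number of HDFS blocks a file occupies: ceil(file_size / block_size).
# block_count is a hypothetical helper; sizes are given in bytes.
block_count() {
  local file_bytes="$1" block_bytes="$2"
  echo $(( (file_bytes + block_bytes - 1) / block_bytes ))
}

# 128 MB block size, as stated in the text.
block_bytes=$(( 128 * 1024 * 1024 ))

# A hypothetical 1200 MB file: nine full blocks plus a partial tenth.
block_count $(( 1200 * 1024 * 1024 )) "$block_bytes"   # prints 10
```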
