Tag archive: Hadoop

Adding a mount point to HDFS


Before proceeding: This procedure assumes that you don’t currently have any useful data on HDFS. All data will be lost after adding mount points with this method. This procedure should be applied to every datanode in the cluster. No … Continue reading

Posted in Uncategorized

AWS EMR – Big Data in Strata New York


Will you be in New York next week (Sept 25th – Sept 28th)? Come meet the AWS Big Data team at Strata Data Conference, where we’ll be happy to answer your questions, hear about your requirements, and help you … Continue reading


s3:// vs s3n:// vs s3a:// vs EMRFS


s3:// is the Apache Hadoop implementation of a block-based filesystem backed by S3. Apache Hadoop has deprecated use of this filesystem as of May 2016.
s3n:// is a native filesystem for reading and writing regular files on S3. S3N allows Hadoop to access … Continue reading


Kill’em All!


Use it at your own discretion:

for app in `yarn application -list | awk '$6 == "ACCEPTED" { print $1 }'`; do yarn application -kill "$app"; done
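
A slightly more defensive take on the same idea, as a sketch: the filter is split into a function so it can be run as a dry run before anything is killed. The function name accepted_app_ids is made up for this example and is not part of YARN. Like the one-liner, it assumes the State column is the sixth whitespace-separated field, which only holds when application names contain no spaces.

```shell
#!/bin/sh
# Read `yarn application -list` output on stdin and print the IDs of
# applications whose State column (field 6) is ACCEPTED.
# accepted_app_ids is a hypothetical helper name, not a YARN command.
accepted_app_ids() {
  # Only rows starting with "application_" are considered, so the
  # summary and column-header lines are skipped.
  awk '$1 ~ /^application_/ && $6 == "ACCEPTED" { print $1 }'
}

# Dry run (prints the IDs that would be killed):
#   yarn application -list | accepted_app_ids
# Kill them:
#   yarn application -list | accepted_app_ids | xargs -n 1 yarn application -kill
```

Running the dry run first makes it easier to confirm that only the intended applications match before piping the IDs into the kill command.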


S3 and Parallel Processing – DirectFileOutputCommitter


The problem: While a Hadoop job is writing output, each task writes to a temporary directory:

Task1 --> /unique/temp/directory/task1/file.tmp
Task2 --> /unique/temp/directory/task2/file.tmp

When the tasks finish executing, they move (commit) the temporary files to a final location. This schema … Continue reading
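
The write-then-commit pattern described above can be imitated on a local filesystem. This is only an illustration of the idea, not Hadoop code: write_task_output and commit_task are hypothetical helper names, and the part-<task> naming of the final file is an assumption made for the example. On a local disk or HDFS the move is a cheap rename; S3 has no native rename, so a move there becomes a copy followed by a delete.

```shell
#!/bin/sh
# Local-filesystem sketch of "write to a temp directory, then commit by moving".
# Both function names are hypothetical; nothing here calls Hadoop.

# Write a task's output under <job_tmp>/<task>/file.tmp.
write_task_output() {
  job_tmp="$1"; task="$2"; data="$3"
  mkdir -p "$job_tmp/$task"
  printf '%s\n' "$data" > "$job_tmp/$task/file.tmp"
}

# Commit: move the task's temporary file into the final directory.
commit_task() {
  job_tmp="$1"; task="$2"; final_dir="$3"
  mkdir -p "$final_dir"
  mv "$job_tmp/$task/file.tmp" "$final_dir/part-$task"
}
```

Until commit_task runs, nothing appears in the final directory, so a reader of the output never sees a half-written file.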


Running Spark with Oozie


Oozie 4.2 now supports spark-action. Example job.properties file (configuration tested on EMR 4.2.0):

nameNode=hdfs://172.31.25.17:8020
jobTracker=172.31.25.17:8032
master=local[*]
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/spark

(Use the master node's internal IP instead of localhost in nameNode and jobTracker.)

Validate the Oozie workflow XML file: oozie … Continue reading
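
Since the only host-specific values in that job.properties are the master node's internal IP in nameNode and jobTracker, the file can be generated from a small helper. This is a sketch: write_job_properties is a hypothetical function name, and ports 8020 and 8032 are simply the ones from the example above.

```shell
#!/bin/sh
# Write a job.properties like the example above, substituting the master
# node's internal IP. write_job_properties is a hypothetical helper name.
# Usage: write_job_properties <master-internal-ip> <output-file>
write_job_properties() {
  master_ip="$1"; out="$2"
  {
    printf 'nameNode=hdfs://%s:8020\n' "$master_ip"
    printf 'jobTracker=%s:8032\n' "$master_ip"
    # Single quotes keep the ${...} placeholders literal so that Oozie,
    # not the shell, resolves them.
    printf '%s\n' \
      'master=local[*]' \
      'queueName=default' \
      'examplesRoot=examples' \
      'oozie.use.system.libpath=true' \
      'oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/spark'
  } > "$out"
}
```

For example, write_job_properties 172.31.25.17 job.properties reproduces the file from the excerpt.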


Hadoop: Output Committer Notes


OutputCommitter describes the commit of task output for a MapReduce job. The MapReduce framework relies on the job's OutputCommitter to: Set up the job during initialization; for example, create the temporary output directory for the job. Job setup … Continue reading
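
The job-level part of that contract can be mimicked with a few shell functions. This is a rough local-filesystem sketch, not the Hadoop API: the function names are hypothetical, and the _temporary directory and _SUCCESS marker are borrowed from the conventions of Hadoop's FileOutputCommitter purely for illustration.

```shell
#!/bin/sh
# Local sketch of job-level setup, commit, and abort.
# Function names are hypothetical; this does not call Hadoop.

# Job setup: create the temporary output directory for the job.
setup_job() {
  mkdir -p "$1/_temporary"
}

# Job commit: drop the temporary directory and leave a success marker.
commit_job() {
  rm -rf "$1/_temporary"
  : > "$1/_SUCCESS"
}

# Job abort: drop the temporary directory without marking success.
abort_job() {
  rm -rf "$1/_temporary"
}
```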
