Tag archive: Linux

yarn: execute a script on all the nodes in the cluster

This is more of a Linux scripting tip, but it has saved my life several times:

for i in `yarn node -list | cut -f 1 -d ':' | grep "ip"`; do ssh -i your-key.pem hadoop@$i 'hadoop fs -copyToLocal s3://mybucket/myscript.sh | chmod +x /home/hadoop/myscript.sh … Continue reading
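The host-extraction part of that loop can be sketched on its own, which makes it easy to check before any ssh command is sent. This is only a sketch: the function names `extract_hosts` and `run_on_all` are hypothetical, and it assumes EMR-style node IDs of the form `ip-10-0-0-1.ec2.internal:8041`, so the filter is anchored to `^ip-` instead of the looser `grep "ip"` used in the teaser.

```shell
# Sketch only: helper names are hypothetical, not part of the original post.

# Read `yarn node -list` output on stdin and print one hostname per line.
# Assumes node IDs look like ip-10-0-0-1.ec2.internal:8041.
extract_hosts() {
  cut -f 1 -d ':' | grep '^ip-'
}

# Run one command on every node reported by YARN.
# $1 = path to the private key, $2 = command to run remotely.
run_on_all() {
  key="$1"
  cmd="$2"
  for host in $(yarn node -list 2>/dev/null | extract_hosts); do
    ssh -i "$key" "hadoop@${host}" "$cmd"
  done
}
```

A call would then look like `run_on_all your-key.pem 'hostname'`, with the key file and the `hadoop` user taken from the teaser above.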

Posted in Uncategorized

Indexing Common Crawl Metadata on Elasticsearch using Cascading

If you want to explore how to parallelize data ingestion into Elasticsearch, have a look at this post I wrote for AWS: http://blogs.aws.amazon.com/bigdata/post/TxC0CXZ3RPPK7O/Indexing-Common-Crawl-Metadata-on-Amazon-EMR-Using-Cascading-and-Elasticsearch It explains how to index Common Crawl metadata into Elasticsearch using Cascading connector … Continue reading

How Ganglia works

What is Ganglia? Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML … Continue reading

Back to the basics: Creating a SPEC file from a Maven project

1) Build the package with the provided pom.xml:

$ mvn package

2) Rebuild the RPM structure:

$ mvn -DskipTests=true rpm:rpm

A structure like the following will be created:

/target/rpm/<app_name>/BUILD
/target/rpm/<app_name>/RPMS
/target/rpm/<app_name>/SOURCES
/target/rpm/<app_name>/SPECS
/target/rpm/<app_name>/SRPMS
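Once that structure exists, the generated SPEC file can be located with a small helper. This is a sketch under assumptions: the function name `find_specs` is hypothetical, and it assumes the plugin writes a file ending in `.spec` inside the SPECS directory listed above.

```shell
# Sketch only: find_specs is a hypothetical helper name.
# Print every *.spec file found in a SPECS directory below the given root.
# $1 = root of the RPM build tree (defaults to target/rpm).
find_specs() {
  find "${1:-target/rpm}" -type f -path '*/SPECS/*.spec'
}
```

Run from the project root after the two Maven commands, `find_specs` with no argument searches target/rpm.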

Elasticsearch and Kibana on EMR Hadoop cluster

If you need to add Elasticsearch and Kibana to EMR, have a look at this post I wrote for AWS: http://blogs.aws.amazon.com/bigdata/post/Tx1E8WC98K4TB7T/Getting-Started-with-Elasticsearch-and-Kibana-on-Amazon-EMR It contains all the steps to launch a cluster and perform basic tests on both … Continue reading

Create a really big file

This is sometimes useful when playing with big data. Instead of running a dd command and waiting for the file to be created block by block, we can run: $ fallocate -l 200G /mnt/reallyBigFile.csv It essentially “allocates” all of the space you’re seeking, but … Continue reading
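To try the command safely before asking for 200G, a small wrapper can preallocate a file and report the size the filesystem shows for it. This is a sketch, not the post's own script: the name `make_big_file` is hypothetical, and it assumes a Linux system with util-linux `fallocate`, GNU `stat`, and a filesystem that supports preallocation.

```shell
# Sketch only: make_big_file is a hypothetical helper name.
# Preallocate a file and print its apparent size in bytes.
# $1 = size as accepted by `fallocate -l` (e.g. 1M, 200G), $2 = target path.
make_big_file() {
  fallocate -l "$1" "$2" && stat -c %s "$2"
}
```

For example, `make_big_file 1M /tmp/test.bin` is a cheap way to confirm that the target filesystem accepts the call before allocating the full-size file.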

Installing Maven on an Amazon EC2 instance

Maven is a software tool for managing and building Java projects. Get Maven: $ wget http://apache.saix.net/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz Unpack it: $ tar -xzvf apache-maven-3.2.3-bin.tar.gz Move the folder to a permanent installation directory: $ sudo mv /home/ec2-user/apache-maven-3.2.3 /usr/local/maven Create a link … Continue reading
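The three visible steps can be collected into one parameterized script so the version appears in a single place. This is a sketch under assumptions: the function names are hypothetical, the mirror URL is the one from the teaser and may no longer serve this file, and the truncated link step is deliberately left out.

```shell
# Sketch only: function names are hypothetical; the link step from the
# truncated teaser is not reproduced here.

# Build the tarball file name for a given Maven version.
maven_tarball() {
  echo "apache-maven-${1}-bin.tar.gz"
}

# Download, unpack and move Maven into /usr/local/maven.
# $1 = Maven version, e.g. 3.2.3. Run from /home/ec2-user as in the post.
install_maven() {
  version="$1"
  tarball="$(maven_tarball "$version")"
  wget "http://apache.saix.net/maven/maven-3/${version}/binaries/${tarball}" &&
    tar -xzvf "$tarball" &&
    sudo mv "apache-maven-${version}" /usr/local/maven
}
```

Calling `install_maven 3.2.3` would repeat the commands shown above, stopping at the first step that fails.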
