Category archive: Uncategorized

yarn: execute a script on all the nodes in the cluster

This is more of a Linux scripting tip, but it has saved my life several times.

for i in `yarn node -list | cut -f 1 -d ':' | grep "ip"`; do ssh -i your-key.pem hadoop@$i 'hadoop fs -copyToLocal s3://mybucket/myscript.sh && chmod +x /home/hadoop/myscript.sh … Continue reading
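
The one-liner above can be wrapped in a small reusable function. This is only a minimal sketch under a few assumptions: the names `node_hosts` and `run_on_all_nodes` and the `KEY` / `REMOTE_USER` variables are made up for illustration, and the node hostnames are assumed to contain "ip" (as on EMR), which is what the original `grep` relies on.

```shell
# Placeholders: adjust the key file and remote user for your cluster.
KEY="${KEY:-your-key.pem}"
REMOTE_USER="${REMOTE_USER:-hadoop}"

# Reduce `yarn node -list` output (read from stdin) to bare hostnames:
# cut drops the ":port" suffix, grep keeps only lines containing "ip".
node_hosts() {
  cut -f 1 -d ':' | grep 'ip'
}

# Run the given command string on every node, one node at a time.
run_on_all_nodes() {
  for host in $(yarn node -list 2>/dev/null | node_hosts); do
    echo "== $host =="
    # -n keeps ssh from reading stdin; a failure on one node does not stop the loop.
    ssh -n -i "$KEY" "$REMOTE_USER@$host" "$1" || echo "failed on $host" >&2
  done
}
```

For example, `run_on_all_nodes 'uptime'` would print the uptime of each node under a `== host ==` header.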

HDFS: changing the replication factor

The replication factor is a property in the HDFS configuration file that sets the global replication factor for the entire cluster. For each block stored in HDFS, there will be n – … Continue reading
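
As a rough sketch of what this looks like from the command line: the cluster-wide default is the `dfs.replication` property (normally set in hdfs-site.xml), and `hdfs dfs -setrep` changes the factor for files that already exist. The function names and the example path are hypothetical, and the excerpt above may go on to describe a different procedure.

```shell
# Print the cluster-wide default replication factor (the dfs.replication property).
show_default_replication() {
  hdfs getconf -confKey dfs.replication
}

# Change the replication factor of an existing file or directory tree.
# $1 = new replication factor, $2 = HDFS path.
# -w waits until the new replication level has been reached.
set_replication() {
  hdfs dfs -setrep -w "$1" "$2"
}
```

For example, `set_replication 2 /user/hadoop/data` would re-replicate everything under that path with factor 2. Changing `dfs.replication` alone only affects files written afterwards, which is why existing data needs `-setrep`.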

Get the size of an Amazon S3 bucket folder

aws s3 ls s3://my-bucket/folder --recursive | awk 'BEGIN {total=0}{total+=$3}END{print total/1024/1024" MB"}'
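
The same pipeline can be split into two small functions so the summing step can be checked on its own. This is a minimal sketch; `sum_mb` and `folder_size_mb` are hypothetical names, and it assumes the usual `aws s3 ls` listing format where the third column is the object size in bytes.

```shell
# Sum the size column (3rd field) of an `aws s3 ls --recursive` listing
# read from stdin and print the total in MB.
sum_mb() {
  awk 'BEGIN {total=0} {total+=$3} END {print total/1024/1024 " MB"}'
}

# Print the total size of everything under a bucket prefix.
# $1 = S3 URI, for example s3://my-bucket/folder
folder_size_mb() {
  aws s3 ls "$1" --recursive | sum_mb
}
```

For example, `folder_size_mb s3://my-bucket/folder`. The AWS CLI can also print the totals itself with `aws s3 ls s3://my-bucket/folder --recursive --summarize --human-readable`.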

Indexing Common Crawl Metadata on Elasticsearch using Cascading

If you want to explore how to parallelize data ingestion into Elasticsearch, have a look at this post I wrote for AWS: http://blogs.aws.amazon.com/bigdata/post/TxC0CXZ3RPPK7O/Indexing-Common-Crawl-Metadata-on-Amazon-EMR-Using-Cascading-and-Elasticsearch It explains how to index Common Crawl metadata into Elasticsearch using Cascading connector … Continue reading

How Ganglia works

What is Ganglia? Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML … Continue reading

Back to the basics: Creating a SPEC file from a Maven project

1) Build the package with the provided pom.xml:
$ mvn package

2) Rebuild the RPM structure:
$ mvn -DskipTests=true rpm:rpm

A structure like the following will be created:
/target/rpm/<app_name>/BUILD
/target/rpm/<app_name>/RPMS
/target/rpm/<app_name>/SOURCES
/target/rpm/<app_name>/SPECS
/target/rpm/<app_name>/SRPMS

HBase useful commands

1) Connect to HBase. Connect to your running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase install.
$ ./bin/hbase shell
hbase(main):001:0>

2) Create a table. Use the create command to create a … Continue reading
