Tag archive: Amazon

Building and Deploying Apache Bigtop Applications


If you want to explore how to build an application for Apache Bigtop and then deploy it on Amazon EMR, have a look at this blog post I wrote for AWS: http://blogs.aws.amazon.com/bigdata/post/TxNJ6YS4X6S59U/Building-and-Deploying-Custom-Applications-with-Apache-Bigtop-and-Amazon-EMR

Posted in My Publications, Uncategorized

Get the size of an Amazon S3 bucket folder


aws s3 ls s3://my-bucket/folder --recursive | awk 'BEGIN {total=0}{total+=$3}END{print total/1024/1024" MB"}'
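
The same idea can be wrapped in a small reusable function. This is a minimal sketch, assuming each line of `aws s3 ls --recursive` output has the form date, time, size in bytes, key; the function name `s3_total_mb` is made up for illustration.

```shell
# Sum the size column (third field) of `aws s3 ls --recursive` output
# read from stdin and print the total in megabytes.
# s3_total_mb is a hypothetical helper name, not part of the AWS CLI.
s3_total_mb() {
  LC_ALL=C awk '{ total += $3 } END { printf "%.2f MB\n", total / 1024 / 1024 }'
}

# Usage (bucket and folder are placeholders):
# aws s3 ls s3://my-bucket/folder --recursive | s3_total_mb
```

The AWS CLI also accepts `--summarize` and `--human-readable` on `aws s3 ls`, which print the object count and total size without any awk post-processing.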

Posted in Uncategorized

Indexing Common Crawl Metadata on Elasticsearch using Cascading


If you want to explore how to parallelize data ingestion into Elasticsearch, take a look at this post I wrote for AWS: http://blogs.aws.amazon.com/bigdata/post/TxC0CXZ3RPPK7O/Indexing-Common-Crawl-Metadata-on-Amazon-EMR-Using-Cascading-and-Elasticsearch It explains how to index Common Crawl metadata into Elasticsearch using the Cascading connector …

Posted in My Publications, Uncategorized

Elasticsearch and Kibana on an EMR Hadoop cluster


If you need to add Elasticsearch and Kibana to an EMR cluster, take a look at this post I wrote for AWS: http://blogs.aws.amazon.com/bigdata/post/Tx1E8WC98K4TB7T/Getting-Started-with-Elasticsearch-and-Kibana-on-Amazon-EMR It contains all the steps to launch a cluster and run basic tests on both …

Posted in My Publications, Uncategorized

Installing Maven on an Amazon EC2 instance


Maven is a software tool for managing and building Java projects.

Download Maven:
$ wget http://apache.saix.net/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz

Extract it:
$ tar -xzvf apache-maven-3.2.3-bin.tar.gz

Move the folder to a permanent installation directory:
$ sudo mv /home/ec2-user/apache-maven-3.2.3 /usr/local/maven

Create a link …

Posted in Uncategorized

YARN: change configuration and restart the NodeManager on a live cluster


This procedure changes the YARN configuration on a live cluster, propagates the changes to all the nodes, and restarts the YARN NodeManager. Both commands list all the nodes in the cluster and then filter the DNS name to …
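
The excerpt is cut off before the commands appear, so the following is only a sketch of the node-listing step, not the commands from the full post. It assumes `yarn node -list` prints one row per node, with a `host:port` Node-Id in the first column and the node state in the second. The function name `running_node_hosts` and the config path in the comment are hypothetical.

```shell
# Read `yarn node -list` output from stdin and print the DNS name
# (Node-Id without the port) of every node whose state is RUNNING.
# Assumption: column 1 is host:port and column 2 is the node state.
running_node_hosts() {
  awk '$2 == "RUNNING" { sub(/:[0-9]+$/, "", $1); print $1 }'
}

# Hypothetical usage: copy an edited yarn-site.xml to every running node.
# The config path is an assumption, and the restart command is left out
# because it depends on the Hadoop distribution in use.
# yarn node -list | running_node_hosts | while read -r host; do
#   scp /etc/hadoop/conf/yarn-site.xml "$host":/etc/hadoop/conf/
# done
```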

Posted in Uncategorized

HDFS: Cluster to cluster copy with distcp


This is the format of the distcp command for copying from HDFS to HDFS when both the source and the destination cluster run on AWS:

hadoop distcp "hdfs://ec2-54-86-202-252.compute-1.amazonaws.com:9000/tmp/test.txt" "hdfs://ec2-54-86-229-249.compute-1.amazonaws.com:9000/tmp/test1.txt"

More information on distcp:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_7_2.html
http://hadoop.apache.org/docs/r1.2.1/distcp2.html
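
To avoid retyping the long host names, the command line can be assembled from its parts. This is a minimal sketch: `build_distcp_cmd` is a hypothetical helper that only prints the command, and it assumes both NameNodes listen on the same port (9000 by default here, as in the example above).

```shell
# Print a `hadoop distcp` command for an HDFS-to-HDFS copy.
# Arguments: source host, destination host, source path, destination path,
# and an optional NameNode port (assumed identical on both clusters).
# build_distcp_cmd is a hypothetical helper; it does not run the copy.
build_distcp_cmd() (
  src_host="$1"
  dst_host="$2"
  src_path="$3"
  dst_path="$4"
  port="${5:-9000}"
  printf 'hadoop distcp "hdfs://%s:%s%s" "hdfs://%s:%s%s"\n' \
    "$src_host" "$port" "$src_path" "$dst_host" "$port" "$dst_path"
)

# Usage: review the printed command, then run it on a cluster node.
# build_distcp_cmd ec2-54-86-202-252.compute-1.amazonaws.com \
#   ec2-54-86-229-249.compute-1.amazonaws.com /tmp/test.txt /tmp/test1.txt
```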

Posted in Uncategorized