Tag archive: Amazon

AWS EMR – Big Data in Strata New York


Will you be in New York next week (Sept 25th – Sept 28th)? Come meet the AWS Big Data team at Strata Data Conference, where we’ll be happy to answer your questions, hear about your requirements, and help you … Continue reading

Posted in Uncategorized

Building and Deploying Apache Bigtop Applications


If you want to explore how to build an application for Apache Bigtop and then deploy it using EMR, have a look at this blog post I wrote for Amazon AWS: http://blogs.aws.amazon.com/bigdata/post/TxNJ6YS4X6S59U/Building-and-Deploying-Custom-Applications-with-Apache-Bigtop-and-Amazon-EMR

Posted in My Publications, Uncategorized

Get the size of an Amazon S3 bucket folder


aws s3 ls s3://my-bucket/folder --recursive | awk 'BEGIN {total=0}{total+=$3}END{print total/1024/1024" MB"}'
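The awk stage of this one-liner can be tried without touching S3. The sketch below is only an illustration: it assumes the usual `aws s3 ls --recursive` layout (date, time, size in bytes, key), the helper name `sum_s3_sizes` is made up for this example, and the two listing lines are invented sample data.

```shell
# Hypothetical helper: sum the third column (object size in bytes) of
# `aws s3 ls --recursive` style output read from stdin, and print it in MB.
sum_s3_sizes() {
  awk '{ total += $3 } END { printf "%.2f MB\n", total / 1024 / 1024 }'
}

# Invented sample listing standing in for real `aws s3 ls` output:
# one object of 1 MiB and one of 2 MiB.
printf '%s\n' \
  '2015-01-01 10:00:00    1048576 folder/a.bin' \
  '2015-01-02 11:00:00    2097152 folder/b.bin' | sum_s3_sizes
```

Against a real bucket the same function would sit at the end of the pipe: aws s3 ls s3://my-bucket/folder --recursive | sum_s3_sizes. The AWS CLI also accepts --summarize and --human-readable on `aws s3 ls`, which print the object count and total size directly.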

Posted in Uncategorized

Indexing Common Crawl Metadata on Elasticsearch using Cascading


If you want to explore how to parallelize data ingestion into Elasticsearch, have a look at this post I wrote for Amazon AWS: http://blogs.aws.amazon.com/bigdata/post/TxC0CXZ3RPPK7O/Indexing-Common-Crawl-Metadata-on-Amazon-EMR-Using-Cascading-and-Elasticsearch It explains how to index Common Crawl metadata into Elasticsearch using the Cascading connector … Continue reading

Posted in My Publications, Uncategorized

Elasticsearch and Kibana on EMR Hadoop cluster


If you need to add Elasticsearch and Kibana to EMR, have a look at this post I wrote for Amazon AWS: http://blogs.aws.amazon.com/bigdata/post/Tx1E8WC98K4TB7T/Getting-Started-with-Elasticsearch-and-Kibana-on-Amazon-EMR It contains all the steps to launch a cluster and run basic tests on both … Continue reading

Posted in My Publications, Uncategorized | 3 comments

Installing Maven on an Amazon EC2 instance


Maven is a software tool for managing and building Java projects.

Get Maven:
$ wget http://apache.saix.net/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz

Unpack it:
$ tar -xzvf apache-maven-3.2.3-bin.tar.gz

Move the folder to a permanent installation directory:
$ sudo mv /home/ec2-user/apache-maven-3.2.3 /usr/local/maven

Create a link … Continue reading

Posted in Uncategorized

YARN: change configuration and restart the node manager on a live cluster


This procedure changes the YARN configuration on a live cluster, propagates the changes to all the nodes, and restarts the YARN node manager. Both commands list all the nodes in the cluster and then filter the DNS name to … Continue reading
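The excerpt is cut off before the commands, so the sketch below is not the post's procedure. It only illustrates the "list the nodes, then filter the DNS name" step, assuming the node list comes from `yarn node -list` and that each node line starts with a `host:port` Node-Id followed by its state. The helper name `running_node_hosts` and the sample host names are invented for the example.

```shell
# Hypothetical helper: read `yarn node -list` style output on stdin and
# print the host name (Node-Id without the port) of every RUNNING node.
running_node_hosts() {
  awk '$2 == "RUNNING" { split($1, id, ":"); print id[1] }'
}

# Invented sample standing in for real `yarn node -list` output
# (assumed layout: Node-Id, Node-State, Node-Http-Address, containers).
sample='Total Nodes:2
         Node-Id      Node-State Node-Http-Address Number-of-Running-Containers
ip-10-0-0-1.ec2.internal:8041   RUNNING ip-10-0-0-1.ec2.internal:8042   0
ip-10-0-0-2.ec2.internal:8041   RUNNING ip-10-0-0-2.ec2.internal:8042   3'

# Header lines are skipped because their second field is not RUNNING.
printf '%s\n' "$sample" | running_node_hosts
```

On a live cluster the input would come from `yarn node -list | running_node_hosts`, and each printed host could then be fed to a loop that copies the updated configuration with scp and restarts the node manager over ssh. The restart command depends on the EMR release, so it is deliberately left out here.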

Posted in Uncategorized | 1 comment