Tag archive: Amazon

task blocked for more than 120 seconds


When we run very stressful jobs on clustered environments where I/O activity is very high, it is pretty common to start seeing these messages in the 'dmesg' kernel output:
[24169.372862] INFO: task kswapd1:1140 blocked for more than 120 seconds.
[24169.375623] … Continue reading

Posted in Uncategorized

failed to alloc buffer for rx queue


If we put enough pressure on the ENA network driver, we'll start seeing these "failed to alloc buffer for rx queue" messages in the 'dmesg' output:
[56459.833033] ena 0000:00:05.0 eth0: failed to alloc buffer for rx queue 4
[56459.836477] ena … Continue reading

Posted in Uncategorized

AWS EMR – Big Data in Strata New York


Will you be in New York next week (Sept 25th – Sept 28th)? Come meet the AWS Big Data team at Strata Data Conference, where we'll be happy to answer your questions, hear about your requirements, and help you … Continue reading

Posted in Uncategorized

s3:// vs s3n:// vs s3a:// vs EMRFS


s3://: Apache Hadoop implementation of a block-based filesystem backed by S3. Apache Hadoop has deprecated use of this filesystem as of May 2016.
s3n://: A native filesystem for reading and writing regular files on S3. S3N allows Hadoop to access … Continue reading

Posted in Uncategorized

Building and Deploying Apache Bigtop Applications


If you want to explore how to build an application for Apache Bigtop and then deploy it using EMR, have a look at this blog post I wrote for Amazon AWS: http://blogs.aws.amazon.com/bigdata/post/TxNJ6YS4X6S59U/Building-and-Deploying-Custom-Applications-with-Apache-Bigtop-and-Amazon-EMR

Posted in My Publications, Uncategorized

get the size of an Amazon S3 bucket folder


aws s3 ls s3://my-bucket/folder --recursive | awk 'BEGIN {total=0}{total+=$3}END{print total/1024/1024" MB"}'
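To see what the awk part of this one-liner does without touching a real bucket, here is a minimal sketch that feeds it a made-up listing. It assumes the usual `aws s3 ls --recursive` layout of date, time, size in bytes, and key, so the size is the third field. The function name `sum_listing_mb` and the sample keys and sizes are hypothetical, chosen only for illustration.

```shell
# Sum the size column (3rd field) of an `aws s3 ls --recursive` style
# listing read from stdin and print the total in MB.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
sum_listing_mb() {
  LC_ALL=C awk '{ total += $3 } END { printf "%.2f MB\n", total / 1024 / 1024 }'
}

# Made-up sample listing: 1 MiB + 2 MiB of objects.
sample='2017-09-01 10:00:00    1048576 folder/a.bin
2017-09-01 10:00:01    2097152 folder/b.bin'

result=$(printf '%s\n' "$sample" | sum_listing_mb)
echo "$result"   # prints "3.00 MB"
```

Against a real bucket the function would sit at the end of the same pipeline: `aws s3 ls s3://my-bucket/folder --recursive | sum_listing_mb`. The AWS CLI can also print a total itself if you add `--summarize` (optionally with `--human-readable`) to the `ls` command.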

Posted in Uncategorized

Indexing Common Crawl Metadata on Elasticsearch using Cascading


If you want to explore how to parallelize data ingestion into Elasticsearch, please have a look at this post I wrote for Amazon AWS: http://blogs.aws.amazon.com/bigdata/post/TxC0CXZ3RPPK7O/Indexing-Common-Crawl-Metadata-on-Amazon-EMR-Using-Cascading-and-Elasticsearch It explains how to index Common Crawl metadata into Elasticsearch using Cascading connector … Continue reading

Posted in My Publications, Uncategorized