Archivo de la etiqueta: fedora

Indexing Common Crawl Metadata on Elasticsearch using Cascading


If you want to explore how to parallelize the data ingestion into Elasticsearch, please have a look to this post I have written for Amazon AWS: http://blogs.aws.amazon.com/bigdata/post/TxC0CXZ3RPPK7O/Indexing-Common-Crawl-Metadata-on-Amazon-EMR-Using-Cascading-and-Elasticsearch It explains how to index Common Crawl metadata into Elasticsearch using Cascading connector … Seguir leyendo

Publicado en Mis Publicaciones, Uncategorized | Etiquetado , , , , , | Deja un comentario

Back to the basics: Creating a SPEC file from a Maven project


1) Build the package with the provided pom.xml: $ mvn package 2) Rebuild the RPM structure: $ mvn -DskipTests=true rpm:rpm A structure like the following will be created: /target/rpm/<app_name>/BUILD /target/rpm/<app_name>/RPMS /target/rpm/<app_name>/SOURCES /target/rpm/<app_name>/SPECS /target/rpm/<app_name>/SRPMS

Publicado en Uncategorized | Etiquetado , , , | Deja un comentario

Elasticsearch and Kibana on EMR Hadoop cluster


If you need to add Elasticsearch and Kibana on EMR, please have a look to this post I have written for Amazon AWS: http://blogs.aws.amazon.com/bigdata/post/Tx1E8WC98K4TB7T/Getting-Started-with-Elasticsearch-and-Kibana-on-Amazon-EMR It contains all the steps to launch a cluster and perform the basic testings on both … Seguir leyendo

Publicado en Mis Publicaciones, Uncategorized | Etiquetado , , , , , | 3 comentarios

Create a really big file / Crear un archivo realmente grande


This is sometimes useful when playing with bigdata. Instead of a dd command and wait the file being created block by clock, we can run: $ fallocate -l 200G /mnt/reallyBigFile.csv It essentially “allocates” all of the space you’re seeking, but … Seguir leyendo

Publicado en Uncategorized | Etiquetado , , | 2 comentarios

Instalando Maven en instancia Amazon EC2


Maven es una herramienta de software para la gestión y construcción de proyectos Java Obtenemos maven: $ wget http://apache.saix.net/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz Descomprimimos: $ tar -xzvf apache-maven-3.2.3-bin.tar.gz Movemos la carpeta a un directorio de instalación permanente: $ sudo mv /home/ec2-user/apache-maven-3.2.3 /usr/local/maven Creamos link … Seguir leyendo

Publicado en Uncategorized | Etiquetado , , , | Deja un comentario

Hive: dealing with Out of Memory and Garbage Collector errors.


This is the common error: java.lang.OutOfMemoryError: GC overhead limit exceeded This error will occur in several Java environments, but, in particular, with Hive, is pretty common when big structures or several thousands objects are stored in memory. According to Sun, … Seguir leyendo

Publicado en Uncategorized | Etiquetado , , , | Deja un comentario

Mandus Momberg’s Blog ! – the beauty of BASH


I would like to share with you a new awesome blog from an awesome professional: http://blog.mandusmomberg.com/ And… as a first post, a nice one, about the beauty of BASH: http://blog.mandusmomberg.com/blog/2014/12/01/o-what-a-beautiful-bashing/ Enjoy !

Publicado en Uncategorized | Etiquetado , | Deja un comentario

vi sudo save with root permissions / grabar cambios con permisos de root


Just: :w !sudo tee % % is current file. !sudo tee calls tee with administrator privileges and writes to current file.  But not vi buffered file. That’s why you will see a warning like this when using the command: W12: … Seguir leyendo

Publicado en Uncategorized | Etiquetado , | Deja un comentario

MapReduce: Compression and Input Splits


This is something that always rise doubts: When considering compressed data that will be processed by MapReduce, it is important to check if the compression format supports splitting. If not, the number of map tasks may not be the expected. … Seguir leyendo

Publicado en Uncategorized | Etiquetado , , , | Deja un comentario

yarn: change configuration and restart node manager on a live cluster


This procedure is to change Yarn configuration on a live cluster, propagate the changes to all the nodes and restart Yarn node manager. Both commands are listing all the nodes on the cluster and then filtering the DNS name to … Seguir leyendo

Publicado en Uncategorized | Etiquetado , , , , , | 1 Comentario