Tag archive: fedora

Indexing Common Crawl Metadata on Elasticsearch using Cascading

If you want to explore how to parallelize data ingestion into Elasticsearch, please have a look at this post I wrote for Amazon AWS: http://blogs.aws.amazon.com/bigdata/post/TxC0CXZ3RPPK7O/Indexing-Common-Crawl-Metadata-on-Amazon-EMR-Using-Cascading-and-Elasticsearch
It explains how to index Common Crawl metadata into Elasticsearch using the Cascading connector … Continue reading


Back to the basics: Creating a SPEC file from a Maven project

1) Build the package with the provided pom.xml:
$ mvn package
2) Rebuild the RPM structure:
$ mvn -DskipTests=true rpm:rpm
A structure like the following will be created:
/target/rpm/<app_name>/BUILD
/target/rpm/<app_name>/RPMS
/target/rpm/<app_name>/SOURCES
/target/rpm/<app_name>/SPECS
/target/rpm/<app_name>/SRPMS
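Once rpm:rpm has run, the generated SPEC file should be somewhere under the SPECS directory of the layout listed above. A minimal sketch for locating it follows; the helper name find_specs is hypothetical, and it assumes the spec files end in .spec and sit under target/rpm/<app_name>/SPECS relative to the project directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the .spec files left under target/rpm.
# Assumption: the layout is <project>/target/rpm/<app_name>/SPECS/*.spec
find_specs() {
  local project_dir="${1:-.}"
  # Errors are silenced so a project that has not been built yet prints nothing.
  find "$project_dir/target/rpm" -path '*/SPECS/*.spec' -type f 2>/dev/null | sort
}

# Example: look in the current project directory.
find_specs .
```

Passing a different project directory as the first argument (for example find_specs /path/to/project) searches there instead.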


Elasticsearch and Kibana on EMR Hadoop cluster

If you need to add Elasticsearch and Kibana on EMR, please have a look at this post I wrote for Amazon AWS: http://blogs.aws.amazon.com/bigdata/post/Tx1E8WC98K4TB7T/Getting-Started-with-Elasticsearch-and-Kibana-on-Amazon-EMR
It contains all the steps to launch a cluster and perform basic tests on both … Continue reading


Create a really big file

This is sometimes useful when playing with big data. Instead of running a dd command and waiting for the file to be created block by block, we can run:
$ fallocate -l 200G /mnt/reallyBigFile.csv
It essentially “allocates” all of the space you’re seeking, but … Continue reading
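To try the fallocate approach without committing 200G of disk, the same call can be made on a small scratch file and the resulting size checked. This is only a sketch: the helper name preallocate is hypothetical, it assumes GNU stat, and it assumes a filesystem that supports fallocate (ext4 and XFS do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: preallocate a file of the given size and print
# its size in bytes.
preallocate() {
  local path="$1" size="$2"
  # fallocate reserves the space without writing the data block by block.
  fallocate -l "$size" "$path" || return 1
  stat -c %s "$path"   # apparent size in bytes (GNU stat)
}

# Example on a 10M scratch file instead of 200G.
scratch="$(mktemp)"
preallocate "$scratch" 10M
rm -f "$scratch"
```

The M suffix is interpreted by fallocate as mebibytes, so a 10M request corresponds to 10 * 1024 * 1024 bytes.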


Installing Maven on an Amazon EC2 instance

Maven is a software tool for managing and building Java projects.
Get Maven:
$ wget http://apache.saix.net/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
Unpack it:
$ tar -xzvf apache-maven-3.2.3-bin.tar.gz
Move the folder to a permanent installation directory:
$ sudo mv /home/ec2-user/apache-maven-3.2.3 /usr/local/maven
Create a link … Continue reading


Hive: dealing with Out of Memory and Garbage Collector errors.

This is the common error:
java.lang.OutOfMemoryError: GC overhead limit exceeded
This error can occur in many Java environments, but it is particularly common in Hive when large structures or several thousand objects are held in memory. According to Sun, … Continue reading


Mandus Momberg’s Blog! – the beauty of BASH

I would like to share with you an awesome new blog from an awesome professional: http://blog.mandusmomberg.com/
And, as a first post, a nice one about the beauty of BASH: http://blog.mandusmomberg.com/blog/2014/12/01/o-what-a-beautiful-bashing/
Enjoy!
