Category archive: Uncategorized

HBase useful commands

1) Connect to HBase. Connect to your running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase installation. $ ./bin/hbase shell hbase(main):001:0> 2) Create a table. Use the create command to create a … Continue reading


Hive: Extracting JSON fields

Handling JSON files with Hive is not always an easy task. If you need to extract specific fields from structured JSON, there are some alternatives: two UDFs are usually helpful in these cases: ‘get_json_object’ … Continue reading
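To make the idea of path-based extraction concrete, here is a minimal Python sketch of the kind of lookup that a call such as get_json_object(json, '$.user.name') performs. It is only an analogue, not Hive code: the helper name `get_field` and the sample record are hypothetical, and the sketch handles plain dotted keys only (no array indexes or wildcards).

```python
import json


def get_field(json_text, path):
    """Return the value at a dotted path such as '$.user.name', or None.

    Hypothetical helper: mimics only the simplest '$.key.key' lookups.
    """
    if not path.startswith("$"):
        return None
    try:
        node = json.loads(json_text)
    except json.JSONDecodeError:
        # Invalid JSON yields no value instead of raising.
        return None
    # Drop the leading '$' and walk the remaining keys one by one.
    for key in (k for k in path[1:].split(".") if k):
        if isinstance(node, dict) and key in node:
            node = node[key]
        else:
            return None
    return node
```

For a record such as `{"user": {"name": "ana", "id": 7}}`, `get_field(record, "$.user.name")` walks `user` and then `name`; a missing key or unparseable input gives `None`.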


Elasticsearch and Kibana on EMR Hadoop cluster

If you need to add Elasticsearch and Kibana to EMR, please take a look at this post I wrote for the AWS blog: http://blogs.aws.amazon.com/bigdata/post/Tx1E8WC98K4TB7T/Getting-Started-with-Elasticsearch-and-Kibana-on-Amazon-EMR It contains all the steps to launch a cluster and perform basic tests on both … Continue reading


NoSQL: Amazon’s DynamoDB and Apache HBase Performance and Modeling notes

The challenge that architects and developers face today is how to process large volumes of data in a timely, cost-effective, and reliable manner. There are several NoSQL solutions on the market today, and choosing the right one for your … Continue reading


YARN / MapReduce memory settings

On Hadoop 1, we used mapred.child.java.opts to set the Java heap size for the TaskTracker child processes. With YARN, that parameter has been deprecated in favor of: mapreduce.map.java.opts – This parameter is passed to the JVM for mappers. … Continue reading
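As a rough illustration of how these settings relate, the Python sketch below derives a JVM heap flag from a container size. Several things here are assumptions not stated in the excerpt: the helper name `mapreduce_memory_settings` is hypothetical, the container-size properties (mapreduce.map.memory.mb, mapreduce.reduce.memory.mb) and the reducer counterpart mapreduce.reduce.java.opts are included for completeness, and the 0.8 heap-to-container ratio is a common rule of thumb rather than a required value.

```python
def mapreduce_memory_settings(map_mb, reduce_mb, heap_ratio=0.8):
    """Build a dict of MapReduce memory properties.

    heap_ratio is an assumed rule of thumb: the JVM heap is kept below
    the container size to leave headroom for non-heap memory.
    """
    if not 0 < heap_ratio < 1:
        raise ValueError("heap_ratio must be between 0 and 1")

    def xmx(container_mb):
        # Heap size in megabytes, rounded down.
        return f"-Xmx{int(container_mb * heap_ratio)}m"

    return {
        "mapreduce.map.memory.mb": str(map_mb),
        "mapreduce.map.java.opts": xmx(map_mb),
        "mapreduce.reduce.memory.mb": str(reduce_mb),
        "mapreduce.reduce.java.opts": xmx(reduce_mb),
    }
```

Calling `mapreduce_memory_settings(2048, 4096)` yields property values that could be pasted into a job configuration; adjust the ratio to suit the workload.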


2014 in numbers

The WordPress.com stats elves prepared a report on this blog's year in 2014. Here is an excerpt: The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 150,000 times … Continue reading


Create a really big file

This is sometimes useful when playing with big data. Instead of running a dd command and waiting for the file to be created block by block, we can run: $ fallocate -l 200G /mnt/reallyBigFile.csv It essentially “allocates” all of the space you’re seeking, but … Continue reading
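The same preallocation can be sketched from Python with `os.posix_fallocate`, which is available on Unix systems. The helper name `preallocate` is hypothetical, and whether the space is reserved instantly depends on the filesystem: where native support is missing, the C library may fall back to writing the blocks.

```python
import os


def preallocate(path, size_bytes):
    """Reserve size_bytes of disk space for the file at path."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # Allocate the byte range [0, size_bytes) for the open file.
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)
```

For example, `preallocate("/mnt/reallyBigFile.csv", 200 * 1024**3)` would request roughly the same 200G as the command above, assuming the target volume has that much free space.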
