Author Archive: hvivani

About hvivani

sysadmin, developer, RHCSA

How Ganglia works

What is Ganglia? Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML … Continue reading

Back to the basics: Creating a SPEC file from a Maven project

1) Build the package with the provided pom.xml:

$ mvn package

2) Rebuild the RPM structure:

$ mvn -DskipTests=true rpm:rpm

A structure like the following will be created:

/target/rpm/<app_name>/BUILD
/target/rpm/<app_name>/RPMS
/target/rpm/<app_name>/SOURCES
/target/rpm/<app_name>/SPECS
/target/rpm/<app_name>/SRPMS
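As a quick sanity check after the two Maven steps, the small Python sketch below reports which of the listed directories are still missing and looks for a generated SPEC file. It is only an illustration: the function names are hypothetical, and it assumes that the tree lives under the project root and that the SPEC file ends in .spec inside SPECS, neither of which is stated in the excerpt.

```python
from pathlib import Path

# Subdirectories listed in the post for the rpm:rpm output.
RPM_SUBDIRS = ("BUILD", "RPMS", "SOURCES", "SPECS", "SRPMS")


def missing_rpm_dirs(project_root, app_name):
    """Return the expected RPM subdirectories that do not exist yet."""
    base = Path(project_root) / "target" / "rpm" / app_name
    return [name for name in RPM_SUBDIRS if not (base / name).is_dir()]


def find_spec_files(project_root, app_name):
    """Return any *.spec files under SPECS, sorted by path.

    Assumption: the generated SPEC file uses the .spec extension.
    """
    specs_dir = Path(project_root) / "target" / "rpm" / app_name / "SPECS"
    return sorted(specs_dir.glob("*.spec"))
```

Calling missing_rpm_dirs(".", "<app_name>") from the project directory should return an empty list once the rpm:rpm goal has completed.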

HBase useful commands

1) Connect to HBase. Connect to your running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase installation.

$ ./bin/hbase shell
hbase(main):001:0>

2) Create a table. Use the create command to create a … Continue reading

Hive: Extracting JSON fields

Handling JSON files with Hive is not always an easy task. If you need to extract specific fields from structured JSON, there are some alternatives. Two UDFs are usually helpful in these cases: ‘get_json_object’ … Continue reading
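To make the idea of path-based extraction concrete, here is a minimal Python sketch of what a call such as get_json_object(json_col, '$.user.name') does conceptually. It is not Hive code and not the UDF's implementation: extract_json_field is a hypothetical helper, it handles only dotted keys and [n] array indexes, and it returns Python values where the Hive UDF returns strings.

```python
import json
import re

# One path step: either ".name" or "[index]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def extract_json_field(json_text, path):
    """Return the value at a simple '$.a.b[0]' path, or None if it is absent."""
    if not path.startswith("$"):
        return None
    try:
        node = json.loads(json_text)
    except (TypeError, ValueError):
        return None  # malformed or missing input yields no value
    pos = 1  # skip the leading "$"
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            return None  # path syntax not supported by this sketch
        key, index = match.group(1), match.group(2)
        if key is not None:
            # Object step: the current node must be a dict holding the key.
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        else:
            # Array step: the current node must be a list that is long enough.
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return None
            node = node[i]
        pos = match.end()
    return node
```

For the record '{"user": {"name": "ana", "tags": ["a", "b"]}, "id": 7}', the path "$.user.name" yields "ana" and "$.user.tags[1]" yields "b", while a path to a missing key yields None, much as the Hive function returns NULL when nothing matches.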

Elasticsearch and Kibana on EMR Hadoop cluster

If you need to add Elasticsearch and Kibana to EMR, please have a look at this post I wrote for AWS: http://blogs.aws.amazon.com/bigdata/post/Tx1E8WC98K4TB7T/Getting-Started-with-Elasticsearch-and-Kibana-on-Amazon-EMR It contains all the steps to launch a cluster and perform basic tests on both … Continue reading

NoSQL: Amazon’s DynamoDB and Apache HBase Performance and Modeling notes

The challenge that architects and developers face today is how to process large volumes of data in a timely, cost-effective, and reliable manner. There are several NoSQL solutions in the market today, and choosing the right one for your … Continue reading

YARN / Map Reduce memory settings

On Hadoop 1, we used mapred.child.java.opts to set the Java heap size for the task tracker child processes. With YARN, that parameter has been deprecated in favor of: mapreduce.map.java.opts – This parameter is passed to the JVM for mappers. … Continue reading
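The relationship between container size and JVM heap can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the helper names are hypothetical, the 0.8 heap-to-container ratio is a commonly used rule of thumb and not a value taken from the post, and the properties other than mapreduce.map.java.opts (mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, mapreduce.reduce.java.opts) are standard Hadoop 2 settings that do not appear in the visible excerpt.

```python
def heap_opts(container_mb, heap_fraction=0.8):
    """Build a -Xmx JVM option sized as a fraction of the container memory.

    Assumption: heap_fraction=0.8 is a rule of thumb that leaves headroom
    for non-heap JVM memory inside the YARN container.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < heap_fraction <= 1:
        raise ValueError("heap_fraction must be in (0, 1]")
    return "-Xmx{}m".format(int(container_mb * heap_fraction))


def mapreduce_memory_settings(map_mb, reduce_mb, heap_fraction=0.8):
    """Return a dict of MapReduce memory properties for the given container sizes."""
    return {
        "mapreduce.map.memory.mb": str(map_mb),
        "mapreduce.map.java.opts": heap_opts(map_mb, heap_fraction),
        "mapreduce.reduce.memory.mb": str(reduce_mb),
        "mapreduce.reduce.java.opts": heap_opts(reduce_mb, heap_fraction),
    }
```

With 1024 MB map containers and 2048 MB reduce containers, this produces -Xmx819m and -Xmx1638m respectively. The key design point is that the heap must stay below the container size, otherwise YARN may kill the task for exceeding its memory limit.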
