Tag archive: Java

Debugging Java Threads


Find which Java process is using the most CPU:
$ ps u -C java
Generate the Java thread dump:
$ jstack -l PId > PId-threads.txt
From the Java threads I can count:
$ awk '/State: / { print }' < … Continue reading
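As a rough illustration of the counting step, the sketch below tallies thread states in a saved dump. It assumes the dump contains the usual `java.lang.Thread.State: <STATE>` lines that jstack prints; the function name `count_thread_states` is hypothetical and not part of the original post.

```shell
# Hypothetical helper: count threads per state in a jstack dump file.
# Assumes each thread's state appears on a line like
#   java.lang.Thread.State: TIMED_WAITING (sleeping)
count_thread_states() {
  awk '/java.lang.Thread.State: / { counts[$2]++ }
       END { for (s in counts) print counts[s], s }' "$1" | sort -rn
}
```

Called as `count_thread_states PId-threads.txt`, it prints one line per state, most frequent first. A large number of RUNNABLE threads is a reasonable place to start looking when CPU usage is high.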


Java: change the default version


If you have more than one Java version installed on your Linux server (Red Hat flavor), you can change the default using the 'alternatives' command:
[hadoop@ip-172-31-36-252 ~]$ sudo /usr/sbin/alternatives --config java
There are 2 programs which provide 'java'.
  Selection    Command
-----------------------------------------------
*+ … Continue reading


FileInputFormat vs. CombineFileInputFormat


When you put a file into HDFS, it is split into blocks of 128 MB (the default value for HDFS on EMR). Consider a file big enough to consume 10 blocks. When you read that file from HDFS as an input … Continue reading
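To make the block arithmetic concrete, here is a minimal sketch that computes how many blocks a file of a given size occupies. The 128 MB default comes from the excerpt; the function name `block_count` is hypothetical.

```shell
# Hypothetical helper: number of HDFS blocks needed for a file.
# $1 = file size in bytes, $2 = block size in bytes (default 128 MB).
block_count() {
  size_bytes=$1
  block_bytes=${2:-134217728}   # 128 * 1024 * 1024
  # Integer ceiling division: a partial last block still counts as a block.
  echo $(( (size_bytes + block_bytes - 1) / block_bytes ))
}
```

For example, `block_count 1342177280` (a 1280 MB file) prints 10, matching the 10-block file considered above.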


Consider boosting spark.yarn.executor.memoryOverhead


This is a very specific error related to the coexistence of the Spark executor and the YARN container. You will typically see errors like this one in the application container logs:
15/03/12 18:53:46 WARN YarnAllocator: Container killed by YARN for exceeding memory … Continue reading
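The reason the overhead matters is that the YARN container has to hold the executor heap plus the off-heap overhead. The sketch below just adds the two, in MB; the function name and the sample values are hypothetical, and the spark-submit line in the comment is only an illustration of where the property from the title would be set.

```shell
# Hypothetical helper: memory (MB) requested from YARN per executor.
# $1 = executor heap in MB, $2 = spark.yarn.executor.memoryOverhead in MB.
executor_container_mb() {
  echo $(( $1 + $2 ))
}

# Illustrative invocation with assumed values (4096 MB heap, 1024 MB overhead):
#   spark-submit --executor-memory 4096m \
#     --conf spark.yarn.executor.memoryOverhead=1024 ...
```

With those assumed values, `executor_container_mb 4096 1024` prints 5120. If the executor process grows past the container size, YARN kills the container, which is consistent with the warning shown above.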


YARN / Map Reduce memory settings


On Hadoop 1, we used mapred.child.java.opts to set the Java heap size for the task tracker child processes. With YARN, that parameter has been deprecated in favor of: mapreduce.map.java.opts: this parameter is passed to the JVM for mappers. … Continue reading
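As a sketch of how a value for mapreduce.map.java.opts might be derived, the helper below sizes the heap as a fraction of the map container (mapreduce.map.memory.mb). The 80% default is a common rule of thumb assumed here, not something stated in the excerpt, and the function name is hypothetical.

```shell
# Hypothetical helper: build a -Xmx value for mapreduce.map.java.opts.
# $1 = container size in MB (mapreduce.map.memory.mb)
# $2 = percent of the container given to the heap (assumed default: 80)
map_java_opts() {
  container_mb=$1
  percent=${2:-80}
  # Integer arithmetic, so the result is rounded down.
  echo "-Xmx$(( container_mb * percent / 100 ))m"
}
```

For a 2048 MB container, `map_java_opts 2048` prints `-Xmx1638m`. Leaving some headroom between the heap and the container size gives the JVM room for non-heap memory.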


Adding a JAR path to Hadoop classpath


This is simple, but it is a frequent question: if we need to add a specific path pointing to a third-party library, we can run a command like the following:
$ export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/hadoop/.versions/Cascading-2.5-SDK/binary/cascading/*:/home/hadoop/.versions/Cascading-2.5-SDK/binary/cascading/lib/cascading-core/*
Here I am adding two directories to … Continue reading
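A slightly more defensive variant is sketched below: it appends an entry only if it is not already present and avoids a leading colon when the variable starts out empty. The function name `append_classpath` and the path in the usage note are hypothetical.

```shell
# Hypothetical helper: append an entry to a colon-separated classpath.
# $1 = current classpath (may be empty), $2 = entry to add.
append_classpath() {
  case ":$1:" in
    *":$2:"*) printf '%s\n' "$1" ;;      # already present: unchanged
    "::")     printf '%s\n' "$2" ;;      # empty classpath: no leading colon
    *)        printf '%s\n' "$1:$2" ;;   # otherwise append
  esac
}
```

Usage, with an assumed directory: `export HADOOP_CLASSPATH=$(append_classpath "$HADOOP_CLASSPATH" '/opt/example/lib/*')`. The single quotes keep the shell from expanding the `*`, so the wildcard reaches the classpath unchanged.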


Hive: dealing with Out of Memory and Garbage Collector errors.


This is the common error:
java.lang.OutOfMemoryError: GC overhead limit exceeded
This error can occur in many Java environments, but it is particularly common with Hive when big structures or several thousand objects are stored in memory. According to Sun, … Continue reading
