Tag archive: Hive

Copy Data with Hive and Spark


These are two examples of how to copy data from one S3 location to another. The same operation can be done from S3 to HDFS and vice versa. I'm assuming that you are able to launch the Hive client or …

Posted in Uncategorized

Start Hive in Debug Mode


Never leave home without it:

hive --hiveconf hive.root.logger=DEBUG,console


Hive: Extracting JSON fields


Handling JSON files with Hive is not always easy. If you need to extract specific fields from structured JSON, there are some alternatives. Two UDFs are usually helpful in these cases: 'get_json_object' …
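To make the behaviour of get_json_object(json_string, path) more concrete, here is a rough Python imitation of it. This is a sketch, not Hive's implementation: extract_json_field is a hypothetical helper that only understands the `$`, `.name` and `[index]` path steps (Hive's path syntax also supports the `*` wildcard), and it returns None where Hive would return NULL.

```python
import json
import re
from typing import Any, Optional

# One path step: ".name" (object key) or "[index]" (array subscript).
# This is only a subset of the path syntax Hive accepts; "*" is not handled.
_STEP = re.compile(r"\.([^.\[\]]+)|\[(\d+)\]")


def extract_json_field(json_txt: str, path: str) -> Optional[str]:
    """Rough imitation of Hive's get_json_object for simple paths like '$.a.b[0]'.

    Returns the extracted value as a string, or None (Hive's NULL) when the
    JSON text is invalid or the path does not resolve.
    """
    if not path.startswith("$"):
        return None
    try:
        node: Any = json.loads(json_txt)
    except json.JSONDecodeError:
        return None  # invalid JSON input -> NULL

    rest, pos = path[1:], 0
    while pos < len(rest):
        m = _STEP.match(rest, pos)
        if m is None:
            return None  # malformed or unsupported path step
        key, index = m.group(1), m.group(2)
        if key is not None:
            if not isinstance(node, dict) or key not in node:
                return None  # missing key -> NULL
            node = node[key]
        else:
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return None  # index out of range -> NULL
            node = node[i]
        pos = m.end()

    if node is None:
        return None
    if isinstance(node, str):
        return node  # plain strings come back without quotes
    # Objects, arrays, numbers and booleans come back as compact JSON text
    return json.dumps(node, separators=(",", ":"))


if __name__ == "__main__":
    doc = '{"store":{"owner":"amy","fruit":[{"type":"apple"},{"type":"pear"}]}}'
    print(extract_json_field(doc, "$.store.owner"))
    print(extract_json_field(doc, "$.store.fruit[1].type"))
```

In HiveQL the corresponding lookup would be written as get_json_object(json_col, '$.store.fruit[1].type'), where json_col stands for a hypothetical string column holding the JSON text.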


Hive: dealing with Out of Memory and Garbage Collector errors


This is the common error:

java.lang.OutOfMemoryError: GC overhead limit exceeded

This error can occur in many Java environments, but with Hive in particular it is quite common when large structures or many thousands of objects are held in memory. According to Sun, …


Hive logs to stdout


We often need to debug a Hive query that is failing. An easy way to do it is to enable console logging: hive.root.logger specifies the logging level as well as the log destination. Specifying console as the target sends the logs to …


Hive query with JOIN, GROUP BY and SUM does not return results


On Hive 0.11 and earlier versions, if we set:

set hive.optimize.skewjoin=true;
set hive.auto.convert.join=false;

a query with JOIN, GROUP BY and SUM does not return results. But if we make the query a little simpler, using JOIN but not GROUP …


Check system variables or environment variables in Hive


In Hive we can check the values of system variables or environment variables with the command:

hive> set;

If we need the value of a specific variable, we can run:

hive> set hive.security.authorization.enabled;
hive.security.authorization.enabled=false

More information: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
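If you save the output of set; (for example by running hive -e 'set;' from a shell) and want to search it from a script, a minimal Python sketch could load it into a dictionary. It assumes one name=value pair per line, as in the hive.security.authorization.enabled=false line above; parse_set_output is a hypothetical helper, not part of Hive.

```python
from typing import Dict


def parse_set_output(text: str) -> Dict[str, str]:
    """Turn the 'name=value' lines printed by Hive's `set;` into a dict.

    Assumption: one variable per line. Only the first '=' separates the name
    from the value, so values that themselves contain '=' are kept intact.
    Lines without '=' (e.g. stray log output) are ignored.
    """
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name:
            settings[name] = value
    return settings


if __name__ == "__main__":
    captured = "hive.security.authorization.enabled=false\n"
    settings = parse_set_output(captured)
    print(settings.get("hive.security.authorization.enabled"))
```

Splitting on the first '=' only is the main design choice here: JVM option strings and similar values often contain '=' themselves.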
