Hive query with JOIN, GROUP BY and SUM does not return results


On Hive 0.11, and lower versions, if we set:

set hive.optimize.skewjoin=true; 
set hive.auto.convert.join=false;

A query with JOIN, GROUP BY and SUM does not return results.

But if we make the query a little more simple, using JOIN but not GROUP and SUM functions, We will GET RESULTS.

I have found that there are bugs reported recently:

https://issues.apache.org/jira/browse/HIVE-5888

and

https://issues.apache.org/jira/browse/HIVE-6041

This bug is related also with the previous one (https://issues.apache.org/jira/browse/HIVE-6520), already reported:

If we set:

set hive.optimize.skewjoin=true; 
set hive.auto.convert.join=true;

We will no have any output.

The reason is that the skew join in hive relies on a reduce phase to save skewed keys on local disk, but hive.auto.convert.join=true turns a mapreduce task into a mapjoin task in some scenarios.

As a result, there is no skewed keys generated by the mapjoin and the result is empty.

If you set hive.auto.convert.join=false to disable the auto conversion of a mapjoin, the performance is very bad because the reduce phase takes a very long time to process the skew keys.

 

This is expected to be resolved on version hive-0.13.0.

Anuncios

Acerca de hvivani

sysadmin, developer, RHCSA
Esta entrada fue publicada en Uncategorized y etiquetada , . Guarda el enlace permanente.

Responder

Introduce tus datos o haz clic en un icono para iniciar sesión:

Logo de WordPress.com

Estás comentando usando tu cuenta de WordPress.com. Cerrar sesión / Cambiar )

Imagen de Twitter

Estás comentando usando tu cuenta de Twitter. Cerrar sesión / Cambiar )

Foto de Facebook

Estás comentando usando tu cuenta de Facebook. Cerrar sesión / Cambiar )

Google+ photo

Estás comentando usando tu cuenta de Google+. Cerrar sesión / Cambiar )

Conectando a %s