knatarasan · ‎04-11-2018

The following points are vantage points from which applications on Hive should be analyzed for performance:

Ingestion intensive

Staging/Storage intensive

ETL intensive

Consumption intensive

Level of Normalization

Star schema

Storage format used

Compression

Usage of collection data types (struct, array, map)

Partition

Is there a possibility of over-partitioning?

Whether dynamic partitioning is enabled

Bucket (review join conditions on the bucketed column)

Functions

Usage of UDF, UDAF

Select with where (map only)

Group by (map shuffle reduce)

Order by (map shuffle single reduce)

Analytical functions

Sort by (map shuffle multi reduce)
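
The difference between "single reduce" for Order by and "multi reduce" for Sort by can be pictured with a small simulation. This is only a conceptual sketch in plain Python, not Hive code: the function names order_by and sort_by are hypothetical, and the round-robin assignment of rows to reducers is an assumption made for illustration.

```python
def order_by(rows, key):
    # A single reducer receives every row, so the output is totally ordered.
    return sorted(rows, key=key)


def sort_by(rows, key, num_reducers):
    # Rows are spread over several reducers (round-robin here, an assumption);
    # each reducer sorts only its own share of the data.
    buckets = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        buckets[i % num_reducers].append(row)
    return [sorted(bucket, key=key) for bucket in buckets]


rows = [5, 3, 9, 1, 7, 2]
total = order_by(rows, key=lambda r: r)
per_reducer = sort_by(rows, key=lambda r: r, num_reducers=2)
```

Each list in per_reducer is sorted on its own, but concatenating them does not give a globally sorted result. That is the trade-off: Sort by parallelizes the reduce phase, while Order by funnels all rows through one reducer.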

Join

Map join (mapper only, but needs more memory because the smaller table is held in memory)

Sort merge join

Partition column usage (especially for huge transaction tables)
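
Why a map join is mapper only yet memory hungry can be sketched as a hash join. This is a conceptual illustration in plain Python, not Hive's implementation; the function map_join and the sample tables orders and customers are hypothetical.

```python
def map_join(big_rows, small_rows, big_key, small_key):
    # Build phase: the whole small table is loaded into an in-memory hash table.
    # This is where the extra memory goes.
    lookup = {}
    for small in small_rows:
        lookup.setdefault(small[small_key], []).append(small)
    # Probe phase: the big table is streamed row by row, with no shuffle or
    # reduce step. Rows without a match are dropped (inner join).
    for big in big_rows:
        for small in lookup.get(big[big_key], []):
            yield {**small, **big}


orders = [
    {"order_id": 1, "cust_id": 10},
    {"order_id": 2, "cust_id": 20},
    {"order_id": 3, "cust_id": 99},
]
customers = [
    {"cust_id": 10, "name": "A"},
    {"cust_id": 20, "name": "B"},
]
joined = list(map_join(orders, customers, "cust_id", "cust_id"))
```

The memory cost grows with the size of the small table, so this pattern only pays off when one side of the join comfortably fits in memory.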

From source table: usage of multi-pass

Table size: for huge tables, analyze everything from a scan perspective
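
The scan perspective, and the reason partition column usage matters for huge transaction tables, can be illustrated with a toy model in which a table is a mapping from partition value to rows. This is a simplified sketch in plain Python, not Hive behavior in detail; the scan function and the transactions data are hypothetical.

```python
def scan(table, partition_filter=None):
    """Return (rows, partitions_read) for a table stored as {partition: rows}."""
    rows, partitions_read = [], 0
    for part_value, part_rows in table.items():
        # A filter on the partition column lets whole partitions be skipped
        # without reading any of their rows.
        if partition_filter is not None and not partition_filter(part_value):
            continue
        partitions_read += 1
        rows.extend(part_rows)
    return rows, partitions_read


transactions = {
    "2018-04-09": [{"txn_id": 1, "amount": 40}],
    "2018-04-10": [{"txn_id": 2, "amount": 15}, {"txn_id": 3, "amount": 70}],
    "2018-04-11": [{"txn_id": 4, "amount": 25}],
}

full_rows, full_parts = scan(transactions)
pruned_rows, pruned_parts = scan(transactions, lambda day: day == "2018-04-10")
```

Without a filter on the partition column every partition is read. With it, only the matching partition is touched, which is the difference that dominates run time on very large tables.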

Elements of Hive Application Tuning