Cloudera Employee
The following points outline the vantage points from which applications on Hive should be analyzed for performance:
  • Application types
      • Ingestion intensive
      • Staging/storage intensive
      • ETL intensive
      • Consumption intensive

  • Data model used
      • Level of normalization
      • Star schema

  • Table design
      • Storage format used
      • Compression
      • Use of collection data types (struct, array, map)
      • Partitioning
          • Is there a risk of over-partitioning?
          • Is dynamic partitioning enabled?
      • Bucketing (review join conditions on the bucketed column)
      • Functions
          • Use of UDFs and UDAFs
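
As a rough illustration of the table design points above, the following HiveQL sketch shows where storage format, compression, collection types, partitioning, and bucketing appear in a table definition. The table name, column names, and bucket count are hypothetical and chosen only for illustration; the `SET` properties are standard Hive settings, but the values shown are not tuning recommendations.

```sql
-- Hypothetical table for illustration only; names and bucket count are assumptions.
-- Storage format: ORC. Compression: Snappy. Collection type: MAP column.
-- Partition column: txn_date. Bucket column: customer_id (the intended join key).
CREATE TABLE sales_txn (
  txn_id      BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(12,2),
  attributes  MAP<STRING,STRING>
)
PARTITIONED BY (txn_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');

-- Check whether dynamic partitioning is enabled for the session.
SET hive.exec.dynamic.partition;
SET hive.exec.dynamic.partition.mode;

-- Limits that cap how many partitions a single statement may create.
SET hive.exec.max.dynamic.partitions;
SET hive.exec.max.dynamic.partitions.pernode;

-- List existing partitions; a very long list of tiny partitions
-- is one sign of over-partitioning.
SHOW PARTITIONS sales_txn;
```

Running `SET` with a property name and no value prints the current setting, which makes it a quick way to review a session's configuration without changing it.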

  • Query pattern
      • SELECT with WHERE (map only)
      • GROUP BY (map, shuffle, reduce)
      • ORDER BY (map, shuffle, single reducer)
      • Analytical functions
      • SORT BY (map, shuffle, multiple reducers)
      • Join
          • Map join (mapper only, but needs more memory)
          • Sort-merge join
      • Partition column usage (especially for huge transaction tables)
      • From source table: use of multi-pass
      • Table size: for huge tables, analyze everything from a scan perspective
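
The query patterns listed above can be inspected with `EXPLAIN`, which prints the execution plan without running the query. The sketch below assumes two hypothetical tables, `sales_txn` (large, partitioned by `txn_date`) and `customer_dim` (small), plus hypothetical target tables `large_txn` and `small_txn`. The final statement assumes that the "multi-pass" item refers to reading the same source table more than once, which is an interpretation rather than something the list states.

```sql
-- All table and column names below are hypothetical.

-- SELECT with WHERE: filtering on the partition column limits the scan
-- to matching partitions, which matters most for huge tables.
EXPLAIN
SELECT txn_id, amount
FROM sales_txn
WHERE txn_date = '2020-01-01' AND amount > 100;

-- GROUP BY: the plan should show a shuffle followed by a reduce-side aggregation.
EXPLAIN
SELECT customer_id, SUM(amount) AS total_amount
FROM sales_txn
WHERE txn_date = '2020-01-01'
GROUP BY customer_id;

-- ORDER BY: a total order, so the final sort runs in a single reducer.
EXPLAIN
SELECT txn_id, amount
FROM sales_txn
WHERE txn_date = '2020-01-01'
ORDER BY amount DESC;

-- SORT BY: rows are ordered within each reducer, so several reducers can be used.
EXPLAIN
SELECT txn_id, customer_id, amount
FROM sales_txn
WHERE txn_date = '2020-01-01'
DISTRIBUTE BY customer_id
SORT BY amount DESC;

-- Map join: allow Hive to convert a join to a map-side join when one side is small.
-- The small table is loaded into memory, hence the heavier memory requirement.
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=25000000;
EXPLAIN
SELECT t.txn_id, c.customer_name
FROM sales_txn t
JOIN customer_dim c ON t.customer_id = c.customer_id
WHERE t.txn_date = '2020-01-01';

-- Sort-merge join: applies only when both tables are bucketed and sorted on the join key.
SET hive.auto.convert.sortmerge.join=true;

-- Multi-insert: one scan of the source table feeds several targets.
FROM sales_txn
INSERT OVERWRITE TABLE large_txn
  SELECT txn_id, amount WHERE txn_date = '2020-01-01' AND amount >= 1000
INSERT OVERWRITE TABLE small_txn
  SELECT txn_id, amount WHERE txn_date = '2020-01-01' AND amount < 1000;
```

The plan output differs by execution engine (stages on MapReduce, vertices on Tez), so the map, shuffle, and reduce shapes noted in the list are best confirmed against the plan produced on the cluster in question.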
