Cloudera Employee
The following bullets outline the vantage points from which applications on Hive should be analyzed for performance:
  • Application types

Ingestion intensive

Staging/Storage intensive

ETL intensive

Consumption intensive

  • Data Model used

Level of Normalization

Star schema

  • Table design

Storage format used

Compression

Usage of collection data types (struct, array, map)

Partition

Possibility of over-partitioning

Whether dynamic partitioning is enabled

Bucket (review join conditions on bucketed columns)

Functions

Usage of UDFs and UDAFs
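
The table design points above can be checked directly from Hive. The following is a minimal sketch, assuming a hypothetical partitioned, bucketed ORC table; the table names (sales_txn, staging_sales_txn), columns, bucket count, and compression codec are illustrative assumptions, not taken from any real application.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE sales_txn (
  txn_id      BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(12,2),
  attributes  MAP<STRING, STRING>        -- collection data type
)
PARTITIONED BY (txn_date STRING)          -- partition column
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC                             -- storage format
TBLPROPERTIES ('orc.compress' = 'SNAPPY'); -- compression

-- Review storage format, compression, partition and bucket columns.
DESCRIBE FORMATTED sales_txn;

-- List partitions; a very large number of small partitions
-- suggests over-partitioning.
SHOW PARTITIONS sales_txn;

-- Print the current dynamic partitioning settings.
SET hive.exec.dynamic.partition;
SET hive.exec.dynamic.partition.mode;

-- Dynamic partition insert from a hypothetical staging table.
-- nonstrict mode is needed when every partition column is dynamic.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE sales_txn PARTITION (txn_date)
SELECT txn_id, customer_id, amount, attributes, txn_date
FROM staging_sales_txn;
```

Bucketing on customer_id only helps joins whose condition uses that same column, which is why the join conditions are worth reviewing against the bucketed column.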

  • Query pattern

Select with where (map only)

Group by (map, shuffle, reduce)

Order by (map, shuffle, single reducer)

Analytical functions

Sort by (map, shuffle, multiple reducers)

Join

Map join (mapper only, but needs more memory because the small table is held in memory)

Sort merge join

Partition column usage (especially for huge transaction tables)

From source table: usage of multi pass

Table size: for huge tables, analyze everything from a scan perspective
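
The query patterns above can be inspected with EXPLAIN, which prints the stage plan Hive intends to run. The sketch below assumes hypothetical tables (sales_txn partitioned by txn_date, customers, daily_totals, customer_totals) and illustrative column names. The last statement reflects one reading of the "multi pass" item, namely scanning a source table once and writing several outputs from it; that interpretation is an assumption.

```sql
-- Group by with a partition column filter: the plan should show
-- map and reduce phases, and the filter allows partition pruning.
EXPLAIN
SELECT customer_id, SUM(amount) AS total_amount
FROM sales_txn
WHERE txn_date = '2024-01-15'
GROUP BY customer_id;

-- ORDER BY gives a total order, produced by a single reducer.
SELECT customer_id, amount
FROM sales_txn
WHERE txn_date = '2024-01-15'
ORDER BY amount DESC
LIMIT 100;

-- SORT BY orders rows within each reducer, so several reducers can run.
SELECT customer_id, amount
FROM sales_txn
WHERE txn_date = '2024-01-15'
DISTRIBUTE BY customer_id
SORT BY customer_id, amount DESC;

-- Let Hive convert a join to a map join when one side is small enough.
SET hive.auto.convert.join=true;
EXPLAIN
SELECT s.txn_id, c.segment
FROM sales_txn s
JOIN customers c ON s.customer_id = c.customer_id
WHERE s.txn_date = '2024-01-15';

-- Multi-insert: one scan of the source table feeds two target tables.
FROM sales_txn s
INSERT OVERWRITE TABLE daily_totals
  SELECT s.txn_date, SUM(s.amount) GROUP BY s.txn_date
INSERT OVERWRITE TABLE customer_totals
  SELECT s.customer_id, SUM(s.amount) GROUP BY s.customer_id;
```

For huge tables, comparing the plan with and without the txn_date filter is a quick way to see whether a query scans the whole table or only the needed partitions.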
