Cloudera Employee
The following checklist gives the vantage points from which applications on Hive should be analyzed for performance:
  • Application types
      • Ingestion intensive
      • Staging/storage intensive
      • ETL intensive
      • Consumption intensive

  • Data model used
      • Level of normalization
      • Star schema

  • Table design
      • Storage format used
      • Compression
      • Usage of collection data types (struct, array, map)
      • Partitioning
          • Possibility of over-partitioning
          • Whether dynamic partitioning is enabled
      • Bucketing (review join conditions on the bucketed column)
      • Functions: usage of UDFs and UDAFs

  • Query pattern
      • Select with where (map only)
      • Group by (map, shuffle, reduce)
      • Order by (map, shuffle, single reducer)
      • Analytical functions
      • Sort by (map, shuffle, multiple reducers)
      • Join
          • Map join (mapper only, but needs more memory)
          • Sort merge join
      • Partition column usage (especially for huge transaction tables)
      • From source table: usage of multi-pass
      • Table size: for huge tables, analyze everything from a scan perspective
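
As a rough illustration of the query pattern items above, the following Python sketch tags a query with the execution shape the checklist associates with each clause. It is only a keyword scan under stated assumptions: the function name execution_shapes, the SHUFFLE_CLAUSES table, and the sample table sales are hypothetical, joins are not handled, and an ORDER BY inside an analytical function's OVER clause would be flagged as a plain order by. The output of Hive's EXPLAIN statement remains the authoritative view of a query's stages.

```python
import re
from typing import List, Tuple

# Clause -> execution shape, taken from the "Query pattern" items above.
# The regexes and names here are illustrative assumptions, not part of Hive.
SHUFFLE_CLAUSES = [
    ("group by", r"\bgroup\s+by\b", "map -> shuffle -> reduce"),
    ("order by", r"\border\s+by\b", "map -> shuffle -> single reducer"),
    ("sort by", r"\bsort\s+by\b", "map -> shuffle -> multiple reducers"),
]


def execution_shapes(query: str) -> List[Tuple[str, str]]:
    """Return (clause, shape) pairs for the shuffle-inducing clauses found.

    A query with none of these clauses is reported as map only,
    matching the "select with where" item in the checklist.
    """
    text = query.lower()
    found = [
        (name, shape)
        for name, pattern, shape in SHUFFLE_CLAUSES
        if re.search(pattern, text)
    ]
    return found or [("select/where", "map only")]


if __name__ == "__main__":
    for q in (
        "SELECT * FROM sales WHERE dt = '2018-04-11'",
        "SELECT region, COUNT(*) FROM sales GROUP BY region ORDER BY region",
    ):
        print(q, "->", execution_shapes(q))
```

A scan like this can help sort a large query inventory into "map only" and "shuffle heavy" groups before deciding which queries deserve a closer look at the plan.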

Last update: 04-11-2018 09:59 PM