Created on 04-11-2018 09:59 PM
- Application types
  - Ingestion intensive
  - Staging/storage intensive
  - ETL intensive
  - Consumption intensive
- Data model used
  - Level of normalization
  - Star schema
- Table design
  - Storage format used
  - Compression
  - Usage of collection data types (struct, array, map)
  - Partition
    - Is there a risk of over-partitioning?
    - Is dynamic partitioning enabled?
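One rough way to check the over-partitioning item is to compare the average partition size against a minimum size you are comfortable with. The sketch below is only an illustration: the function names and the 128 MB default are assumptions, not anything Hive defines, and the table size and partition count are expected to come from your own table statistics.

```python
def average_partition_mb(total_size_mb: float, partition_count: int) -> float:
    """Average data volume per partition, in MB."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    return total_size_mb / partition_count


def is_over_partitioned(total_size_mb: float,
                        partition_count: int,
                        min_partition_mb: float = 128.0) -> bool:
    """Flag a table whose average partition is smaller than the chosen minimum.

    min_partition_mb is an assumed rule-of-thumb threshold; tune it for your cluster.
    """
    return average_partition_mb(total_size_mb, partition_count) < min_partition_mb


if __name__ == "__main__":
    # Hypothetical table: 10 GB spread over 5,000 partitions.
    print(is_over_partitioned(10_000, 5_000))
```

A table that is flagged here is a candidate for a coarser partition key, for example partitioning by month instead of by day.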
- Query pattern
  - Bucket (review join conditions on the bucketed column)
  - Functions
    - Usage of UDFs and UDAFs
  - Select with where (map only)
  - Group by (map, shuffle, reduce)
  - Order by (map, shuffle, single reducer)
  - Analytical functions
  - Sort by (map, shuffle, multiple reducers)
  - Join
    - Map join (mapper only, but needs more memory)
    - Sort merge join
  - Partition column usage (especially for huge transaction tables)
  - Source table: usage of multiple passes over the same table
  - Table size: for huge tables, analyze every query from a scan perspective
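The stage notes in the query pattern list can be kept as a small lookup while reviewing queries. This is a minimal sketch: the clause keys and the helper names are made up for illustration, and the stage values simply restate the checklist above rather than anything read from a real execution plan.

```python
# Execution stages per query clause, as noted in the checklist.
STAGES = {
    "select_where": ("map",),
    "group_by": ("map", "shuffle", "reduce"),
    "order_by": ("map", "shuffle", "single reduce"),
    "sort_by": ("map", "shuffle", "multi reduce"),
    "map_join": ("map",),
}


def stages_for(clause: str) -> tuple:
    """Return the stages the checklist associates with a clause."""
    return STAGES[clause]


def single_reducer_clauses(clauses: list) -> list:
    """Pick out the clauses that funnel all data through one reducer."""
    return [c for c in clauses if "single reduce" in STAGES[c]]


if __name__ == "__main__":
    # Hypothetical query that filters, aggregates, and then orders globally.
    print(single_reducer_clauses(["select_where", "group_by", "order_by"]))
```

On huge tables, any clause this helper returns is worth a second look, since a single reducer has to process the whole result set.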