Created on 07-07-2016 06:04 PM
I have worked on 20 to 25 applications. Each time I start on a new one, I first have to learn its naming convention, and I keep wondering why we don't all follow a single one. Hadoop is evolving rapidly, so I would like to share my naming convention: if you join my project you will feel at home, and so will I if you follow it too.
Database Names:
If the application serves a technology, the database names would be:
<APPID>_<TECHNOLOGY>_TBLS
<APPID>_<TECHNOLOGY>_VIEW
If the application serves a vendor, the database names would be:
<APPID>_<VENDORNAME>_TBLS
<APPID>_<VENDORNAME>_VIEW
If the application's databases need to be further divided by module, the database names would be:
<APPID>_<MODULE>_TBLS
<APPID>_<MODULE>_VIEW
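A minimal sketch of how the database name patterns above could be assembled in Python. The function name `database_names` and the sample values `APP01` and `HIVE` are hypothetical and used only for illustration; upper-casing the parts is also an assumption based on the style of the patterns.

```python
def database_names(app_id, qualifier):
    """Return the (tables, views) database names for one application.

    `qualifier` is the technology, vendor name, or module, depending on
    which of the three cases above applies.
    """
    # Assumption: names are upper-cased, matching the style of the patterns.
    prefix = f"{app_id}_{qualifier}".upper()
    return f"{prefix}_TBLS", f"{prefix}_VIEW"


# Hypothetical application ID and technology, for illustration only.
tables_db, views_db = database_names("APP01", "HIVE")
print(tables_db)  # APP01_HIVE_TBLS
print(views_db)   # APP01_HIVE_VIEW
```

Keeping tables and views in separate databases with a shared prefix means both names can be derived from the same two inputs.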
Fact Table Names:
TFXXX_<FREQUENCY>_<AGRT>
Note: <AGRT> is omitted for the table that stores data at the lowest granularity. It is added only for aggregate data tables.
XXX: Range from 001 to 999 (We can set number according to our requirement)
FREQUENCY:
External Table Names:
TEXXX_<FREQUENCY>
Dim Table Names:
TDXXX_<DIM_TYPE_NAME>
XXX: Range from 001 to 999
Lookup/Config Tables:
TLXXX_<REF>
XXX: Range from 001 to 999
Control tables:
TCXXX_<TABLENAME>
XXX: Range from 001 to 999
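One way to check names against the table prefixes above is a small validator, sketched here in Python. The function `classify_table` and the sample names are hypothetical. The rule that suffix parts contain only upper-case letters and digits is an assumption, not part of the convention as written.

```python
import re

# Prefix -> table kind, taken from the patterns above.
TABLE_PREFIXES = {
    "TF": "fact",
    "TE": "external",
    "TD": "dimension",
    "TL": "lookup/config",
    "TC": "control",
}

# Two-letter prefix, a three-digit number, then at least one
# underscore-separated suffix part.
# Assumption: suffix parts are upper-case letters and digits.
TABLE_NAME = re.compile(r"^(TF|TE|TD|TL|TC)(\d{3})((?:_[A-Z0-9]+)+)$")


def classify_table(name):
    """Return (kind, number, suffix parts), or None if the name does not fit."""
    match = TABLE_NAME.match(name)
    if match is None:
        return None
    prefix, number, suffix = match.groups()
    if number == "000":  # the range is 001 to 999
        return None
    return TABLE_PREFIXES[prefix], int(number), suffix.lstrip("_").split("_")


# Hypothetical names, for illustration only.
print(classify_table("TF001_HOURLY"))    # ('fact', 1, ['HOURLY'])
print(classify_table("TD010_CUSTOMER"))  # ('dimension', 10, ['CUSTOMER'])
print(classify_table("customer_dim"))    # None
```

A check like this could run before DDL is deployed, so that names outside the convention are caught early.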
Temporary Tables:
TMP_<JOBNAME>_<Name>
Note: Use this for tables that the job creates and drops while it is executing.
PRM_<JOBNAME>_<Name>
Note: Use this for tables into which the job inserts data, and from which it removes data, while it is executing.
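The choice between the two prefixes can be sketched as a small Python helper. The function `work_table_name`, its `dropped_by_job` flag, and the sample job and table names are hypothetical. Reading PRM_ as "the table itself stays and only its data comes and goes" is my interpretation of the note above.

```python
def work_table_name(job_name, name, dropped_by_job):
    """Build a TMP_ or PRM_ table name for a job.

    dropped_by_job=True  -> TMP_: the job creates and drops the table itself.
    dropped_by_job=False -> PRM_: assumed to mean the table stays in place and
                            only its data is inserted and removed during the run.
    """
    prefix = "TMP" if dropped_by_job else "PRM"
    return f"{prefix}_{job_name}_{name}"


# Hypothetical job and table names, for illustration only.
print(work_table_name("LOADSALES", "STAGE", dropped_by_job=True))   # TMP_LOADSALES_STAGE
print(work_table_name("LOADSALES", "STAGE", dropped_by_job=False))  # PRM_LOADSALES_STAGE
```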
View Names:
VFXXX_<FREQUENCY>_<AGRT>
Note: <AGRT> is omitted for a view over the table that stores data at the lowest granularity. It is added only for views over aggregate data tables.
XXX: Range from 001 to 999
FREQUENCY:
Column Names:
Stored Procs or HQL Query:
PSXXX_[<FREQUENCY>|<CALC>|<AGRT>|<DownStream>]
Example: PS001_ENGINEERING_HOURLY
XXX: Range from 001 to 999
Macro:
MCXXX_<MODULENAME>
XXX: Range from 001 to 999
UDF(Hadoop):
UDFXXX_<MODULENAME>
XXX: Range from 001 to 999
Index:
Index names: TFXXX_PRI_IDX#_<NUSI/USI>
IDX = constant for primary index
# = sequential number of the secondary index (1, 2, 3, 4, ...)
PRI = primary index (used to distribute data across AMPs and then for access performance)
NUSI = non-unique secondary index, used for access performance
USI = unique secondary index, used for access performance
In the next article I'll share more naming conventions for Oozie, file naming, and data types.
Created on 07-07-2016 06:04 PM
These conventions are for business applications that are ready or planning to migrate to Hadoop. You don't need to reinvent the wheel; we have already done a lot of brainstorming on this.
Created on 07-11-2016 04:39 PM
For Composite Key:
<LEVELNAME>_<ENTITYNAME>_Key
Note: If multiple keys are needed, put multiple keys in the fact tables.
Oozie Job Naming:
<VENDOR>_<ENTITY>_<LEVELNAME>_<FREQUENCY>_[<CALC>|<AGRT>|<DownStream>].xml
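A minimal Python sketch of assembling an Oozie job file name from the pattern above. The function `oozie_job_file` and the sample values are hypothetical. It assumes exactly one of the bracketed job types (CALC, AGRT, or DownStream) is supplied, which is one reading of the bracket notation.

```python
def oozie_job_file(vendor, entity, level_name, frequency, job_type):
    """Build an Oozie job file name from its parts.

    Assumption: `job_type` is exactly one of the bracketed alternatives
    in the pattern (CALC, AGRT, or DownStream).
    """
    parts = [vendor, entity, level_name, frequency, job_type]
    if not all(parts):
        raise ValueError("every part of the name must be non-empty")
    return "_".join(parts) + ".xml"


# Hypothetical values, for illustration only.
print(oozie_job_file("ACME", "SALES", "STORE", "DAILY", "AGRT"))
# ACME_SALES_STORE_DAILY_AGRT.xml
```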
File extensions for Hadoop:
HQL file extension ".hql"
Java file extension ".java"
Property file extension ".properties"
Shell script extension ".sh"
Oozie config file extension ".xml"
Data definition file extension ".ddl"