
A model for Hadoop


A question... maybe not a very clever one, but still a question. If HDFS is a file system dedicated to holding "Big Data", I suppose there should be a way to organize the information consistently, which would also support sound development of the data ingestion activities. Is there a model or method to follow for organizing such information properly? What is your experience?


@Nicola Poffo

I don't think there is a standard model or method to follow for data ingestion activities, but here are a few examples of the way we arrange data in HDFS:

hadoop fs -ls /basedata/ --the company's raw data, i.e. source data untouched by any ETL process

hadoop fs -ls /stage/ --all the ETL tables are created here before the data is inserted into the target tables

hadoop fs -ls /target/ --tables ready for analysis
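For example, a new feed could flow through the three layers like this (a rough sketch; the source name "crm" and the file names below are placeholders, not a fixed convention):

hadoop fs -mkdir -p /basedata/crm /stage/crm /target/crm    # one-time setup of the layers for this source
hadoop fs -put crm_extract_20160501.csv /basedata/crm/      # land the raw extract untouched
# ...the ETL job reads /basedata/crm and writes its cleaned output under /stage/crm...
hadoop fs -cp /stage/crm/cleaned_20160501.csv /target/crm/  # promote the validated output to the analysis layer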

How do we organize an individual application's data?

An example for one schema's data (a concrete sketch follows the layout below):

hadoop fs -ls /target/

hadoop fs -ls /target/<application1 or data source name>/<partitions>/

hadoop fs -ls /target/<application2 or data source name>/<partitions>/

hadoop fs -ls /target/<application3 or data source name>/<partitions>/
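For instance, with a hypothetical "sales" application partitioned by load date (both names are placeholders for your own application and partition columns; the key=value directory naming follows the Hive partition convention):

hadoop fs -mkdir -p /target/sales/load_date=2016-05-01/                # one subdirectory per partition
hadoop fs -put sales_20160501.csv /target/sales/load_date=2016-05-01/  # data file for that partition
hadoop fs -ls /target/sales/                                           # lists the partition directories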

I hope this helps you 🙂