Is it possible to load data onto just 2 DataNodes instead of distributing it across all DataNodes?

Is it possible to load data onto just 2 DataNodes instead of distributing it across all DataNodes? Thanks.

2 REPLIES

@Dhilraj chemben

There is no supported way to do what you are asking. The NameNode places block replicas on DataNodes based on rack configuration, replication factor, and node availability. So even if you did manage to get a block onto two particular DataNodes, if one of those nodes went down, the NameNode would re-replicate the block to another node.

In general, you cannot place a block on a specific DataNode. You can set the replication factor to 1 so that each block is stored on only one DataNode.

hadoop fs -setrep 1 'file_name'
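
If the aim is two copies of each block rather than one, a rough sketch along the same lines is to set the replication factor to 2 and then ask HDFS where the blocks ended up. Note that this only controls how many replicas each block has. HDFS still chooses the DataNodes, and different blocks of the same file may land on different pairs of nodes. The path below is hypothetical, and the DRY_RUN switch and helper function names are just conveniences for this sketch, so the commands are printed instead of executed unless you set DRY_RUN=0 on a real cluster.

```shell
#!/bin/sh
# Sketch: set a file's replication factor, then list the DataNodes holding its blocks.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 on a real cluster.
DRY_RUN="${DRY_RUN:-1}"

# Helper (hypothetical): print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

set_and_inspect() {
    file="$1"       # HDFS path of the file
    replicas="$2"   # desired replication factor
    # -w waits until the replication change has completed
    run hdfs dfs -setrep -w "$replicas" "$file"
    # fsck reports each block and the DataNodes that hold its replicas
    run hdfs fsck "$file" -files -blocks -locations
}

# Hypothetical path; replace with a real file on your cluster.
set_and_inspect /user/example/sample_table.csv 2
```

The fsck output is the part to look at: it shows, block by block, which DataNodes were actually chosen.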

New Contributor

Thanks @Sindhu. My goal is to compare query performance in QA (4 DataNodes) against PROD (8 DataNodes). Say I have a SELECT query in HQL that returns its result in about 90 seconds in QA (~1 billion records). Can we assume that in PROD the same Hive table, with the same number of records, will run roughly 40-50% faster than in QA?
