Do you already have a plan for controlling the distribution of blocks between datanodes in a rulebook?



Hi @szetszwo,

I have some experience with Hortonworks installation and maintenance, and with Talend- and Sqoop-based ingestion.
Nowadays I'm a PostgreSQL/Greenplum developer (core framework) at the same company, where we are also working on a Hadoop-based approach.
The real problem, as we see it, is that we want to use HDFS and Hive as an MPP RDBMS.

I've read your post about the Balancer daemon (100x Performance Improvement), where you mention "Block Pinning", but I haven't had much time to investigate it further.

Is there any plan (I know it would be a huge change) to improve the block placement procedure (constraint satisfaction algorithm) with a "distributed by" mechanism like the one available in Greenplum (on top of the existing partitioning)?

As you know, when we run joins in Spark notebooks, the network traffic is very significant.
I have read that block splitting is handled by the storage file format writers.
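
To make the "distributed by" idea concrete, here is a minimal Python sketch of what I have in mind. It is purely illustrative: the segment count, the function names and the CRC32 hash are my own assumptions, not how Greenplum or HDFS actually place data. Rows are hashed on their distribution key, so rows with the same key always land on the same segment, and a join on that key can run segment by segment without moving data over the network.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical number of segments (think: datanodes)


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution key to a segment id using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, num_segments=NUM_SEGMENTS):
    """Group rows by owning segment; the key is the first field of each row."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[0], num_segments)].append(row)
    return segments


def local_join(left_segments, right_segments):
    """Join each segment only with its co-located counterpart."""
    result = []
    for seg_id, left_rows in left_segments.items():
        right_rows = right_segments.get(seg_id, [])
        for left in left_rows:
            for right in right_rows:
                if left[0] == right[0]:
                    # Keep the key once, then the remaining fields of both rows.
                    result.append(left + right[1:])
    return result


# Hypothetical sample data: (customer_id, item) and (customer_id, name).
orders = [(1, "book"), (2, "pen"), (1, "lamp")]
customers = [(1, "Ann"), (2, "Bob"), (3, "Cid")]

joined = local_join(distribute(orders), distribute(customers))
print(sorted(joined))
```

Because both tables are distributed on the same key, no pair of matching rows is ever split across segments, which is exactly the property we are missing when the blocks of two Hive tables are placed independently.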


Thank you in advance for your answer!

Best regards,
Laszlo