YARN Memory Filling up while creating Dynamic Partition in Hive

Hi Team,

I am getting the error "could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation." (error log attached for reference) while inserting data from external_hive_table into partitioned_hive_table, which I created in ORC format. I am trying to create dynamic partitions for about 114 million records, i.e. loading 10 GB of data from external_hive_table into partitioned_hive_table. In Ambari I can see that YARN memory reaches 100% and YARN is unable to allocate the required number of containers (screenshot attached for reference).

I am running the following Hive query:

INSERT OVERWRITE TABLE partitioned_hive_table PARTITION (column_N)
SELECT column_1, column_2, column_3, column_N FROM external_hive_table;

Infrastructure-wise, we have a 6-node HDP 2.5 cluster (2 masters, 3 slaves, 1 client) with 27.43 GB of RAM and 8 cores per host. Host details are attached for reference.

Below are the YARN and MapReduce configurations, which I set following this guide: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/determi...

yarn.nodemanager.resource.memory-mb = 20GB

yarn.scheduler.minimum-allocation-mb = 4GB

yarn.scheduler.maximum-allocation-mb = 20GB

mapreduce.map.memory.mb = 4GB

mapreduce.reduce.memory.mb = 8GB

mapreduce.map.java.opts = -Xmx3276m

mapreduce.reduce.java.opts = -Xmx6553m

yarn.app.mapreduce.am.resource.mb = 8GB

yarn.app.mapreduce.am.command-opts = -Xmx6553m -Dhdp.version=${hdp.version}

hive.execution.engine = mr (MapReduce)
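As a rough cross-check of these settings, the sketch below computes the heap-to-container ratio for each -Xmx value and how many map containers can run at once. It assumes the CapacityScheduler with its default memory-only calculator (which rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb), that only the 3 slave nodes run NodeManagers, and that one ApplicationMaster shares a node with map tasks. None of these assumptions is confirmed in the thread.

```python
import math

# Values posted above, converted to MB as YARN expects them.
NODE_MB = 20 * 1024        # yarn.nodemanager.resource.memory-mb (20 GB)
MIN_ALLOC_MB = 4 * 1024    # yarn.scheduler.minimum-allocation-mb (4 GB)
MAP_MB = 4 * 1024          # mapreduce.map.memory.mb (4 GB)
REDUCE_MB = 8 * 1024       # mapreduce.reduce.memory.mb (8 GB)
AM_MB = 8 * 1024           # yarn.app.mapreduce.am.resource.mb (8 GB)
NODES = 3                  # assumption: NodeManagers run on the 3 slave nodes only


def normalize(request_mb, min_alloc_mb=MIN_ALLOC_MB):
    """Round a container request up to a multiple of the minimum allocation."""
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb


def heap_ratio(xmx_mb, container_mb):
    """Fraction of the container's memory given to the JVM heap (-Xmx)."""
    return xmx_mb / container_mb


def concurrent_maps(nodes=NODES, node_mb=NODE_MB, am_mb=AM_MB, map_mb=MAP_MB):
    """Map containers that fit at once when one node also hosts the job's AM."""
    map_mb = normalize(map_mb)
    am_mb = normalize(am_mb)
    full_nodes = (nodes - 1) * (node_mb // map_mb)   # nodes without the AM
    am_node = (node_mb - am_mb) // map_mb            # node shared with the AM
    return full_nodes + am_node


if __name__ == "__main__":
    print(f"map heap ratio:    {heap_ratio(3276, MAP_MB):.2f}")     # -Xmx3276m
    print(f"reduce heap ratio: {heap_ratio(6553, REDUCE_MB):.2f}")  # -Xmx6553m
    print(f"AM heap ratio:     {heap_ratio(6553, AM_MB):.2f}")      # AM -Xmx6553m
    maps = concurrent_maps()
    print(f"map containers at once: {maps} (+1 AM = {maps + 1} containers)")
    print(f"reduce containers per node: {NODE_MB // normalize(REDUCE_MB)}")
```

Under those assumptions all three heap ratios come out at about 0.80, and the cluster fits 13 map containers plus 1 AM, i.e. 14 containers at a time, close to the 14-16 reported later in the thread. In other words, the settings themselves appear to cap concurrency well below the job's total container count.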

Please have a look at the attachments and help me find a solution.


Attachments: yarn-memory-utilization.png, host-details.png, error-log.png, yarn-containers-info.png

Re: YARN Memory Filling up while creating Dynamic Partition in Hive

Super Guru

Can you run hdfs dfsadmin -report and post the output?

Re: YARN Memory Filling up while creating Dynamic Partition in Hive

Hi Sunile,

Thanks for your reply. Attached is the dfsadmin -report output: dfsadmin-report.txt

Re: YARN Memory Filling up while creating Dynamic Partition in Hive

Expert Contributor

Your YARN memory is 100% used here. By default, resource allocation is based on memory, so once all of the memory is in use, YARN cannot allocate any further containers for the job. Have you verified whether the job runs fine when resources are available?

Re: YARN Memory Filling up while creating Dynamic Partition in Hive

Thanks for replying. Yes, that's exactly the problem: why is YARN getting full and unable to allocate further containers? We have sufficient memory available on each node (27.43 GB per node). The job ran only about halfway and then failed with the error mentioned above once it could not allocate further containers.
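The 27.43 GB per node is physical RAM, but YARN can only hand out as containers what yarn.nodemanager.resource.memory-mb allows (20 GB per node in the settings posted above); the remainder is left for the OS and the Hadoop daemons. A quick sketch of that arithmetic, assuming only the 3 slave nodes run NodeManagers and that Ambari's YARN memory gauge is measured against schedulable rather than physical memory:

```python
PHYSICAL_GB_PER_NODE = 27.43   # RAM per host, as reported in the post
YARN_GB_PER_NODE = 20          # yarn.nodemanager.resource.memory-mb
NODES = 3                      # assumption: the 3 slave nodes run NodeManagers


def yarn_capacity_gb(nodes=NODES, per_node_gb=YARN_GB_PER_NODE):
    """Total memory the ResourceManager can hand out as containers."""
    return nodes * per_node_gb


def utilization_pct(used_gb, nodes=NODES, per_node_gb=YARN_GB_PER_NODE):
    """Used container memory as a percentage of schedulable (not physical) memory."""
    return 100.0 * used_gb / yarn_capacity_gb(nodes, per_node_gb)


if __name__ == "__main__":
    physical_gb = NODES * PHYSICAL_GB_PER_NODE
    print(f"physical RAM on the slaves: {physical_gb:.2f} GB")
    print(f"memory YARN may schedule:   {yarn_capacity_gb()} GB")
    # Once containers hold all 60 GB, the gauge reads 100% even though
    # roughly 22 GB of physical RAM is still outside YARN's control.
    print(f"utilization at 60 GB used:  {utilization_pct(60):.0f}%")
```

So "100% YARN memory" and "plenty of free RAM on each host" can both be true at the same time.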

Re: YARN Memory Filling up while creating Dynamic Partition in Hive

Expert Contributor

The dfsadmin report looks fine (although it's not complete). Please check how many containers the job launched, and whether the job ends up using all of the resources on the cluster.

Re: YARN Memory Filling up while creating Dynamic Partition in Hive

Not sure why it's not complete; I had redirected the full dfsadmin report to dfsadmin-report.txt. The job launched 36 containers in total, but only 14-16 could be allocated at a time, across different runs. As for resource utilization, my understanding is that YARN should handle this and schedule according to the configured settings, which it does not seem to be doing. Kindly suggest.
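Running only part of a job's containers at a time is normal YARN behavior: the remaining tasks wait and start in later rounds as earlier containers finish. A minimal sketch of that split, using the numbers from this post and assuming (as a simplification) that all tasks take about the same time:

```python
def wave_sizes(total_containers, concurrent):
    """Containers started in each scheduling round, assuming equal task durations."""
    full, rest = divmod(total_containers, concurrent)
    return [concurrent] * full + ([rest] if rest else [])


if __name__ == "__main__":
    for slots in (14, 16):
        # 36 containers with 14 slots -> [14, 14, 8]; with 16 slots -> [16, 16, 4]
        print(slots, wave_sizes(36, slots))
```

Seen this way, 14-16 concurrent containers out of 36 only means the job needs about three rounds; by itself it does not indicate that scheduling is failing.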

Re: YARN Memory Filling up while creating Dynamic Partition in Hive

Expert Contributor

Is the job being submitted to a queue that has a limit set? Please verify which queue the job is submitted to and how much of the cluster's resources are allocated to that queue. If the job launches 36 containers but can only run 14-16 at a time, that suggests it is hitting the queue's maximum limit. Open the RM UI, go to Scheduler, and look at the queue allocation and the limits set there, something like below:

[Screenshot: 10857-screen-shot-2016-12-28-at-125720-am.png]
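To show how a queue limit can cap concurrency, here is a simplified sketch for a single user in a CapacityScheduler queue. The three property names in the docstring are real CapacityScheduler settings, but the percentages in the example are hypothetical; the actual values have to be read from the RM UI or capacity-scheduler.xml. It ignores minimum-user-limit-percent, multiple active users, and the ApplicationMaster share limit, so treat the result as a rough upper bound.

```python
CLUSTER_MB = 3 * 20 * 1024   # 3 NodeManagers x 20 GB, from the settings in the question
CONTAINER_MB = 4 * 1024      # map container size from the question


def queue_limit_mb(cluster_mb, capacity_pct, max_capacity_pct, user_limit_factor=1.0):
    """Memory a single user can hold in the queue (simplified).

    capacity_pct      -> yarn.scheduler.capacity.<queue-path>.capacity
    max_capacity_pct  -> yarn.scheduler.capacity.<queue-path>.maximum-capacity
    user_limit_factor -> yarn.scheduler.capacity.<queue-path>.user-limit-factor
    """
    guaranteed = cluster_mb * capacity_pct / 100.0   # queue's configured share
    ceiling = cluster_mb * max_capacity_pct / 100.0  # elastic upper bound
    # With the default user-limit-factor of 1, one user cannot exceed the
    # queue's configured capacity, however idle the rest of the cluster is.
    return min(ceiling, guaranteed * user_limit_factor)


def max_containers(limit_mb, container_mb=CONTAINER_MB):
    """Whole containers that fit inside the limit."""
    return int(limit_mb // container_mb)


if __name__ == "__main__":
    # Hypothetical (capacity %, maximum-capacity %, user-limit-factor) combinations:
    for cap, max_cap, ulf in [(50, 100, 1.0), (50, 100, 2.0), (100, 100, 1.0)]:
        limit = queue_limit_mb(CLUSTER_MB, cap, max_cap, ulf)
        print(cap, max_cap, ulf, max_containers(limit))
```

With a hypothetical 50% queue and the default user-limit-factor, a single user's job would be held to 7 containers of 4 GB even on an otherwise idle cluster; raising user-limit-factor to 2 (or the queue capacity to 100%) lets it reach the full 15.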
