Member since: 01-16-2014
Posts: 336
Kudos Received: 43
Solutions: 31
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 3402 | 12-20-2017 08:26 PM |
 | 3377 | 03-09-2017 03:47 PM |
 | 2843 | 11-18-2016 09:00 AM |
 | 5027 | 05-18-2016 08:29 PM |
 | 3858 | 02-29-2016 01:14 AM |
08-18-2015
09:56 PM
Any node that you want to use to submit Spark jobs to the cluster should be made a Spark gateway. That pushes all of the required configuration out to the node; there is no change to the submit scripts or installed code when you create a gateway. When you make a node a Spark gateway, all jars and configuration are pushed out correctly. If it is not a Spark gateway, only the default configuration is on the node, which, as you noticed, does not work without making some changes. In CM releases before 5.4 you also need to make sure that the correct YARN configuration is available on the node; CM 5.4 does that for you. Wilfred
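As a minimal sketch of what a submit from such a node could look like, assuming the Spark Gateway role has been added to the host in Cloudera Manager and the client configuration has been deployed (the example class and parcel path below are illustrative, not taken from the original post):

```
# With the gateway configuration deployed, a plain spark-submit picks up
# the pushed Spark and YARN settings without any manual edits on the node.
spark-submit --master yarn-cluster \
  --class org.apache.spark.examples.SparkPi \
  /opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar 10
```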
08-18-2015
08:28 PM
HARs are special and not really a file format: the implementation is part of the FileSystem (i.e. listing files in an archive is done via hdfs dfs -ls har:///har_file.har). Why do you want to create har files? Using a sequence file or some other container format to store the files might be much easier. I am not sure that the code in Spark will handle the har: URI. Wilfred
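For reference, HARs are created with the hadoop archive tool and are only readable through the har: filesystem; the paths below are placeholders, not taken from the original question:

```
# Pack the "input" directory under /user/example into input.har
# (this runs a MapReduce job to build the archive).
hadoop archive -archiveName input.har -p /user/example input /user/example/archives

# The archive contents are only visible through the har: scheme.
hdfs dfs -ls har:///user/example/archives/input.har
```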
08-18-2015
07:28 PM
How many executors do you have when you run this? I see the same when I run it, because it gets sent to each executor (2 in my case). Wilfred
08-11-2015
11:42 PM
There is something really strange with this job. The job should at least do something, but the submitted job shows: "number of mappers: 0; number of reducers: 0". A count(*) should have at least 1 mapper (to go over the input) and 1 reducer (to sum up). Does this happen for every Hive job? Can you run a simple MapReduce example like pi on the cluster to make sure that the cluster works at that level? Wilfred
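A quick way to run that sanity check could look like the following; the parcel path to the examples jar is an assumption and may differ on your cluster:

```
# Run the bundled pi example (10 maps, 100 samples each) to confirm that
# plain MapReduce jobs can be scheduled and completed on the cluster.
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
```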
08-11-2015
11:24 PM
1 Kudo
Check the memory and container settings. You rely on the defaults when you pass nothing in, and this could have caused the containers to be killed. The YARN logs should tell you more; the application id is printed in the logs you added, so retrieve them by running: yarn logs -applicationId application_1439215700542_0064 Analyse them and check what happened. Wilfred
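For example (the file name and grep patterns below are only a suggestion for narrowing down a container kill, not output from the original job):

```
# Pull the aggregated logs for the finished application into a file...
yarn logs -applicationId application_1439215700542_0064 > app_0064.log

# ...and look for container kills or memory-related messages.
grep -i -E "kill|memory|error" app_0064.log | less
```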
07-30-2015
12:11 AM
1 Kudo
What the FS takes into account depends on the scheduling type that you have chosen: DRF, Fair or FIFO. The default is DRF, which takes into account both memory and CPU. An application that asks for more resources than the cluster can accommodate, e.g. a request for a 100GB container when the maximum container can only be 64GB, will be rejected. However, if I ask for 32GB and the maximum container is 64GB, but there is no node large enough to handle the 32GB, then it will just sit there forever (YARN-56). If the maximum container size is 64GB but no node can accommodate such a container, it will most likely just sit there too. I am not sure what happens if I request a 32GB container for a queue which has only 16GB as its maximum resources, whether it is rejected or just sits there forever; I have not tested that case. So you might have a misconfiguration, or you might just have run into a bug. BTW: whatever was mentioned for memory is also true for vcores. Wilfred
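As an illustration of the two behaviours described above (outright rejection versus waiting forever), a sketch using the bundled pi example; the memory values are assumptions and the examples jar is taken to be in the working directory:

```
# Ask for map containers larger than yarn.scheduler.maximum-allocation-mb:
# the resource request is rejected by the scheduler.
hadoop jar hadoop-mapreduce-examples.jar pi -Dmapreduce.map.memory.mb=102400 10 100

# Ask for containers under the maximum but larger than any single node can
# offer: the job is accepted and may wait indefinitely for a container that
# never becomes available (YARN-56).
hadoop jar hadoop-mapreduce-examples.jar pi -Dmapreduce.map.memory.mb=32768 10 100
```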
07-29-2015
08:46 PM
Set it through the NodeManager yarn-site.xml configuration snippet. You used the ResourceManager snippet, and the check is not performed by that service, which is why it did not work for you. Wilfred
07-29-2015
08:32 PM
1 Kudo
I would strongly recommend not setting that to false. It will prevent the NM from keeping control over the containers. If you are running out of physical memory in a container, make sure that the JVM heap size is small enough to fit in the container. The container size should be large enough to contain:
- the JVM heap
- the permanent generation for the JVM
- any off-heap allocations
In most cases an overhead of between 15% and 30% of the JVM heap will suffice. Your job configuration should include the proper JVM and container settings; some jobs will require more overhead and some will require less. If you really want to make the change: pmem-check-enabled is a NM setting, and you need to set it in the configuration snippet for the NM. Wilfred
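As a sketch of the sizing rule above, with roughly 20% headroom between the JVM heap and the container size (the values, paths and the use of the examples jar are illustrative assumptions, not settings from the original job):

```
# 4 GB map containers with a ~3.2 GB JVM heap, leaving room for the
# permanent generation and off-heap allocations, so the NM physical
# memory check (yarn.nodemanager.pmem-check-enabled) is not tripped.
hadoop jar hadoop-mapreduce-examples.jar wordcount \
  -Dmapreduce.map.memory.mb=4096 \
  -Dmapreduce.map.java.opts=-Xmx3276m \
  /user/example/input /user/example/output
```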
07-29-2015
07:49 PM
1 Kudo
The null container log entry that was shown in the earlier message is a code issue which has been fixed in an upcoming release; we printed the wrong reference for the container, so it would always be null. For the state changes: they are correct. After we fail the application, if we have not exhausted the AM retries, it is pushed back into the queue for scheduling, which means the app goes into an ACCEPTED state. For YARN-3103: you will not see that issue if you are running CDH 5.4, as the fix is part of that release; if you run an earlier version, please upgrade. Wilfred
07-27-2015
01:47 AM
Cloudera Manager is not involved in YARN HA. You configure it through CM, but after that the RMs handle the HA side using the ZooKeeper quorum. Logs will be the easiest and quickest way to find an answer; just a snippet around the failover should do it. Wilfred
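When looking at a failover, it can also help to confirm which ResourceManager is currently active; the rm1/rm2 service ids below are the usual defaults but depend on your yarn.resourcemanager.ha.rm-ids setting:

```
# Ask each ResourceManager for its current HA state (active or standby).
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```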