Member since: 01-16-2014
Posts: 336
Kudos Received: 43
Solutions: 31
My Accepted Solutions
Views | Posted |
---|---|
1818 | 12-20-2017 08:26 PM |
1831 | 03-09-2017 03:47 PM |
1646 | 11-18-2016 09:00 AM |
2361 | 05-18-2016 08:29 PM |
2078 | 02-29-2016 01:14 AM |
04-06-2016
10:45 AM
There are two settings that you need to look at:
- yarn.scheduler.maximum-allocation-mb sets the maximum size of a container.
- yarn.nodemanager.resource.memory-mb sets the maximum amount of memory available on the node.

When a request comes in for a container that is larger than the maximum-allocation-mb it will be denied and the application cannot be submitted. If you have 240 GB in the host I would expect the NodeManager to get about 200 GB of that if you run only YARN on the node. That should allow you to run more than one large container of the size you have. However, running a 40 GB container for MR seems a bit over the top: do you really need all of that? If you use DRF then you might not have a memory limitation but a vcores limitation. You have not mentioned anything about that side, so I am not sure what you have configured and whether that might be the problem. There are also things like the number of applications that can run and the AM share which could influence what you see. There is a series of blog posts out that should also help with this; it starts with: http://blog.cloudera.com/blog/2015/09/untangling-apache-hadoop-yarn-part-1/ Three parts are out currently and part 4 is coming real soon... If you need more help, open a support case with us and we can work through setting up the scheduler with you. Wilfred
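As a quick sanity check you can grep the deployed client configuration on a NodeManager host; a minimal sketch, assuming the usual /etc/hadoop/conf location on a CM-managed CDH node:

```
# Show the two limits as they are currently deployed on this node.
grep -A1 -E 'yarn\.scheduler\.maximum-allocation-mb|yarn\.nodemanager\.resource\.memory-mb' \
  /etc/hadoop/conf/yarn-site.xml
```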
03-10-2016
12:58 AM
Spark SQL is supported in CDH 5.5, with some limitations. One of the things that we do not support is the Hive thriftserver, see: CDH 5.5 docs. The thriftserver's dependency on a specific Hive version, versus what we ship in CDH, is still a problem. Wilfred
03-06-2016
04:12 PM
You most likely have pulled in too many dependencies when you built your application. The Gradle documentation on building shows that it behaves differently from Maven: when you package up an application, Gradle includes far more dependencies than Maven does. This could have pulled in dependencies which you don't want or need. Make sure that the application only contains what you really need and what is not already provided by Hadoop. Search for Gradle and dependency management; you need some way to define a "provided" scope in Gradle. Wilfred
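Before sorting out a "provided"-style scope, it can help to see what Gradle would actually bundle; a sketch, assuming a recent Gradle wrapper (older versions call the configuration "runtime" instead of "runtimeClasspath"):

```
# Print everything that would end up on the application's runtime classpath.
./gradlew dependencies --configuration runtimeClasspath
```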
03-01-2016
01:03 AM
You cannot just replace a file in HDFS and expect it to be picked up. The files will be localised during the run and there is a check to make sure that the files are where they should be. See the blog on how the sharelib works. The OOTB version of Spark that we deliver with CDH does not throw the error that you show. It runs with the provided http client, so I doubt that replacing the jar is the proper solution. It is most likely due to a mismatch in one of the other jars that results in this error. Wilfred
02-29-2016
05:12 PM
You will need to change the build to pull in the right version as documented on the Spark pages. The Maven repository information for CDH is documented in our generic docs. You would probably end up with something like -Dhadoop.version=2.6.0-cdh5.4.0. Wilfred
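Purely as an illustration (the goals and version string depend on your CDH release and project), the override ends up on the Maven command line like this:

```
# Build against the CDH-packaged Hadoop artifacts instead of the generic ones.
mvn clean package -DskipTests -Dhadoop.version=2.6.0-cdh5.4.0
```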
02-29-2016
01:14 AM
1 Kudo
I would use the Spark action as much OOTB as possible and leverage the sharelib, since it handles a number of things for you. You can use multiple versions of the sharelib as described here; check the section on overriding the sharelib. Wilfred
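A sketch of how you could inspect and override the sharelib; the Oozie URL is a placeholder, and the per-action override property follows the Oozie documentation:

```
# List the jars currently served by the Spark sharelib (URL is a placeholder).
oozie admin -oozie http://oozie-host:11000/oozie -shareliblist spark

# In the workflow's job.properties you can then point the Spark action at a
# specific sharelib, for example:
# oozie.action.sharelib.for.spark=spark
```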
02-25-2016
09:03 PM
You should not, and cannot, rely on the joda version that the AWS SDK brings in. If they use a shaded version then you cannot reach it and you would not see it. If they have an unshaded version then you need to shade your version. You need to declare it as your own dependency and then shade your version in your build. It is not the simplest thing to figure out, especially if you have never done it before, but after sorting it out once it should not cost you anything and it will probably make upgrades of Hadoop and maintenance of your application simpler. Wilfred
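After the shade (Maven) or shadow (Gradle) plugin has run, it is worth checking that joda really ended up relocated under your own package prefix; a sketch with a hypothetical jar name:

```
# The joda classes should now appear under the relocation prefix you configured.
jar tf myapp-shaded.jar | grep -i joda | head
```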
02-25-2016
08:56 PM
1 Kudo
If you need a specific version of guava you cannot just add it to the classpath. If you do, you rely entirely on the randomness that is in the class loaders; there is no guarantee that you will get the proper version of guava loaded. The first thing you need to do is make sure that you get the proper version of guava loaded at all times. The proper way to do this is to shade (Maven) or shadow (Gradle) your guava; check the web on how to do this. It is really the only way to make sure you get the correct version and not break the rest of Hadoop at the same time. After that is done you need to use the classpath addition as discussed earlier and make sure that you add your shaded version. This is the only way to do this without being vulnerable to changes in the Hadoop dependencies. Wilfred
02-25-2016
08:20 PM
If you need a version of a library that is already part of Hadoop I would strongly recommend that you include your version of the library in a shaded form in your application. The shading makes sure that you get your version and that it will not interfere with the existing versions. We are currently writing a knowledge base article on how to do this; for now you will need to check online for "maven shade" or "gradle shadow", depending on how you build your application. Wilfred
02-25-2016
07:46 PM
Hive on Spark is not officially supported and what you see is one of those cases. Certain queries are slower, take more memory, or fail. That is why it is not supported yet. We are working hard to fix and tune these use cases. Until that is done the only workaround is to fall back on the MR execution engine. Wilfred
02-25-2016
07:42 PM
Yes, it should use Spark when you do that. There is nothing else that you need to do to run Hive on Spark. Keep in mind that it is not officially supported for production. Spark normally runs on top of YARN, so you should see in the RM that a Spark application was run. You can also check the Spark JHS for the Spark data. Wilfred
02-25-2016
07:39 PM
We highly recommend that you use the Spark action and not the shell action for Spark. Also make sure that you configure the gateways for Spark on the system. If you need more help you will need to provide a little more detail about what you are doing. Wilfred
02-25-2016
07:26 PM
The thrift server in Spark is not tested with, and might not be compatible with, the Hive version that is in CDH. Hive in CDH is 1.1 (patched) and Spark uses Hive 1.2.1. You might see API issues during compilation or runtime failures due to that. Wilfred
02-25-2016
07:19 PM
The second version of Spark must be compiled against the CDH artifacts. You cannot pull down a generic build from a repository and expect it to work (we know it has issues). You would thus need to compile your own version of Spark against the correct CDH version. Using Spark from a later or earlier CDH release will not work, most likely due to changes in dependent libraries (i.e. the Hadoop or Hive version). For the shuffle service and the history service: both are backwards compatible and only one of each is needed (running two is difficult and not needed). However, you must run and configure only the one that comes with the latest version of Spark in your cluster. There is no formal support for this and client configs will need manual work... Wilfred
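For illustration only, compiling a newer Spark against the CDH artifacts mostly comes down to overriding the Hadoop version in its Maven build; the profiles and version string below are assumptions, so check the Spark build docs for your release:

```
# Example: build Spark for YARN against the CDH-packaged Hadoop.
mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.5.2 -DskipTests clean package
```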
02-10-2016
06:30 PM
Node labels are not considered ready by Cloudera or even by the upstream community. The basis for node labels was added in Hadoop 2.6 with a large number of limitations. The only scheduler that currently implements node label support is the CapacityScheduler; none of the other schedulers support it yet. Cloudera recommends, for a number of reasons, that you use the FairScheduler in your cluster. Setting up node labels is partially supported through the command line interface but it still requires manual steps and configuration. Support is also limited to one (1) label per YARN application, and using labels requires you to add them on the command line when an application is submitted. MapReduce does not implement any of the node label support yet (MAPREDUCE-6304) in the current release. Due to their limited implementation, node labels can also cause a large increase in scheduling delays, which makes using them counterproductive. We are working with the community to make node labels ready for production but currently they are not there. Wilfred
01-24-2016
04:21 PM
The NM loads its configuration on startup and then reregisters with the RM. So the only thing that should be needed is to update the configuration in CM for the node(s) and restart the NM service on those nodes. I have done that numerous times and it always works. Wilfred
01-24-2016
03:09 PM
1 Kudo
A vcore is a virtual core. You can define it however you want. You could, as an example, define a vcore as the processing power delivered by a 1GHz thread core. A 3GHz core would then be comparable to 3 vcores in the node manager. Your container request then needs to use multiple vcores, which accounts for the difference in speed. Not a lot of clusters do this, due to the administrative overhead and the fact that if the end users do not use the vcores correctly it can overload the faster machines. Wilfred
01-20-2016
07:20 PM
The codec is responsible for the reads and you will need to talk to the creator of the codec to provide you with the information on why this is happening. Wilfred
01-20-2016
07:18 PM
We are aware of that upstream bug and found it during our internal performance testing, around the time we released CDH 5.5.1. It will be included in an upcoming CDH 5.5 release. Wilfred
01-20-2016
07:12 PM
1 Kudo
You need to set up the nodes with the proper vcores and memory available for the NM. That should solve the problem. It will put more load on the larger nodes than on the small nodes. The container is also scheduled on a node based on data locality, which is out of your control. You cannot, however, force processing of a specific split to start on a specific node. Wilfred
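You can verify what each NodeManager actually advertises to the RM from the command line; the node id below is a placeholder, take the real ids from the node list output:

```
# Show all NodeManagers known to the RM.
yarn node -list
# Print the memory and vcore capacity that a single node reports.
yarn node -status worker01.example.com:8041
```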
01-20-2016
07:07 PM
Use the yarn logs -applicationId APP_ID command to grab the executor logs so you can get some more detail on what is failing. APP_ID needs to be replaced with the application ID of your application, in your example application_1450777964379_0027. Wilfred
01-20-2016
06:53 PM
1 Kudo
CDH 5.3 does not come with Spark 1.5, so you are running an unsupported cluster; please be aware of that. Weight has nothing to do with preemption; that is a common misunderstanding. The weight just decides which queue gets a higher priority during the scheduling cycle. So if I have queues with the weights 3:1:1, then out of every 10 schedule attempts 6 will go to the queue with weight 3 and 2 attempts will go to each queue with weight 1, totalling 10 attempts. Minimum share preemption works only if you have the minimum and maximum shares for a queue set, so make sure you have that. The fair share of a queue is calculated based on the demand in the queue (i.e. the applications that are running in it). You thus might not be hitting the fair share preemption threshold... Wilfred
01-17-2016
09:17 PM
Those settings do not work on YARN, so no effect is expected; check the Spark standalone docs. For YARN the cleanup should be automatic and triggered by a shutdown and proper clean up of the context. Which version of CDH are you running and how have you configured the shuffle? Wilfred
01-17-2016
06:24 PM
The spark job server is not a Cloudera provided application. You will need to get support from the team that hosts the code at the main dev branch. That said, I can see one huge problem: you are trying to use job server version 0.4, which is for an older release of Spark (1.0.2) than the one in CDH 5.4 (1.3.x). Make sure that you use the proper version and fix your project compilation etc. Also, Spark in CDH uses a base version of Spark and adds fixes on top of that, so you might need a slightly different version of the job server than you think. It is all up to you to make sure it works for your use case and is stable. We are working on an equivalent job server as part of CDH. Wilfred
01-17-2016
02:32 PM
No, we do not support the thrift server, as per the documentation: CDH 5.5 Spark release note. Hive on Spark is also still in beta and we are finishing features, as per the Hive CDH 5.5 release note; it is thus experimental and things might not work. We cannot provide guidance on the roadmap for features that are not yet complete. Wilfred
09-28-2015
08:48 PM
With the --files option you put the file in the working directory on the executor. You are trying to point to the file using an absolute path, which is not what the --files option does for you. Can you use just the name "rule2.xml" and not a path? When you read the documentation for --files, see the important note at the bottom of the "Running on YARN" page. Also, do not use Resources.getResource() but just open it with a plain Java construct like new FileInputStream("rule2.xml") or something like it. Wilfred
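A minimal sketch of the submission side; the class, jar and master values are placeholders, rule2.xml is the file from your example:

```
# Ship rule2.xml into the container working directories so it can be opened by name.
spark-submit --class com.example.RuleJob \
  --master yarn-cluster \
  --files rule2.xml \
  rules-app.jar
```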
09-28-2015
08:31 PM
How to do this is documented here: Running on YARN. You need to pass in a custom log4j.properties file. With rolling logs you will most likely lose the YARN log tracking and aggregation, and I am not sure that this will work properly. The container will most likely keep pointing to the base file and never move to the rolled version, or you will only ever be able to track the current one. Wilfred
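A sketch of passing a custom log4j.properties on submission, assuming the file sits in your local submission directory (jar name and master are placeholders):

```
# Distribute the custom log4j.properties and point both driver and executors at it.
spark-submit --master yarn-cluster \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  app.jar
```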
09-28-2015
08:08 PM
To rule out a custom jar issue, can you run the pi example to check whether the cluster is set up correctly? We have documented how to run a Spark application, with the example, in our docs. The error that you show points to a classpath problem: the Spark classes cannot be found on your classpath. Wilfred
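For reference, the bundled SparkPi example can be run roughly like this on a parcel install; the jar path and the argument are illustrative, so use the examples jar that ships with your release:

```
# Run the SparkPi example on YARN as a quick cluster sanity check.
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  /opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar 10
```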
09-28-2015
07:59 PM
1 Kudo
Files should be moved from done_intermediate to the done directory during the normal running of the JHS. Two things to check:
- Does the JHS show any errors in its logs?
- Run the following command on the host that runs the JHS: id -Gn mapred. It should show "mapred hadoop" as output.

That assumes the JHS runs as the mapred user; if it runs as another user, replace mapred in the id command. Wilfred
09-28-2015
12:28 AM
Can you make sure the JHS is up and running and that it can find the job when you look for it in the UI? You can also try the command line to see if it is on HDFS: yarn logs -applicationId APP_ID -appOwner USER_ID. The APP_ID is the same as the job ID that you showed but with "job" replaced by "application"; the USER_ID is the ID of the user that ran the job. Wilfred