Member since: 01-16-2014
Posts: 336
Kudos Received: 43
Solutions: 31
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 3402 | 12-20-2017 08:26 PM |
 | 3377 | 03-09-2017 03:47 PM |
 | 2843 | 11-18-2016 09:00 AM |
 | 5027 | 05-18-2016 08:29 PM |
 | 3858 | 02-29-2016 01:14 AM |
08-18-2015
09:56 PM
Any node that you want to use to submit Spark jobs to the cluster should be made a Spark gateway. That pushes all of the required configuration out to the node; there is no change to the submit scripts or installed code when you create a gateway. When you make a node a Spark gateway, all jars and configuration are pushed out correctly. If it is not a Spark gateway, only the default configuration is on the node, which, as you noticed, does not work without making some changes. In CM releases before 5.4 you also need to make sure that the correct YARN configuration is available on the node; CM 5.4 does that for you. Wilfred
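As a minimal sketch of what a submit from such a node could look like, assuming the Spark Gateway role has been added to the host in Cloudera Manager and the client configuration has been deployed (the example class and parcel path below are illustrative, not taken from the original post):

```
# With the gateway configuration deployed, a plain spark-submit picks up
# the pushed Spark and YARN settings without any manual edits on the node.
spark-submit --master yarn-cluster \
  --class org.apache.spark.examples.SparkPi \
  /opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar 10
```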
08-18-2015
08:28 PM
HARs are special and not really a file format: the implementation is part of the FileSystem (i.e. listing files in an archive is done via hdfs dfs -ls har:///har_file.har). Why do you want to create har files? Using a sequence file or some other container format to store the files might be much easier. I am not sure that the code in Spark will handle the har: URI. Wilfred
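For reference, HARs are created with the hadoop archive tool and are only readable through the har: filesystem; the paths below are placeholders, not taken from the original question:

```
# Pack the "input" directory under /user/example into input.har
# (this runs a MapReduce job to build the archive).
hadoop archive -archiveName input.har -p /user/example input /user/example/archives

# The archive contents are only visible through the har: scheme.
hdfs dfs -ls har:///user/example/archives/input.har
```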
08-18-2015
07:28 PM
How many executors do you have when you run this? I see the same when I run it, because it gets sent to each executor (2 in my case). Wilfred
08-11-2015
11:42 PM
There is something really strange with this job. The job should at least do something, but the submitted job shows: "number of mappers: 0; number of reducers: 0". A count(*) should have at least 1 mapper (to go over the input) and 1 reducer (to sum up). Does this happen for every Hive job? Can you run a simple MapReduce example like pi on the cluster to make sure that the cluster works at that level? Wilfred
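A quick way to run that sanity check could look like the following; the parcel path to the examples jar is an assumption and may differ on your cluster:

```
# Run the bundled pi example (10 maps, 100 samples each) to confirm that
# plain MapReduce jobs can be scheduled and completed on the cluster.
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
```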
08-11-2015
11:24 PM
1 Kudo
Check the memory and container settings. You rely on the defaults when you pass nothing in, and this could have caused the containers to be killed. The YARN logs should tell you more; the application id is printed in the logs you added, so retrieve them by running: yarn logs -applicationId application_1439215700542_0064 Analyse them and check what happened. Wilfred
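For example (the file name and grep patterns below are only a suggestion for narrowing down a container kill, not output from the original job):

```
# Pull the aggregated logs for the finished application into a file...
yarn logs -applicationId application_1439215700542_0064 > app_0064.log

# ...and look for container kills or memory-related messages.
grep -i -E "kill|memory|error" app_0064.log | less
```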
07-30-2015
12:11 AM
1 Kudo
What the FS takes into account depends on the scheduling type that you have chosen: DRF, Fair or FIFO. The default is DRF, which takes into account both memory and CPU. An application that asks for more resources than the cluster can accommodate, e.g. a request for a 100GB container when the maximum container can only be 64GB, will be rejected. However, if I ask for 32GB and the maximum container is 64GB, but there is no node large enough to handle the 32GB, then it will just sit there forever (YARN-56). If the maximum container size is 64GB but no node can accommodate such a container, it will most likely just sit there too. I am not sure what happens if I request a 32GB container for a queue which has only 16GB as its maximum resources, whether it is rejected or just sits there forever; I have not tested that case. So you might have a misconfiguration, or you might just have run into a bug. BTW: whatever was mentioned for memory is also true for vcores. Wilfred
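As an illustration of the two behaviours described above (outright rejection versus waiting forever), a sketch using the bundled pi example; the memory values are assumptions and the examples jar is taken to be in the working directory:

```
# Ask for map containers larger than yarn.scheduler.maximum-allocation-mb:
# the resource request is rejected by the scheduler.
hadoop jar hadoop-mapreduce-examples.jar pi -Dmapreduce.map.memory.mb=102400 10 100

# Ask for containers under the maximum but larger than any single node can
# offer: the job is accepted and may wait indefinitely for a container that
# never becomes available (YARN-56).
hadoop jar hadoop-mapreduce-examples.jar pi -Dmapreduce.map.memory.mb=32768 10 100
```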
07-29-2015
08:46 PM
Set it through the NodeManager yarn-site.xml configuration snippet. You used the ResourceManager snippet, and the check is not performed by that service, which is why it did not work for you. Wilfred
07-29-2015
08:32 PM
1 Kudo
I would strongly recommend not setting that to false. It will prevent the NM from keeping control over the containers. If you are running out of physical memory in a container, make sure that the JVM heap size is small enough to fit in the container. The container size should be large enough to contain:
- the JVM heap
- the permanent generation for the JVM
- any off-heap allocations
In most cases an overhead of between 15% and 30% of the JVM heap will suffice. Your job configuration should include the proper JVM and container settings; some jobs will require more overhead and some will require less. If you really want to make the change: pmem-check-enabled is a NM setting, and you need to set it in the configuration snippet for the NM. Wilfred
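As a sketch of the sizing rule above, with roughly 20% headroom between the JVM heap and the container size (the values, paths and the use of the examples jar are illustrative assumptions, not settings from the original job):

```
# 4 GB map containers with a ~3.2 GB JVM heap, leaving room for the
# permanent generation and off-heap allocations, so the NM physical
# memory check (yarn.nodemanager.pmem-check-enabled) is not tripped.
hadoop jar hadoop-mapreduce-examples.jar wordcount \
  -Dmapreduce.map.memory.mb=4096 \
  -Dmapreduce.map.java.opts=-Xmx3276m \
  /user/example/input /user/example/output
```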
07-29-2015
07:49 PM
1 Kudo
The null container log entry that was shown in the earlier message is a code issue which has been fixed in an upcoming release; we printed the wrong reference for the container, so it would always be null. For the state changes: they are correct. After we fail the application, if we have not exhausted the AM retries, it is pushed back into the queue for scheduling, which means the app goes into an ACCEPTED state. For YARN-3103: you will not see that issue if you are running CDH 5.4, as the fix is part of that release; if you run an earlier version, please upgrade. Wilfred
07-27-2015
01:47 AM
Cloudera Manager is not involved in YARN HA. You configure it through CM, but after that the RMs handle the HA side using the ZooKeeper quorum. Logs will be the easiest and quickest way to find an answer; just a snippet around the failover should do it. Wilfred
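When looking at a failover, it can also help to confirm which ResourceManager is currently active; the rm1/rm2 service ids below are the usual defaults but depend on your yarn.resourcemanager.ha.rm-ids setting:

```
# Ask each ResourceManager for its current HA state (active or standby).
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```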