Member since: 01-16-2014
Posts: 336
Kudos Received: 43
Solutions: 31
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1818 | 12-20-2017 08:26 PM
 | 1831 | 03-09-2017 03:47 PM
 | 1647 | 11-18-2016 09:00 AM
 | 2361 | 05-18-2016 08:29 PM
 | 2078 | 02-29-2016 01:14 AM
01-08-2017
04:03 PM
You can have different options on different NMs; that is not the problem. A difference in the hardware used for the NMs may require different JVM options, so that is something we allow and it will work. However, there cannot be an empty line in the options. An empty line in the options is passed on into the script that sets up the environment, and that is where it breaks: the empty line splits the setting in two in the script, which should not happen. The empty line(s) should be trimmed before we generate that settings script, which is the jira I filed. Wilfred
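For illustration only, this is roughly what an options value with an embedded empty line looks like (the JVM flags are placeholders); it is the blank middle line that gets carried into the generated settings script and splits the setting in two:

```
-Xms1g -Xmx2g
                                  <-- empty line: this is what needs to be removed
-XX:+HeapDumpOnOutOfMemoryError
```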
01-04-2017
07:07 PM
Good to hear that you have found the discrepancy between the nodes and have been able to fix it. I reproduced your issue internally on Cloudera Manager 5.9 and have logged an internal jira to fix CM so that it properly handles an empty line in the options. Wilfred
12-27-2016
07:10 PM
1 Kudo
The application gets assigned 57 or 62 containers to start within 20 seconds or so, and at that point the log just stops. The AM should be doing the bulk of the work at the point you are at. There is no scheduling problem on the RM side; if there were a scheduling problem you would not get any containers at all. Since you have containers assigned, the AM should take over and start the containers on the NMs, and there is no trace of that at all. This could be due to a number of things. One point to check would be the connectivity between the AM and the NMs. Another thing to try is to turn on debug logging on the AM and see if anything else shows up on the AM side. For now I would not suspect the RM at all, and not blame the scheduler in the RM. Wilfred
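If the job in question is a MapReduce job, one way to raise the AM log verbosity per job is sketched below (the property name comes from mapred-default.xml; the jar, driver class and the use of ToolRunner-style -D options are assumptions for illustration):

```
hadoop jar my-job.jar MyDriver \
  -Dyarn.app.mapreduce.am.log.level=DEBUG \
  <other job arguments>
```

For other application types the AM logging is controlled by that framework's own log4j configuration.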
12-22-2016
05:31 PM
1 Kudo
Can you provide a bit more background for this issue? Which CDH version are you using? Do you have CM in the cluster or not? You will also need to explain far more about the ResourceManager configuration (the scheduler and its config) and the NodeManager configs. "Lots of resources" is open to many interpretations... The last thing is your application requests: what kind of containers do you request for the different jobs you mentioned? Thank you, Wilfred
12-22-2016
05:13 PM
Hi, Could you please open a new thread next time for a new issue? It might at first look related, but there is a good chance that it is something completely different and the old issue might distract from finding the solution quickly. If you have error messages or more detail, please provide them so we can help you. Wilfred
12-22-2016
04:56 PM
The issue you have is due to the empty line in the YARN_NODEMANAGER_OPTS variable. That empty line should not be there and is causing the setting to be split in two. Can you check the "Java Configuration Options for NodeManager" field in Cloudera Manager and make sure that there is no empty line in that config? Thanks, Wilfred
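A quick way to check is to copy the current value of that field into a local file and look for blank lines before saving it back (a sketch only; the file name is just an example):

```
grep -n '^[[:space:]]*$' nm_java_opts.txt    # any hit is an empty line that must be removed
```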
12-22-2016
04:36 PM
If you are not using Cloudera Manager you need to make sure that the configuration file is deployed to all nodes and in the proper place. CM handles this automatically for you in recent releases, but on older releases you might need to deploy the client config before starting the NM to make sure. The original message was posted 2 years ago, so most likely that was an old release of Cloudera Manager and the solution would have been to deploy the client config and restart the NM. NOTE: the config file must be located relative to the container-executor binary; if it is not found you will get a message like that. Manual deployment of this file is not simple. Wilfred
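As a rough sketch of what to verify on a node (the paths below are illustrative and depend on your packaging; the expected config location is compiled into the binary relative to where it is installed):

```
# confirm where the binary is installed
ls -l /usr/lib/hadoop-yarn/bin/container-executor
# ask the binary itself to validate its setup; it reports an error if
# container-executor.cfg cannot be found or has bad permissions
/usr/lib/hadoop-yarn/bin/container-executor --checksetup
```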
12-01-2016
06:24 AM
No this is not a known issue as far as I know. If you have a support contract please open a support case and we can look into it further for you. Wilfred
12-01-2016
05:56 AM
Those directories should be cleaned up in current releases via SPARK-7503 and/or SPARK-7705. Those fixes are specific to YARN-based setups. The cleanup should happen automatically. Wilfred
11-30-2016
09:56 PM
Umesh, Those settings are for Spark standalone clusters only. I would strongly advise you not to run standalone but to use Spark on YARN. When you use YARN the problem does not exist, as YARN handles it for you. Wilfred
11-22-2016
11:03 PM
Since you are using the CDH distribution I hope you are using Cloudera Manager; that makes the cluster much easier to manage and most things should have been configured for you. You can access applications via the RM web UI, which should have links through to the container logs as well. For Spark applications you should also have the Spark History Server up and running; you should be able to find the application there and follow the links through to the container logs. Logs are kept for days, even for failed and finished applications. If you have log aggregation turned on you can use the yarn command to load the logs from HDFS; otherwise the logs are kept on the local drives of the NodeManagers and you can check there. Wilfred
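With log aggregation turned on, a minimal sketch of pulling the logs for a finished application from HDFS with the yarn command (the application id and owner are placeholders):

```
yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX -appOwner the_submitting_user > app.log
```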
11-22-2016
06:50 PM
Look at the container log for the specific container "container_1476862885544_0131_01_000002" mentioned in the log. You are looking at NM logs, which will only tell you the general life cycle of the container; you need to look at what happens inside the container. Use the RM web UI, find the application application_1476862885544_0131, and drill into the containers that were run for the application. Wilfred
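If log aggregation is enabled you can also pull that single container's log from the command line instead of the web UI; a sketch using the ids from your log (the NodeManager host:port is a placeholder for the node the container ran on):

```
yarn logs -applicationId application_1476862885544_0131 \
  -containerId container_1476862885544_0131_01_000002 \
  -nodeAddress <nm_host>:<nm_port>
```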
11-18-2016
10:14 AM
For your second issue: depending on whether you have turned on log aggregation (yarn.log-aggregation-enable) or not, you can retain the logs for as long as you want by setting the appropriate retain setting: yarn.nodemanager.log.retain-seconds or yarn.log-aggregation.retain-seconds. For the first issue: if you really want to get the logs in multiple places, you should be able to do whatever log4j supports by changing the config. You can set up a container-specific log configuration via the container-log4j.properties file. Passing that changed file in to the different applications is application dependent (see MAPREDUCE-6052 for an example). Wilfred
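As a shorthand sketch of the retention properties mentioned above (set them in yarn-site.xml or through CM; the 7-day values are just examples):

```
yarn.log-aggregation-enable          = true
yarn.log-aggregation.retain-seconds  = 604800   # applies when aggregation is on
yarn.nodemanager.log.retain-seconds  = 604800   # applies when aggregation is off
```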
11-18-2016
09:44 AM
If you are starting with a cluster now I would strongly recommend that you use a CDH release much later than CDH 5.3. The later releases (CDH 5.8 or CDH 5.9) are far more stable than what you are trying to use now. Even if you stick with CDH 5.3, at least use the latest maintenance release. Back to your question: there could be multiple things that cause your job not to start. The first point to check would be the RM web UI, to see what state the cluster and the scheduler are in. After that it depends on what the RM UI shows you... Wilfred
11-18-2016
09:34 AM
There are a number of blog posts on our site that should help answer all your questions. This one should cover the questions on weights and the like, but please read all of them. Wilfred
11-18-2016
09:31 AM
1 Kudo
That is a known issue: YARN-4022. The queue is removed, just not from the display. Wilfred
11-18-2016
09:28 AM
Your issue is here: "running beyond physical memory limits. Current usage: 1.5 GB of 1.5 GB". You have run out of container space; give the container more room to run. Since this is a Spark job and you are using PySpark, the easiest solution would be to increase the overhead (spark.yarn.executor.memoryOverhead) that is used for this job when calculating the container size. Wilfred
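A minimal sketch of raising the overhead at submit time (the 768 MB value, the 1 GB executor memory and the script name are illustrative; the property name applies to the Spark 1.x releases shipped in CDH 5):

```
spark-submit \
  --conf spark.yarn.executor.memoryOverhead=768 \
  --executor-memory 1g \
  your_job.py
```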
11-18-2016
09:00 AM
1 Kudo
In CM you can retrieve the file that is pushed out from the RM instance information page. Go to the Yarn service -> Instances (in the bar on top) -> Select a RM instance from the list on the right -> Process (in the top bar again). There should be a list of all the configuration and environment files there that you can download. If they do not show up click the Show link under Configuration Files/Environment. Hope that helps, Wilfred
09-18-2016
05:53 PM
1 Kudo
Hi, You do not use either of these directly. The minimum settings are enforced by the scheduler: you cannot request a container smaller than the minimum; the scheduler rounds anything smaller up to the minimum and gives you the minimum container size. The increment is used for rounding up container sizes and makes the internal housekeeping for the scheduler simpler. Neither setting influences the job settings directly; they are applied on top of the settings, so you still need to set the resources needed for a job as part of the job configuration. As examples: 1) request a 600 MB/1 vcore container: the minimum container size is 1 GB -> a 1 GB/1 vcore container is allocated; 2) request a 1200 MB/1 vcore container: the minimum size is 1 GB and the increment is 500 MB -> a container of 1.5 GB (rounded up to the next increment, with the minimum used as the base). The same happens for vcores. For more info see the Untangling Apache Hadoop YARN blog series. Wilfred
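In shorthand, the two properties behind the examples above would look roughly like this in yarn-site.xml (the increment property name is the Fair Scheduler one used in CDH; the values match the examples):

```
yarn.scheduler.minimum-allocation-mb   = 1024   # 1 GB minimum
yarn.scheduler.increment-allocation-mb = 500    # 500 MB increment
# request 600 MB / 1 vcore  -> rounded up to the 1024 MB minimum
# request 1200 MB / 1 vcore -> 1 GB base + one 500 MB increment = ~1.5 GB
```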
09-18-2016
05:29 PM
No, YARN-2246 is not in any release yet. I'll pull the patch into our release and we'll have it in an upcoming release. Wilfred
06-13-2016
04:42 AM
For the application, download the application logs and check what the error is in them: yarn logs -applicationId APPID -appOwner USERID. Check the exit codes of the application and you should be able to tell in a bit more detail what is going on. Wilfred
05-25-2016
12:34 AM
Sidharth, Please create a new thread for a new issue; re-using an old thread can lead to strange comments when people make assumptions based on irrelevant information. For your issue: EPERM means that the OS is not allowing you to create the NM recovery DB and you have recovery turned on. Check the access to the recovery DB directory that you have configured. Wilfred
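A rough sketch of the check on the affected node (the directory and the yarn user are illustrative; use whatever yarn.nodemanager.recovery.dir is set to and the user the NM runs as):

```
ls -ld /var/lib/hadoop-yarn/yarn-nm-recovery
sudo -u yarn touch /var/lib/hadoop-yarn/yarn-nm-recovery/.write_test && echo writable
```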
05-24-2016
07:09 PM
Thank you for the update. It might be better to start a new thread for new issues that are encountered so we do not get tripped up by old information and can take a fresh look at the issue you have. Wilfred
05-18-2016
10:59 PM
There is an open jira in Apache Oozie for that documentation issue: OOZIE-1283. So you can ignore that one and use the action extension as documented in the link given. Wilfred
05-18-2016
09:05 PM
You are referencing a really old jira which was resolved, and the action is still supported. In the Oozie documentation we maintain I am not able to find the deprecation message. The SSH action has limitations in its usage, which is why we do not recommend it. Wilfred
05-18-2016
08:29 PM
I am not sure if you posted to the old mailing list before, but the numbers seem too similar for this not to be the same question. If you have a case like you describe, the reducers can take over the cluster and cause the deadlock, like Haibo said. We have fixed some issues in this behaviour in later CDH releases than the one you are on. The number of containers that you can run at the same time in the cluster I estimate is somewhere in the 250 to 300 range at a maximum. The only way to prevent the cluster from being taken over by just reducers is to set slow start for the job to 1. It might slow down the job a bit, but you should never see this kind of deadlock again. Try it and let us know how it went, Wilfred
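A minimal sketch of setting slow start to 1 for a single job (the jar, driver class and the use of ToolRunner-style -D options are assumptions for illustration; the property can also be set directly in the job configuration):

```
hadoop jar my-job.jar MyDriver \
  -Dmapreduce.job.reduce.slowstart.completedmaps=1.0 \
  <other job arguments>
```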
04-13-2016
12:14 AM
One container is one reducer or one mapper, never more than one. There is no way to "limit" things inside a container since it is a one-to-one relationship. Wilfred
04-13-2016
12:07 AM
It is here: http://blog.cloudera.com/blog/2016/01/untangling-apache-hadoop-yarn-part-3/
04-07-2016
01:50 PM
We do not expose the vmem setting in Cloudera Manager since it is really troublesome to get that check correct. Depending on how the memory gets allocated, the virtual memory overhead could be anywhere between 5% of the JVM size and multiple times the full JVM size. Your container is getting killed due to physical memory (not virtual memory) overuse. The best thing is to make sure that you allow for an overhead on top of the JVM size; we normally recommend 20% of the JVM heap size as the overhead. Again, this is workload dependent and could differ for you. We are working on a change so that you only need to set one of the two, and on fully supporting that in Cloudera Manager; some of the changes have already been made to the underlying MR code via MAPREDUCE-5785... Wilfred
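In shorthand, the 20% rule of thumb for a map task would look roughly like this (MR2 property names; the 2 GB heap is just an example):

```
mapreduce.map.java.opts  = -Xmx2048m
mapreduce.map.memory.mb  = 2458    # ~2048 MB heap + 20% overhead
```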
04-07-2016
08:52 AM
1 Kudo
As Sumit said there are two settings: vmem (virtual memory), set to false, and pmem (physical memory), set to true. The blog is still correct; the change is for vmem and the way the virtual memory allocator works on Linux. The pmem setting covers the "real" memory and enforces the container restrictions. If you turn that off, a task that runs in a container could just take all the memory on the node; it leaves the NodeManager unable to enforce the container sizing you have set, and you are then relying on the applications (and your end users) to behave properly. Wilfred
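In shorthand, the two NodeManager checks discussed above map to these yarn-site.xml properties, with the values described in this thread:

```
yarn.nodemanager.vmem-check-enabled = false   # virtual memory check off
yarn.nodemanager.pmem-check-enabled = true    # physical memory check on; enforces container sizes
```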