Member since: 02-10-2019
Posts: 47
Kudos Received: 9
Solutions: 8
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2644 | 07-15-2019 12:04 PM |
 | 1768 | 11-03-2018 05:00 AM |
 | 3293 | 10-24-2018 07:38 AM |
 | 3583 | 10-08-2018 09:47 AM |
 | 959 | 08-17-2018 06:33 AM |
07-15-2019
12:04 PM
1 Kudo
@Javert Kirilov The config JSON for ats-hbase should be created when the ResourceManager starts up. If it wasn't created, check the ResourceManager logs for any ats-hbase related errors during startup.
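For a quick check (a sketch, assuming the YARN service CLI is available on your cluster and you run it as the user that owns the ats-hbase service):

# hedged sketch: query the status of the embedded ats-hbase service
yarn app -status ats-hbase

If the service exists, this prints its JSON definition and component states; if not, the ResourceManager startup logs are the place to look.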
11-06-2018
06:27 AM
@Sam Hjelmfelt I don't think setting YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE in yarn-env.sh will set the same in the ContainerLaunchContext. Have you tried setting it in the service spec itself to see if it helps?
11-03-2018
05:00 AM
1 Kudo
@Sam Hjelmfelt Running Docker containers which have an ENTRYPOINT in YARN Services requires additional configuration in the service spec. The env variable YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE needs to be set to true. Additionally, the launch command parameters are separated with commas instead of spaces. Try running with the spec below.

{
  "name": "myapp",
  "version": "1.0.0",
  "description": "myapp",
  "components": [
    {
      "name": "myappcontainers",
      "number_of_containers": 1,
      "artifact": {
        "id": "myapp:1.0-SNAPSHOT",
        "type": "DOCKER"
      },
      "launch_command": "input1,input2",
      "resource": {
        "cpus": 1,
        "memory": "256"
      },
      "configuration": {
        "env": {
          "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
        }
      }
    }
  ]
}

For further reference, refer to the documentation here
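If you save the spec to a file (myapp.json below is just an example name), it can be launched with the YARN service CLI; a sketch, assuming the standard yarn app command:

# hedged sketch: launch the service from the JSON spec and check its state
yarn app -launch myapp myapp.json
yarn app -status myapp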
10-29-2018
10:54 AM
@john x The ResourceManager only keeps information for containers which are currently running for an application. Containers which have already finished will not be present in the ResourceManager, but they will be present in the App Timeline Server. So the container list command tries to fetch from both the ResourceManager and the App Timeline Server. The error shows that the App Timeline Server does not contain the application. Maybe the App Timeline Server had issues or was not running when this application was submitted.
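For reference, a rough sketch of the commands involved (the IDs are placeholders, not from this case):

# hedged sketch: list attempts for the app, then the containers of one attempt
yarn applicationattempt -list <application_id>
yarn container -list <application_attempt_id>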
10-24-2018
11:06 AM
@Prashant Gupta Good to know that the ResourceManager started successfully. Kindly mark the answer as accepted if the problem got resolved.
10-24-2018
07:38 AM
@Prashant Gupta From your attached logs, it looks like you have enabled GPU scheduling, but the ResourceManager is still using the DefaultResourceCalculator:

2018-10-22 17:48:02,490 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1495)) - Error starting ResourceManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: RM uses DefaultResourceCalculator which used only memory as resource-type but invalid resource-types specified {yarn.io/gpu=name: yarn.io/gpu, units: , type: COUNTABLE, value: 0, minimum allocation: 0, maximum allocation: 9223372036854775807, memory-mb=name: memory-mb, units: Mi, type: COUNTABLE, value: 0, minimum allocation: 1024, maximum allocation: 191488, vcores=name: vcores, units: , type: COUNTABLE, value: 0, minimum allocation: 1, maximum allocation: 32}. Use DomainantResourceCalculator instead to make effective use of these resource-types

In YARN -> Configs -> Advanced -> Scheduler, set the following:

yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
10-09-2018
09:36 AM
Yes. A user can only run a single job at any given moment. To run multiple jobs at the same time, they need to be submitted as different users.
10-09-2018
08:05 AM
@Soumitra Sulav The problem seems to be that the FileStatus returned by OzoneFileSystem does not have the owner field set, so it is empty. As a result, the ownership check fails. One workaround I see is to delete the /tmp/hadoop-yarn/staging/hdfs/.staging directory before submitting the MapReduce job. The ownership check then gets bypassed and the staging directory will be created again. But this means you can't have more than one job using the /tmp/hadoop-yarn/staging/hdfs/.staging directory, so it's not a good workaround, although it is the only one available from what I see (apart from a code change in MapReduce/Ozone).
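A sketch of that workaround, assuming the staging path lives on the file system configured as the default FS (adjust the URI/path if yours differs):

# hedged sketch: remove the stale staging directory before resubmitting the job
hadoop fs -rm -r /tmp/hadoop-yarn/staging/hdfs/.staging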
10-08-2018
01:21 PM
Good to know you got it resolved. You can accept the answer if it helped. One more thing to note: Java debugging doesn't work if more than one map container is launched on the same node, because both map container processes will try to listen on the debug port 8787 and one might fail.
10-08-2018
09:47 AM
@Eddie Generally, specifying mapreduce.map.java.opts in quotes works for all the example jobs. The following command running a pi job worked:

yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi -Dmapreduce.map.java.opts="-XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCDateStamps -XX:SurvivorRatio=8" 1 1

I see that your command uses your own class, MyClass. The example pi job works because it parses the command line options using org.apache.hadoop.util.GenericOptionsParser. Your MyClass should also use org.apache.hadoop.util.GenericOptionsParser to parse the command line options for this to work properly.
09-18-2018
01:06 PM
@Prasan Shetty You shouldn't move the directories under hadoop/yarn/local while the NodeManager service is running, as it will affect running application containers. Stop the NodeManager service on that node, perform the move, and then start it again.
09-18-2018
09:59 AM
1 Kudo
@Roberto Ayuso In Spark, spark.driver.memoryOverhead is considered in calculating the total memory required for the driver. By default it is 0.10 of the driver memory, with a minimum of 384MB. In your case that works out to 8GB + (8GB * 0.1) = 9011MB, roughly 9G. YARN allocates memory only in increments/multiples of yarn.scheduler.minimum-allocation-mb. When yarn.scheduler.minimum-allocation-mb=4G, it can only allocate container sizes of 4G, 8G, 12G, etc. So if something like 9G is requested, it rounds up to the next multiple and allocates a 12G container for the driver. When yarn.scheduler.minimum-allocation-mb=1G, container sizes of 8G, 9G and 10G are possible; the nearest rounded-up size of 9G is used in this case.
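A small sketch of that rounding math, using the numbers from this case (this is just arithmetic, not taken from the YARN source):

# hedged sketch: driver memory + 10% overhead, rounded up to the scheduler minimum
driver_mb=8192                               # --driver-memory 8g
overhead_mb=$(( driver_mb / 10 ))            # max(384MB, 10%) = 819MB here
request_mb=$(( driver_mb + overhead_mb ))    # 9011MB requested
for min_alloc_mb in 1024 4096; do
  allocated_mb=$(( (request_mb + min_alloc_mb - 1) / min_alloc_mb * min_alloc_mb ))
  echo "minimum-allocation-mb=${min_alloc_mb} -> ${allocated_mb}MB container"
done
# prints 9216MB (9G) for a 1G minimum and 12288MB (12G) for a 4G minimum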
09-18-2018
06:44 AM
1 Kudo
@Amila Silva HDP 3.0 supports GPU isolation in Docker using the nvidia-docker-plugin (https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin), which is part of nvidia-docker v1. Currently only this is supported, not the newer version.
09-13-2018
12:02 PM
1 Kudo
@Michael Bronson I will look into creating an article about configuring vcores for CPU scheduling when I get time, and will mention this part there.
09-13-2018
10:48 AM
Interesting. Can you paste the lscpu output of the nodes you are mentioning?
09-13-2018
10:12 AM
1 Kudo
@Michael Bronson YARN vcores can be set up to 2x the actual CPUs present, depending on the use case; that's why Ambari offers that range in the scroll bar. It does not depend on the number of threads shown in lscpu. If you want to prevent over-utilization of CPU by YARN and leave CPU for the OS and other processes, you can set it to 80% of 32. But keep in mind that this value is only considered by YARN for scheduling containers when CPU scheduling is enabled.
09-13-2018
09:28 AM
1 Kudo
@Michael Bronson The "CPU(s):" value in the lscpu output already takes "Thread(s) per core" into account. Usually, CPU(s) = [Thread(s) per core] x [Core(s) per socket] x [Socket(s)]. It is sufficient to consider only CPU(s) when setting yarn.nodemanager.resource.cpu-vcores.
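A quick sketch to cross-check that on a node (the field names are the usual lscpu labels; the awk parsing is just one way to pull them out):

# hedged sketch: recompute CPU(s) from the per-socket lscpu fields
threads=$(lscpu | awk -F: '/^Thread\(s\) per core/ {gsub(/ /, "", $2); print $2}')
cores=$(lscpu   | awk -F: '/^Core\(s\) per socket/ {gsub(/ /, "", $2); print $2}')
sockets=$(lscpu | awk -F: '/^Socket\(s\)/          {gsub(/ /, "", $2); print $2}')
echo "CPU(s) = $(( threads * cores * sockets ))"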
09-07-2018
05:09 AM
@naveen r The amount of resources a YARN application will request is completely dependent on the type of application. For MapReduce, it is generally based on the input splits / number of reducers configured and the memory/vcores configured per mapper/reducer in the JobConf. To check how many resources are currently being used by a single application, you can use the following REST API and check the fields "allocatedMB", "allocatedVCores" and "runningContainers":

GET http://<rm http address:port>/ws/v1/cluster/apps/<applicationId>
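A sketch of that call from the command line (host, port and application id are placeholders; jq is optional and only used to pick out the three fields):

# hedged sketch: fetch the per-application resource usage
curl -s "http://<rm http address:port>/ws/v1/cluster/apps/<applicationId>" \
  | jq '.app | {allocatedMB, allocatedVCores, runningContainers}'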
08-17-2018
06:33 AM
@Muthukumar S This is a normal log message whenever the BlockManager starts up; you can even check it in the NameNode logs. Invalid blocks such as over-replicated blocks, if they exist, will be deleted one hour after the NameNode starts. There is no need to worry about any data loss here at all. Start your HDFS service as usual.
08-16-2018
04:51 PM
@Sivasankar Chandrasekar yarn.scheduler.maximum-allocation-mb is a scheduler-level config and applies to the ResourceManager only. It should be set to a single value, ideally the largest container your applications may want to request. You can set it to 8GB if your applications will only use a maximum of 8GB for a single container. If you have a requirement to launch a single container of size 32GB, you can also set it to 32GB, but only nodes which have 32GB of memory can fulfill that container request. You should also create config groups in Ambari for the property yarn.nodemanager.resource.memory-mb and set it to the different values 8GB/16GB/32GB with respect to the nodes.
07-30-2018
02:20 PM
You can use http://<rm http address:port>/ws/v1/cluster/metrics. This will retrieve the allocatedMB and allocatedVirtualCores at a given point in time. Refer to http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Metrics_API
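A sketch of polling that endpoint (host and port are placeholders; jq just extracts the two fields):

# hedged sketch: sample cluster-wide allocation at a point in time
curl -s "http://<rm http address:port>/ws/v1/cluster/metrics" \
  | jq '.clusterMetrics | {allocatedMB, allocatedVirtualCores}'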
07-02-2018
05:47 AM
Can you click on the application in the RM UI, see what is reported in Diagnostics, and paste the content? It should specify the reason why the job is still in the ACCEPTED state.
06-29-2018
03:01 PM
An easy way to check the maximum AM resource is in the RM UI for queue q4: http://rm-host:8088/cluster/scheduler?openQueues=Queue:%20q4 Check the values for Max Application Master Resources and Used Application Master Resources. You can also check the other values here, which are useful for identifying the queue limits you have configured.
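If you prefer the REST API over the UI, the scheduler endpoint exposes the same queue information; a sketch (rm-host is a placeholder and the nesting of queues in the JSON can vary between versions):

# hedged sketch: dump the scheduler/queue view as JSON
curl -s "http://rm-host:8088/ws/v1/cluster/scheduler" | jq '.scheduler'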
06-29-2018
02:43 PM
There are additional parameters which limit the usage of queue resources by a single user or by the application master: yarn.scheduler.capacity.<queue-path>.user-limit-factor and yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent. The detailed documentation of these capacity-scheduler properties is at http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html#Queue_Properties

Property | Default | Behaviour and Recommendation |
---|---|---|
yarn.scheduler.capacity.<queue-path>.user-limit-factor | 1 | If you are submitting jobs as the same user, it is recommended to increase the value above 1. Otherwise the same user can't submit more than one job which exceeds the queue capacity. For q4, a single user can only utilize the maximum 100% if this is set to 10. This is likely the reason the new job is not getting executed. |
yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent | 0.1 | For example, with the default value, only 10% of the max-capacity of q4 can be used for application masters. When multiple applications are launched in the same queue, new applications won't be accepted even if resources are free in the cluster. |
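As an illustration only (the full queue path root.q4 is assumed; adjust it to your queue hierarchy), raising the user limit factor for q4 would look like this in capacity-scheduler.xml / the Ambari capacity scheduler text area:

# hedged sketch: let a single user in q4 exceed the queue's configured capacity
yarn.scheduler.capacity.root.q4.user-limit-factor=10
# after editing capacity-scheduler.xml directly, refresh the queues
yarn rmadmin -refreshQueues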
05-03-2018
10:24 AM
I didn't notice that you were only setting YARN_RESOURCEMANAGER_OPTS. This env variable is used only for the ResourceManager daemon. To specify the opts for all hadoop and yarn client commands, you can use HADOOP_CLIENT_OPTS in hadoop-env.sh:

export HADOOP_CLIENT_OPTS="-Dyarn.resourcemanager.hostname=192.168.33.33"

But I am not sure why you would need to do this when you can just set it in yarn-site.xml, which is what is recommended.
05-01-2018
06:33 AM
yarn-env.sh is used when you run any yarn command, so it works if you use the yarn command to submit a MapReduce job as below:

yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 5 5

But the spark-submit command doesn't invoke yarn-env.sh, so it reads yarn-site.xml from $HADOOP_CONF_DIR and gets the ResourceManager address from there.
04-20-2018
02:02 PM
The retain-seconds setting will not work for an active application that is still writing files. It works by checking whether the last-modified timestamp of the application's log directory is older than retain-seconds. Since your streaming job writes logs continuously, the directory's timestamp never becomes older than 600 seconds, which is why your logs are not getting deleted. Also, log aggregation in YARN doesn't work the same way as the log rolling/retention you are expecting from log4j.
04-20-2018
01:32 PM
1 Kudo
YARN log aggregation retention can be controlled by setting the yarn.log-aggregation.retain-seconds property in yarn-site.xml. For example, if you want logs older than 30 days to be deleted, set yarn.log-aggregation.retain-seconds to 2592000 (30 days x 24 hours x 3600 seconds = 2,592,000).
04-18-2018
07:17 AM
@Purna Chandra Mahesh Bhogavalli There is no sort or order_by query parameter supported in the ResourceManager REST API. For your curl request, you can check the documentation at http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API The only option for now is to sort on the client side. On the command line, one easy way is to install a tool called jq (https://stedolan.github.io/jq/download/) if you are interested. For your use case, you can do something like the following:

curl -get "http://resource-manager-hostname:8088/ws/v1/cluster/apps?state=running&limit=20" | jq '.apps.app|sort_by(.queueUsagePercentage)'

For a descending sort you can use:

curl -get "http://resource-manager-hostname:8088/ws/v1/cluster/apps?state=running&limit=20" | jq '.apps.app|sort_by(.queueUsagePercentage)|reverse'

Hope this helps.
04-17-2018
09:02 AM
@Simran Kaur The log snippets don't indicate the problem you are facing. Can you post or attach any ERROR or failure messages from the log?