Member since: 08-16-2016
Posts: 642 | Kudos Received: 131 | Solutions: 68

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3978 | 10-13-2017 09:42 PM |
| | 7477 | 09-14-2017 11:15 AM |
| | 3799 | 09-13-2017 10:35 PM |
| | 6041 | 09-13-2017 10:25 PM |
| | 6604 | 09-13-2017 10:05 PM |
02-28-2017
02:43 PM
What do you mean "it does not work well"? Does the second NN UI not work? Is there an error?
02-28-2017
11:17 AM
It is a Spark configuration setting. "Otherwise you will need to specify it manually in the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf or History Server Advanced Configuration Snippet (Safety Valve) for spark-history-server.conf."
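As a concrete illustration, the safety-valve entry would look something like the fragment below. The host and port are assumptions (18088 is the default Spark History Server port); substitute your own values.

```
# spark-conf/spark-defaults.conf safety valve entry.
# Hypothetical host; 18088 is the default Spark History Server port.
spark.yarn.historyServer.address=http://shs-host.example.com:18088
```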
02-28-2017
10:51 AM
Try setting the spark.yarn.historyServer.address manually as I mentioned previously.
02-28-2017
10:40 AM
Try hdfs fsck -blockId blk_1157585017_83846591 if it is available. The option exists on my CDH 5.8.2 cluster but not on my CDH 5.4.5 cluster. I am getting an error on my CDH 5.8.2 cluster, so I don't know if it will produce the output you are looking for. You could also scan all of the DFS data directories for the file manually. That will only find it if the block was actually created, which may not be the case if it says the job.split file no longer exists.
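The manual scan could be sketched as below. This is a hypothetical helper, not a Cloudera tool; pass your actual dfs.datanode.data.dir directories as arguments.

```shell
# Sketch: scan DataNode data directories for a block file by ID, for CDH
# versions where `hdfs fsck -blockId` is unavailable. The directory
# arguments are assumptions -- use your dfs.datanode.data.dir values.
find_block() {  # usage: find_block <block_id> <data_dir>...
  blk="$1"; shift
  for d in "$@"; do
    # Block files on disk are named blk_<id> (plus a .meta companion).
    find "$d" -type f -name "${blk}*" 2>/dev/null
  done
}
# Example: find_block blk_1157585017 /dfs/dn1 /dfs/dn2
```

If the scan finds nothing on any DataNode, the block was most likely never written, which matches the job.split file having been cleaned up.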
02-28-2017
10:28 AM
There shouldn't be any Spark components installed on the DN. Put another way, if a node has no Spark role installed, it will not receive the Spark configs from CM. I have mine installed on the master and edge nodes and can see the configs there, including the spark.yarn.historyServer.address setting. CM will complain about a missing requirement when you de-select YARN; that warning goes away once you re-select it.
02-27-2017
11:35 PM
What value is spark.yarn.historyServer.address set to? I suspect it did not get updated to the new Spark History Server address, so the RM UI is still linking to the old location. This setting is not exposed in CM; I had to check the Spark configs on a node. To try to fix it, de-select YARN as the YARN (MR2) Service in CM by clicking the blue left-pointing arrow, save the configs, and then re-select YARN. Otherwise you will need to specify it manually in the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf or the History Server Advanced Configuration Snippet (Safety Valve) for spark-history-server.conf.
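Checking the effective value on a node could look like the sketch below. The helper name is hypothetical, and the client-config path is an assumption (typical for CM-deployed Spark client configs).

```shell
# Sketch: report the spark.yarn.historyServer.address value that is
# actually deployed in a node's Spark client config.
history_addr() {  # usage: history_addr <path-to-spark-defaults.conf>
  grep '^spark.yarn.historyServer.address' "$1"
}
# Example (assumed path for CM-managed clusters):
# history_addr /etc/spark/conf/spark-defaults.conf
```

If the address it prints is not your current Spark History Server, the RM UI links will point at the old location.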
02-27-2017
11:15 PM
Since it is temporary data generated for a job and removed when the job finishes, pass or fail, it would likely be created with a replication factor of 1. I don't know that for certain, but the file's short lifetime would explain a replication factor of 1. I don't know if this helps, but the file itself is what the job creates to track its input splits.
02-27-2017
10:52 PM
This looks disturbing to me. The staging directory is used by the job to hold job-specific data, so this file was written out after the job started, yet the job is unable to read any of the file's blocks. That could indicate a DataNode that is about to fail, or maybe a few failing disks. It is likely that the file was created with a replication factor of 1. Have you run fsck against the parent folder, or all of HDFS, with -list-corruptfileblocks and -blocks? Are any other blocks missing or corrupt?
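For reference, the fsck invocation could be assembled as below. This is a dry-run sketch that only prints the command (it needs a live cluster to actually run); the target path is an assumption.

```shell
# Sketch (dry run): print the fsck command that lists corrupt file blocks
# and per-file block details. Pipe to sh, or copy-paste, to run it for
# real on a cluster node.
fsck_cmd() {  # usage: fsck_cmd <hdfs-path>
  echo "hdfs fsck $1 -list-corruptfileblocks -blocks"
}
# Example: fsck_cmd /user/someuser/.staging   (path is hypothetical)
```

Running it against / as well as the staging parent would show whether the corruption is isolated to this one file.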
02-27-2017
10:45 PM
I am not entirely sure what you are looking for with this request. I find the most useful information in the main logs for the process, the stdout and stderr logs for each running process, and the CM agent logs. If an issue is causing a health test to fail, it is likely to be in one of these files, though that is not a guarantee. I don't know for certain, but I believe the pass/fail results of the tests are kept in the Host Monitor or Service Monitor databases.

Service logs: /var/log/<name of service>
Running process logs: /var/run/cloudera-scm-agent/process/<pid-servicename>/logs
CM agent logs: /var/log/cloudera-scm-agent/

I also encourage the use of the CM API. Most of these logs can be accessed through it, and probably the health test results (pass/fail) themselves, or at least the alerts they raise. https://cloudera.github.io/cm_api/apidocs/v15/index.html
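Querying the API could be sketched as below. The helper is hypothetical; the host name and admin:admin credentials are assumptions, and 7180 is the default CM port.

```shell
# Sketch: build a CM API (v15) URL for a given endpoint. Host name and
# credentials are assumptions; 7180 is the default CM web/API port.
cm_api_url() {  # usage: cm_api_url <cm-host> <endpoint>
  printf 'http://%s:7180/api/v15/%s' "$1" "$2"
}
# Example (hypothetical host; -u passes CM credentials to curl):
# curl -u admin:admin "$(cm_api_url cm-host.example.com clusters)"
```

Service and role endpoints under /clusters/ include health summaries in their responses, which may be enough without going to the monitor databases.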
02-26-2017
12:10 PM
Take a look at the CM agent logs on the hosts. The distribution phase of the install is the CM agents downloading the parcel from the CM host, so whatever the problem is, it should be reported there.