Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372
01-02-2020
11:54 AM
@shyamshaw I am already answering a similar question in this thread: https://community.cloudera.com/t5/Support-Questions/Unable-to-start-Node-Manager/td-p/285976. Please go through it and update me on what isn't working; I will respond in both threads soon.
01-02-2020
09:06 AM
@Uppal Great, glad it all went well. We usually run MSCK REPAIR TABLE once daily after loading new partitions into the HDFS location. Why do you need to run the MSCK REPAIR TABLE statement after every ingestion? Hive stores a list of partitions for each table in its metastore. If new partitions are added directly to HDFS, however, the metastore (and hence Hive) will not be aware of them unless you register them, either by running MSCK REPAIR TABLE or by adding each partition explicitly with ALTER TABLE ... ADD PARTITION. MSCK adds metadata about partitions to the Hive metastore for partitions for which such metadata doesn't already exist. If you find the need, remember to do that; otherwise please accept the answer and close the thread :)
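A minimal HiveQL sketch of the two approaches, using a hypothetical table sales partitioned by dt (adjust names and paths to your environment):

-- Option 1: scan the table location and register every partition the metastore doesn't know about yet
MSCK REPAIR TABLE sales;

-- Option 2: register a single known partition explicitly
ALTER TABLE sales ADD IF NOT EXISTS PARTITION (dt='2020-01-01') LOCATION '/warehouse/sales/dt=2020-01-01';

-- Verify the metastore now sees the new partition(s)
SHOW PARTITIONS sales;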
01-02-2020
07:47 AM
@Uppal Great that it worked out better for you. Did you run MSCK REPAIR TABLE table_name; on the target table? If you found this answer addressed your initial question, please take a moment to log in and click "Accept" on the answer. Happy hadooping!
01-02-2020
03:13 AM
@saivenkatg55 In hadoop-yarn-nodemanager-w0lxdhdp05.ifc.org.log I see errors pointing to:

Unable to start NodeManager: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-6279667856305652637.8 (Permission denied)]

My suspicion is that /tmp on the host has the noexec mount option set. You can verify this by running /bin/mount and checking the mount options. If you are able to, remount /tmp without noexec and try starting the NodeManager again. For comparison, here is sample output from a host where /tmp is mounted correctly:

[root@tokyo ~]# /bin/mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=7167976k,nr_inodes=1791994,mode=755)
...
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15609)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
/dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
/dev/sda5 on /opt type ext4 (rw,relatime,data=ordered)
/dev/sda8 on /home type ext4 (rw,relatime,data=ordered)
/dev/sda11 on /u02 type ext4 (rw,relatime,data=ordered)
/dev/sda6 on /var type ext4 (rw,relatime,data=ordered)
/dev/sda10 on /u01 type ext4 (rw,relatime,data=ordered)
/dev/sda9 on /tmp type ext4 (rw,relatime,data=ordered)

This issue occurs when the user running the Hadoop NodeManager start process does not have the necessary rights and cannot generate temporary files under the /tmp directory. Solution:
- Give the user running the NodeManager startup process read/write/execute access on /tmp.
- Remove the noexec parameter when mounting /tmp.
- Change the execution rights on /tmp, i.e. sudo chmod 777 /tmp.

In /var/log/messages I can also see:

Jan 2 05:14:23 w0lxdhdp05 abrt-server: Package 'ambari-agent' isn't signed with proper key
Jan 2 05:14:23 w0lxdhdp05 abrt-server: 'post-create' on '/var/spool/abrt/Python-2020-01-02-05:14:22-11897' exited with 1
Jan 2 05:14:23 w0lxdhdp05 abrt-server: Deleting problem directory '/var/spool/abrt/Python-2020-01-02-05:14:22-11897'

Please edit /etc/abrt/abrt-action-save-package-data.conf and change the value of OpenGPGCheck from yes to no:

OpenGPGCheck = no

It might also be necessary to raise the core dump size limit:

limit coredumpsize unlimited

After editing the file, restart the process with the following command:

# service abrtd restart

Restart the NodeManager and share your joy!
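A minimal shell sketch of the check-and-remount sequence described above, assuming /tmp is its own mount point (restart the NodeManager from Ambari afterwards):

# Show the mount options for /tmp; if noexec appears here, that is the culprit
/bin/mount | grep ' /tmp '

# Remount /tmp in place without the noexec option
sudo mount -o remount,exec /tmp

# To keep the fix across reboots, also remove noexec from the /tmp entry in /etc/fstab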
01-01-2020
11:51 AM
@pra_big The hbase user is the admin user of HBase. You connect to a running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase install. The version information printed when you start the HBase shell has been omitted here; the HBase shell prompt ends with a > character. As the hbase user:

$ ./bin/hbase shell
hbase(main):001:0>

Either of the methods below will give you access to the HBase shell as the admin user [hbase]. If you have root access:

# su - hbase

If you have sudo privileges, this gives you the same result as above:

# sudo su hbase -l

I don't see the reason for changing to bash, or did I misunderstand your question?
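If it helps, a minimal sketch of the whole sequence, assuming you have sudo rights (the whoami, status and list commands inside the shell simply confirm you are connected as the hbase admin user):

# Become the hbase admin user and open the shell
sudo su - hbase
hbase shell

# Inside the shell
hbase(main):001:0> whoami
hbase(main):002:0> status
hbase(main):003:0> list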
01-01-2020
11:06 AM
@Uppal Any updates on this thread?
01-01-2020
11:04 AM
@saivenkatg55 You didn't respond to this answer. Do you still need help, or was the issue resolved? If it was, please accept the answer and close the thread.
01-01-2020
05:17 AM
@ssk26 queueMaxAMShareDefault and maxAMShare are mutually exclusive, since the maxAMShare element in each queue overrides the default. Can you decrease queueMaxAMShareDefault or maxAMShare to 0.1 and set the weight to 2.0?

For Spark, create fairscheduler.xml from fairscheduler.xml.template (your path might differ depending on the 3.1.x.x.x version):

# cp /usr/hdp/3.1.x.x-xx/etc/spark2/conf/fairscheduler.xml.template fairscheduler.xml

Please check the file permissions. Then set the spark.scheduler.allocation.file property in your SparkConf, or put a file named fairscheduler.xml on the classpath. Note that pools not configured in the XML file simply get default values for all settings (scheduling mode FIFO, weight 1, and minShare 0). There are two example pools in fairscheduler.xml.template, production and test, using FAIR and FIFO respectively:

<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>

Without any intervention, newly submitted jobs go into a default pool, but a job's pool can be set by adding the spark.scheduler.pool "local property" to the SparkContext in the thread that submits it:

// Assuming sc is your SparkContext, pick the FAIR "production" pool
sc.setLocalProperty("spark.scheduler.pool", "production")

Please let me know.
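A minimal Scala sketch of the end-to-end wiring, assuming the copied file lives at the hypothetical path /etc/spark2/conf/fairscheduler.xml:

import org.apache.spark.{SparkConf, SparkContext}

// Enable the FAIR scheduler and point Spark at the allocation file (file path is an assumption)
val conf = new SparkConf()
  .setAppName("fair-pool-demo")
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.scheduler.allocation.file", "/etc/spark2/conf/fairscheduler.xml")
val sc = new SparkContext(conf)

// Jobs submitted from this thread now land in the "production" pool defined in the XML
sc.setLocalProperty("spark.scheduler.pool", "production")
sc.parallelize(1 to 1000).count()

// Clear the local property to fall back to the default pool for later jobs in this thread
sc.setLocalProperty("spark.scheduler.pool", null)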
01-01-2020
04:16 AM
@alialghamdi Your issue is being generated by the Python script /usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py, see line 38:

host_level_params_cache = self.host_level_params_cache[cluster_id]

Solution 1 on node 6: empty the cache by deleting the agent's tmp files after stopping the ambari-agent.

node6 # ambari-agent stop
node6 # rm -rf /var/lib/ambari-agent/*

Then restart the ambari-agent on node 6:

node6 # ambari-agent start

Solution 2 on node 6: remove the agent completely and re-install it.

node6 # ambari-agent stop
# yum erase ambari-agent
# rm -rf /var/lib/ambari-agent
# rm -rf /var/run/ambari-agent
# rm -rf /usr/lib/ambari-agent
# rm -rf /etc/ambari-agent
# rm -rf /var/log/ambari-agent
# rm -rf /usr/lib/python2.6/site-packages/ambari*

Re-install the Ambari agent:

# yum install ambari-agent

Edit the hostname so it points to the Ambari Server:

# vi /etc/ambari-agent/conf/ambari-agent.ini

Start the ambari-agent:

# ambari-agent start

Please revert.
12-31-2019
07:57 PM
@alialghamdi I have an idea. Depending on your backend Ambari database, please take a backup first. We are not going to make any changes yet, only validate my suspicion.

DB backup, assuming you are on MySQL/MariaDB:

mysqldump -u[user name] -p[password] [database name] > [dump file]

Check the cluster state:

select * from clusterstate;

The cluster_id found above should also appear in the stage table's cluster_id column:

select stage_id, request_id, cluster_id from stage;

Identify the troublesome host:

select host_id, host_name from hosts;

Assuming you got host_id 3 for the troublesome host:

select cluster_id, component_name from hostcomponentdesiredstate where host_id=3;
select cluster_id, component_name from hostcomponentstate where host_id=3;
select cluster_id, service_name from hostconfigmapping where host_id=3;

Share your output for all the above steps, and please tokenize your hostname.domain.