Member since: 04-13-2016
Posts: 422
Kudos Received: 150
Solutions: 55
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1861 | 05-23-2018 05:29 AM |
| | 4870 | 05-08-2018 03:06 AM |
| | 1627 | 02-09-2018 02:22 AM |
| | 2636 | 01-24-2018 08:37 PM |
| | 6055 | 01-24-2018 05:43 PM |
09-23-2017
01:14 AM
@Sree Kupp If the NameNode host has hardware problems and you need to move the NameNode to another host, you can do so as follows:
1. If the host to which you want to move the NameNode is not already in the cluster, follow the instructions in Adding a Host to the Cluster to add it.
2. Stop all cluster services.
3. Make a backup of the dfs.name.dir directories on the existing NameNode host. Be sure to back up the fsimage and edits files; they should be identical across all of the directories specified by the dfs.name.dir property.
4. Copy the files you backed up from the dfs.name.dir directories on the old NameNode host to the host where you want to run the NameNode.
5. Go to the HDFS service.
6. Click the Instances tab.
7. Select the checkbox next to the NameNode role instance and click the Delete button. Click Delete again to confirm.
8. In the Review configuration changes page that appears, click Skip.
9. Click Add to add a NameNode role instance.
10. Select the host where you want to run the NameNode and click Continue.
11. Specify the location of the dfs.name.dir directories where you copied the data on the new host, and click Accept Changes.
12. Start cluster services. After the HDFS service has started, Cloudera Manager distributes the new configuration files to the DataNodes, which will be configured with the address of the new NameNode host.
13. Go to the HDFS service. The NameNode, Secondary NameNode, and DataNode roles should each show a process state of Started, and the HDFS service should show a status of Good.

Note that you cannot distribute the NameNode metadata across multiple nodes; you can only configure multiple dfs.name.dir locations, each of which holds an identical copy of the metadata.
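Step 3 above can be sketched in a few lines of Python; this is just an illustrative backup helper, not a Cloudera tool, and the directory paths you pass in would come from your own dfs.name.dir setting:

```python
import os
import shutil

def backup_name_dirs(name_dirs, backup_root):
    """Copy each dfs.name.dir directory (holding fsimage and edits files)
    to a numbered subdirectory under backup_root. Illustrative sketch only."""
    backups = []
    for i, d in enumerate(name_dirs):
        dest = os.path.join(backup_root, "name_dir_%d" % i)
        shutil.copytree(d, dest)  # fails if dest already exists, which is what we want for a backup
        backups.append(dest)
    return backups
```

Run it with the NameNode stopped, so the fsimage and edits files are not being written while you copy them.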
09-22-2017
06:13 PM
1 Kudo
@Hoang Le The Capacity Scheduler's leaf queues can also use the user-limit-factor property to control user resource allocations. This property denotes the multiple of the queue's configured capacity that any single user can consume, regardless of whether there are idle resources in the cluster.

Property: yarn.scheduler.capacity.root.support.user-limit-factor
Value: 1

The default value of "1" means that any single user in the queue can occupy at most the queue's configured capacity. This prevents users in a single queue from monopolizing resources across all queues in a cluster. Setting the value to "2" would cap the queue's users at twice the queue's configured capacity, while a value of 0.5 would restrict any user to half of the queue capacity. These settings can also be changed dynamically at run-time using yarn rmadmin -refreshQueues. Please change the user limit factor and try again.
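The arithmetic behind user-limit-factor can be sketched as below; the cluster size and queue percentage are made-up numbers for illustration, and this is a simplification of the scheduler's actual user-limit computation:

```python
def max_user_allocation(cluster_resource, queue_capacity_pct, user_limit_factor):
    """Upper bound on what one user may consume in a Capacity Scheduler leaf queue:
    queue capacity times user-limit-factor, never more than the whole cluster."""
    queue_capacity = cluster_resource * queue_capacity_pct / 100.0
    return min(cluster_resource, queue_capacity * user_limit_factor)

# With a 1000-unit cluster and a queue configured at 40% capacity:
#   user-limit-factor 1.0 -> a user is capped at the queue capacity (400)
#   user-limit-factor 2.0 -> capped at twice the queue capacity (800)
#   user-limit-factor 0.5 -> capped at half the queue capacity (200)
```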
09-20-2017
08:01 PM
@Piyali Gupta
Here are the steps to increase the HDFS Balancer network bandwidth for faster balancing of data between nodes. First, set the bandwidth on all the DataNodes:

hdfs dfsadmin -setBalancerBandwidth 100000000

Then, on the client, run the command below:

hdfs balancer -Dfs.defaultFS=hdfs://<NN_HOSTNAME>:8020 -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200 -Ddfs.datanode.balance.max.concurrent.moves=5 -Ddfs.balance.bandwidthPerSec=100000000 -Ddfs.balancer.max-size-to-move=10737418240 -threshold 5

This will balance your HDFS data between DataNodes faster; do this when the cluster is not heavily used. A couple of links to articles:
https://community.hortonworks.com/articles/51935/how-to-increase-hdfs-balancer-network-bandwidth-fo.html
https://community.hortonworks.com/articles/43849/hdfs-balancer-2-configurations-cli-options.html
Hope this helps you.
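The bandwidth argument is plain bytes per second, which is easy to get wrong; a small sketch (an illustrative helper, not part of Hadoop) for deriving it from a target MB/s rate:

```python
def balancer_bandwidth_arg(megabytes_per_sec):
    """Convert a desired per-DataNode balancer rate in MB/s (decimal megabytes)
    to the bytes-per-second integer expected by
    `hdfs dfsadmin -setBalancerBandwidth`."""
    return int(megabytes_per_sec * 1000 * 1000)

# The 100000000 used above corresponds to a 100 MB/s cap per DataNode.
```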
09-20-2017
02:09 PM
@raouia That parameter name is printed incorrectly in the documentation. It should be 'dfs.datanode.data.dir' instead of 'dfs.data.dir'; this has been corrected in later versions of the documentation. dfs.datanode.data.dir determines where on the local filesystem a DataNode should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all of the named directories, typically on different devices. Directories that do not exist are ignored. The number of entries in the comma-delimited list typically equals the number of disks. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_command-line-installation/content/determine-hdp-memory-config.html Hope this helps you.
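As a quick sketch of how the comma-delimited value maps to per-disk directories (the /grid paths are illustrative, not a required layout):

```python
def data_dirs(dfs_datanode_data_dir):
    """Split a dfs.datanode.data.dir value into its directory entries,
    dropping empty items and surrounding whitespace."""
    return [d.strip() for d in dfs_datanode_data_dir.split(",") if d.strip()]

# One entry per physical disk, e.g. two disks mounted under /grid:
dirs = data_dirs("/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data")
```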
09-19-2017
04:31 AM
The reason Ambari is unable to start the NameNode smoothly is a bug; below is the workaround. The issue was fixed permanently in Ambari 2.5.x. A few lines of the error message from the Ambari Ops logs:

File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 55, in wrapper
    return function(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 562, in is_this_namenode_active
    raise Fail(format("The NameNode {namenode_id} is not listed as Active or Standby, waiting..."))
resource_management.core.exceptions.Fail: The NameNode nn2 is not listed as Active or Standby, waiting...

ROOT CAUSE: https://issues.apache.org/jira/browse/AMBARI-18786

RESOLUTION: Increase the timeout in /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py from this:

@retry(times=5, sleep_time=5, backoff_factor=2, err_class=Fail)

to this:

@retry(times=25, sleep_time=25, backoff_factor=2, err_class=Fail)
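To see why raising those numbers extends the timeout, here is a minimal sketch of how a retry decorator with these parameters behaves; this is an illustration of the pattern, not Ambari's actual implementation:

```python
import time

def retry(times=5, sleep_time=5, backoff_factor=2, err_class=Exception):
    """Call the wrapped function up to `times` times, sleeping `sleep_time`
    seconds after a failure and multiplying the sleep by `backoff_factor`
    each retry. The last failure is re-raised."""
    def decorator(function):
        def wrapper(*args, **kwargs):
            delay = sleep_time
            for attempt in range(times):
                try:
                    return function(*args, **kwargs)
                except err_class:
                    if attempt == times - 1:
                        raise
                    time.sleep(delay)
                    delay *= backoff_factor
        return wrapper
    return decorator
```

With times=5 and sleep_time=5 the waits between attempts are roughly 5, 10, 20, 40 seconds; times=25 with sleep_time=25 gives the NameNode far longer to report itself Active or Standby.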
09-17-2017
05:10 AM
@Facundo Bianco This refers to backing up the Metrics Collector data (HBase).
08-22-2017
07:02 PM
@suresh krish
When you look at the environment variables in your Spark UI, you can see whether that particular job is using the serialization property below. If you can't see it in the cluster configuration, that means the user is setting it at the runtime of the job.

spark.serializer org.apache.spark.serializer.KryoSerializer

Secondly, spark.kryoserializer.buffer.max is built in with a default value of 64m. If required, you can increase that value at runtime. We could also set the Kryo serialization values at the cluster level, but that's not good practice without knowing the proper use case. Hope this helps you.
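Values like "64m" are size strings; as a sketch of what such a suffix means in bytes (an illustrative helper, not Spark's own parser):

```python
def size_to_bytes(size):
    """Convert a Spark-style size string like '64m' or '512k' to bytes,
    treating k/m/g as binary multiples (1k = 1024)."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    size = size.strip().lower()
    if size[-1] in units:
        return int(size[:-1]) * units[size[-1]]
    return int(size)

# The default 64m buffer cap is 64 * 1024 * 1024 bytes.
```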
08-17-2017
03:10 PM
@Deepak Nayak We can submit Spark jobs to a remote cluster through the Livy server using REST calls. Below are a couple of links with examples: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-livy-rest-interface https://github.com/cloudera/livy Hope this helps you.
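At its core, a Livy batch submission is an HTTP POST of a JSON body to the server's /batches endpoint. A small sketch of building that body (the host, port, and jar path in the comment are illustrative assumptions):

```python
import json

def livy_batch_payload(file, class_name=None, args=None, conf=None):
    """Build the JSON body for a POST to Livy's /batches endpoint."""
    payload = {"file": file}
    if class_name:
        payload["className"] = class_name
    if args:
        payload["args"] = args
    if conf:
        payload["conf"] = conf
    return json.dumps(payload)

# POST this body to e.g. http://<livy-host>:8998/batches
body = livy_batch_payload(
    "hdfs:///jars/spark-examples.jar",
    class_name="org.apache.spark.examples.SparkPi",
    args=["100"],
)
```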
08-15-2017
01:50 AM
1 Kudo
@arjun more Are you able to sync groups the same way as users? If not, please check the parameters below with your LDAP team and set them as they advise:

authentication.ldap.groupObjectClass [LDAP object class] The object class that is used for groups. Example: groupOfUniqueNames
authentication.ldap.groupMembershipAttr [LDAP attribute] The attribute for group membership. Example: uniqueMember
authentication.ldap.groupNamingAttr [LDAP attribute] The attribute for group name.

Checking the ambari-server logs would help you get the exact error message. Hope this helps you.
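For reference, these group-sync settings end up as plain key=value entries in the Ambari server configuration; a small sketch that renders them (the example attribute values are assumptions to confirm with your LDAP team):

```python
def ldap_group_properties(object_class, membership_attr, naming_attr):
    """Render the Ambari LDAP group-sync settings as properties-file lines."""
    props = {
        "authentication.ldap.groupObjectClass": object_class,
        "authentication.ldap.groupMembershipAttr": membership_attr,
        "authentication.ldap.groupNamingAttr": naming_attr,
    }
    return "\n".join("%s=%s" % kv for kv in sorted(props.items()))

print(ldap_group_properties("groupOfUniqueNames", "uniqueMember", "cn"))
```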
08-10-2017
09:02 PM
@Dhiraj Yes, a delegated admin in Ranger can do all of the "selecting and changing user/group and permissions" for those policies. A delegated admin has the same full permissions as an admin, but only with respect to that policy.